[slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory

Nousheen nousheenparvaiz at gmail.com
Tue Feb 1 04:06:13 UTC 2022


Dear Ole,

Thank you for your response.
I am doing it again using the link you suggested.

Best Regards,
Nousheen Parvaiz



On Mon, Jan 31, 2022 at 2:07 PM Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
wrote:

> Hi Nousheen,
>
> I again recommend that you follow the steps for installing Slurm on a
> CentOS 7 cluster:
> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
>
> Maybe you will need to start the installation from scratch, but the steps
> are guaranteed to work if followed correctly.
>
> IHTH,
> Ole
>
> On 1/31/22 06:23, Nousheen wrote:
> > The same error shows up on the compute node, as follows:
> >
> > [root at c103008 ~]# systemctl enable slurmd.service
> > [root at c103008 ~]# systemctl start slurmd.service
> > [root at c103008 ~]# systemctl status slurmd.service
> > ● slurmd.service - Slurm node daemon
> >     Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor preset: disabled)
> >     Active: failed (Result: exit-code) since Mon 2022-01-31 00:22:42 EST; 2s ago
> >    Process: 11505 ExecStart=/usr/local/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=203/EXEC)
> >   Main PID: 11505 (code=exited, status=203/EXEC)
> >
> > Jan 31 00:22:42 c103008 systemd[1]: Started Slurm node daemon.
> > Jan 31 00:22:42 c103008 systemd[1]: slurmd.service: main process exited, code=exited, status=203/EXEC
> > Jan 31 00:22:42 c103008 systemd[1]: Unit slurmd.service entered failed state.
> > Jan 31 00:22:42 c103008 systemd[1]: slurmd.service failed.
> >
> >
> > Best Regards,
> > Nousheen Parvaiz
> >
> >
> > On Mon, Jan 31, 2022 at 10:08 AM Nousheen <nousheenparvaiz at gmail.com> wrote:
> >
> >     Dear Jeffrey,
> >
> >     Thank you for your response. I have followed the steps as instructed.
> >     After copying the files to their respective locations, the "systemctl
> >     status slurmctld.service" command gives me the following error:
> >
> >     (base) [nousheen at exxact system]$ systemctl daemon-reload
> >     (base) [nousheen at exxact system]$ systemctl enable slurmctld.service
> >     (base) [nousheen at exxact system]$ systemctl start slurmctld.service
> >     (base) [nousheen at exxact system]$ systemctl status slurmctld.service
> >     ● slurmctld.service - Slurm controller daemon
> >         Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
> >         Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31 PKT; 3s ago
> >        Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
> >       Main PID: 18114 (code=exited, status=1/FAILURE)
> >
> >     Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
> >     Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
> >     Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered failed state.
> >     Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.
> >
> >     Kindly guide me. Thank you so much for your time.
> >
> >     Best Regards,
> >     Nousheen Parvaiz
> >
> >     On Thu, Jan 27, 2022 at 8:25 PM Jeffrey R. Lang <JRLang at uwyo.edu> wrote:
> >
> >         The missing file error has nothing to do with Slurm. The
> >         systemctl command is part of the system's service management.
> >
> >         The error message indicates that you haven't copied the
> >         slurmd.service file on your compute node to /etc/systemd/system
> >         or /usr/lib/systemd/system. /etc/systemd/system is usually used
> >         when a user adds a new service to a machine.
> >
> >         Depending on your version of Linux you may also need to do a
> >         systemctl daemon-reload to activate the slurmd.service within
> >         systemd.
> >
> >         Once slurmd.service is copied over, the systemctl command should
> >         work just fine.
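> >
> >         For example, on the compute node, something like the following
> >         (a sketch assuming a source build where ./configure generated the
> >         unit files under etc/ in the build tree; adjust the path to
> >         wherever your slurmd.service file actually is):
> >
> >             # copy the unit file to where systemd looks for local services
> >             cp etc/slurmd.service /etc/systemd/system/
> >             # make systemd re-read its unit files
> >             systemctl daemon-reload
> >             # then enable and start the daemon
> >             systemctl enable slurmd.service
> >             systemctl start slurmd.service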
> >
> >         Remember:
> >
> >                 slurmd.service    - Only on compute nodes
> >                 slurmctld.service - Only on your cluster management node
> >                 slurmdbd.service  - Only on your cluster management node
> >
> >         *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of *Nousheen
> >         *Sent:* Thursday, January 27, 2022 3:54 AM
> >         *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> >         *Subject:* [slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory
> >
> >         Hello everyone,
> >
> >         I am installing Slurm on CentOS 7 following this tutorial:
> >         https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/
> >
> >         I am at the step where we start Slurm, but it gives me the
> >         following error:
> >
> >         [root at exxact slurm-21.08.5]# systemctl enable slurmd.service
> >         Failed to execute operation: No such file or directory
> >
> >         I have run the command to check if Slurm is configured properly:
> >
> >         [root at exxact slurm-21.08.5]# slurmd -C
> >         NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1
> >         CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
> >         UpTime=19-16:06:00
> >
> >         I am new to this and unable to understand the problem. Kindly
> >         help me resolve this.
> >
> >         My slurm.conf file is as follows:
> >
> >         # slurm.conf file generated by configurator.html.
> >         # Put this file on all nodes of your cluster.
> >         # See the slurm.conf man page for more information.
> >         #
> >         ClusterName=cluster194
> >         SlurmctldHost=192.168.60.194
> >         #SlurmctldHost=
> >         #
> >         #DisableRootJobs=NO
> >         #EnforcePartLimits=NO
> >         #Epilog=
> >         #EpilogSlurmctld=
> >         #FirstJobId=1
> >         #MaxJobId=67043328
> >         #GresTypes=
> >         #GroupUpdateForce=0
> >         #GroupUpdateTime=600
> >         #JobFileAppend=0
> >         #JobRequeue=1
> >         #JobSubmitPlugins=lua
> >         #KillOnBadExit=0
> >         #LaunchType=launch/slurm
> >         #Licenses=foo*4,bar
> >         #MailProg=/bin/mail
> >         #MaxJobCount=10000
> >         #MaxStepCount=40000
> >         #MaxTasksPerNode=512
> >         MpiDefault=none
> >         #MpiParams=ports=#-#
> >         #PluginDir=
> >         #PlugStackConfig=
> >         #PrivateData=jobs
> >         ProctrackType=proctrack/cgroup
> >         #Prolog=
> >         #PrologFlags=
> >         #PrologSlurmctld=
> >         #PropagatePrioProcess=0
> >         #PropagateResourceLimits=
> >         #PropagateResourceLimitsExcept=
> >         #RebootProgram=
> >         ReturnToService=1
> >         SlurmctldPidFile=/var/run/slurmctld.pid
> >         SlurmctldPort=6817
> >         SlurmdPidFile=/var/run/slurmd.pid
> >         SlurmdPort=6818
> >         SlurmdSpoolDir=/var/spool/slurmd
> >         SlurmUser=nousheen
> >         #SlurmdUser=root
> >         #SrunEpilog=
> >         #SrunProlog=
> >
> >         StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
> >         SwitchType=switch/none
> >         #TaskEpilog=
> >         TaskPlugin=task/affinity
> >         #TaskProlog=
> >         #TopologyPlugin=topology/tree
> >         #TmpFS=/tmp
> >         #TrackWCKey=no
> >         #TreeWidth=
> >         #UnkillableStepProgram=
> >         #UsePAM=0
> >         #
> >         #
> >         # TIMERS
> >         #BatchStartTimeout=10
> >         #CompleteWait=0
> >         #EpilogMsgTime=2000
> >         #GetEnvTimeout=2
> >         #HealthCheckInterval=0
> >         #HealthCheckProgram=
> >         InactiveLimit=0
> >         KillWait=30
> >         #MessageTimeout=10
> >         #ResvOverRun=0
> >         MinJobAge=300
> >         #OverTimeLimit=0
> >         SlurmctldTimeout=120
> >         SlurmdTimeout=300
> >         #UnkillableStepTimeout=60
> >         #VSizeFactor=0
> >         Waittime=0
> >         #
> >         #
> >         # SCHEDULING
> >         #DefMemPerCPU=0
> >         #MaxMemPerCPU=0
> >         #SchedulerTimeSlice=30
> >         SchedulerType=sched/backfill
> >         SelectType=select/cons_tres
> >         SelectTypeParameters=CR_Core
> >         #
> >         #
> >         # JOB PRIORITY
> >         #PriorityFlags=
> >         #PriorityType=priority/basic
> >         #PriorityDecayHalfLife=
> >         #PriorityCalcPeriod=
> >         #PriorityFavorSmall=
> >         #PriorityMaxAge=
> >         #PriorityUsageResetPeriod=
> >         #PriorityWeightAge=
> >         #PriorityWeightFairshare=
> >         #PriorityWeightJobSize=
> >         #PriorityWeightPartition=
> >         #PriorityWeightQOS=
> >         #
> >         #
> >         # LOGGING AND ACCOUNTING
> >         #AccountingStorageEnforce=0
> >         #AccountingStorageHost=
> >         #AccountingStoragePass=
> >         #AccountingStoragePort=
> >         AccountingStorageType=accounting_storage/none
> >         #AccountingStorageUser=
> >         #AccountingStoreFlags=
> >         #JobCompHost=
> >         #JobCompLoc=
> >         #JobCompPass=
> >         #JobCompPort=
> >         JobCompType=jobcomp/none
> >         #JobCompUser=
> >         #JobContainerType=job_container/none
> >         JobAcctGatherFrequency=30
> >         JobAcctGatherType=jobacct_gather/none
> >         SlurmctldDebug=info
> >         SlurmctldLogFile=/var/log/slurmctld.log
> >         SlurmdDebug=info
> >         SlurmdLogFile=/var/log/slurmd.log
> >         #SlurmSchedLogFile=
> >         #SlurmSchedLogLevel=
> >         #DebugFlags=
> >         #
> >         #
> >         # POWER SAVE SUPPORT FOR IDLE NODES (optional)
> >         #SuspendProgram=
> >         #ResumeProgram=
> >         #SuspendTimeout=
> >         #ResumeTimeout=
> >         #ResumeRate=
> >         #SuspendExcNodes=
> >         #SuspendExcParts=
> >         #SuspendRate=
> >         #SuspendTime=
> >         #
> >         #
> >         # COMPUTE NODES
> >         NodeName=linux[1-32] CPUs=11 State=UNKNOWN
> >
> >         PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
> >
> >         Best Regards,
> >         Nousheen Parvaiz
>
>