[slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Mon Jan 31 09:04:10 UTC 2022
Hi Nousheen,
I recommend again that you follow the steps for installing Slurm on a
CentOS 7 cluster:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
You may need to start the installation from scratch, but the steps are
guaranteed to work if followed correctly.
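By the way, status=203/EXEC from systemd means it could not execute the
ExecStart binary at all. A quick sanity check on the compute node
(assuming the unit file really points at /usr/local/sbin/slurmd, as your
status output shows):

  ls -l /usr/local/sbin/slurmd
  /usr/local/sbin/slurmd -V

If the file is missing, slurmd was never installed on that node, or was
installed under a different prefix. For the slurmctld failure
(status=1/FAILURE) the reason should be in the log file named in your
slurm.conf:

  tail /var/log/slurmctld.log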
IHTH,
Ole
On 1/31/22 06:23, Nousheen wrote:
> The same error shows up on the compute node, as follows:
>
> [root at c103008 ~]# systemctl enable slurmd.service
> [root at c103008 ~]# systemctl start slurmd.service
> [root at c103008 ~]# systemctl status slurmd.service
> ● slurmd.service - Slurm node daemon
> Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor
> preset: disabled)
> Active: failed (Result: exit-code) since Mon 2022-01-31 00:22:42 EST;
> 2s ago
> Process: 11505 ExecStart=/usr/local/sbin/slurmd -D -s $SLURMD_OPTIONS
> (code=exited, status=203/EXEC)
> Main PID: 11505 (code=exited, status=203/EXEC)
>
> Jan 31 00:22:42 c103008 systemd[1]: Started Slurm node daemon.
> Jan 31 00:22:42 c103008 systemd[1]: slurmd.service: main process exited,
> code=exited, status=203/EXEC
> Jan 31 00:22:42 c103008 systemd[1]: Unit slurmd.service entered failed state.
> Jan 31 00:22:42 c103008 systemd[1]: slurmd.service failed.
>
>
> Best Regards,
> Nousheen Parvaiz
>
>
>
> On Mon, Jan 31, 2022 at 10:08 AM Nousheen <nousheenparvaiz at gmail.com> wrote:
>
> Dear Jeffrey,
>
> Thank you for your response. I have followed the steps as instructed.
> After copying the files to their respective locations, the "systemctl
> status slurmctld.service" command gives me an error as follows:
>
> (base) [nousheen at exxact system]$ systemctl daemon-reload
> (base) [nousheen at exxact system]$ systemctl enable slurmctld.service
> (base) [nousheen at exxact system]$ systemctl start slurmctld.service
> (base) [nousheen at exxact system]$ systemctl status slurmctld.service
> ● slurmctld.service - Slurm controller daemon
> Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled;
> vendor preset: disabled)
> Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31
> PKT; 3s ago
> Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s
> $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
> Main PID: 18114 (code=exited, status=1/FAILURE)
>
> Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
> Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process
> exited, code=exited, status=1/FAILURE
> Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered
> failed state.
> Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.
>
> Kindly guide me. Thank you so much for your time.
>
> Best Regards,
> Nousheen Parvaiz
>
>
> On Thu, Jan 27, 2022 at 8:25 PM Jeffrey R. Lang <JRLang at uwyo.edu> wrote:
>
> The missing file error has nothing to do with Slurm. The
> systemctl command is part of systemd's service management.
>
> The error message indicates that you haven't copied the
> slurmd.service file on your compute node to /etc/systemd/system or
> /usr/lib/systemd/system. /etc/systemd/system is usually used when
> a user adds a new service to a machine.
>
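> For example, if you built from the source tree shown in your shell
> prompt (slurm-21.08.5), the configure step generates the unit files
> under etc/ in that tree. A rough sketch (adjust the path to wherever
> your source actually lives):
>
>     cp /path/to/slurm-21.08.5/etc/slurmd.service /etc/systemd/system/
>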
> Depending on your version of Linux, you may also need to run
> systemctl daemon-reload to activate the slurmd.service within
> systemd.
>
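> On the compute node, that sequence would look something like:
>
>     systemctl daemon-reload
>     systemctl enable slurmd.service
>     systemctl start slurmd.service
>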
> Once slurmd.service is copied over, the systemctl command should
> work just fine.
>
> Remember:
>
>     slurmd.service    - only on compute nodes
>     slurmctld.service - only on your cluster management node
>     slurmdbd.service  - only on your cluster management node
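>
> Concretely, that split is just the standard systemctl calls run on
> the right machine:
>
>     # on the management node
>     systemctl enable slurmctld.service
>     systemctl start slurmctld.service
>     # on each compute node
>     systemctl enable slurmd.service
>     systemctl start slurmd.service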
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of* Nousheen
> *Sent:* Thursday, January 27, 2022 3:54 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* [slurm-users] systemctl enable slurmd.service Failed to
> execute operation: No such file or directory
>
> Hello everyone,
>
> I am installing Slurm on CentOS 7 following this tutorial:
> https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/
>
> I am at the step where we start Slurm, but it gives me the
> following error:
>
> [root at exxact slurm-21.08.5]# systemctl enable slurmd.service
> Failed to execute operation: No such file or directory
>
> I have run the command to check if Slurm is configured properly:
>
> [root at exxact slurm-21.08.5]# slurmd -C
> NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1
> CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
> UpTime=19-16:06:00
>
> I am new to this and unable to understand the problem. Kindly help
> me resolve this.
>
> My slurm.conf file is as follows:
>
>
> # slurm.conf file generated by configurator.html.
> # Put this file on all nodes of your cluster.
> # See the slurm.conf man page for more information.
> #
> ClusterName=cluster194
> SlurmctldHost=192.168.60.194
> #SlurmctldHost=
> #
> #DisableRootJobs=NO
> #EnforcePartLimits=NO
> #Epilog=
> #EpilogSlurmctld=
> #FirstJobId=1
> #MaxJobId=67043328
> #GresTypes=
> #GroupUpdateForce=0
> #GroupUpdateTime=600
> #JobFileAppend=0
> #JobRequeue=1
> #JobSubmitPlugins=lua
> #KillOnBadExit=0
> #LaunchType=launch/slurm
> #Licenses=foo*4,bar
> #MailProg=/bin/mail
> #MaxJobCount=10000
> #MaxStepCount=40000
> #MaxTasksPerNode=512
> MpiDefault=none
> #MpiParams=ports=#-#
> #PluginDir=
> #PlugStackConfig=
> #PrivateData=jobs
> ProctrackType=proctrack/cgroup
> #Prolog=
> #PrologFlags=
> #PrologSlurmctld=
> #PropagatePrioProcess=0
> #PropagateResourceLimits=
> #PropagateResourceLimitsExcept=
> #RebootProgram=
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=nousheen
> #SlurmdUser=root
> #SrunEpilog=
> #SrunProlog=
> StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
> SwitchType=switch/none
> #TaskEpilog=
> TaskPlugin=task/affinity
> #TaskProlog=
> #TopologyPlugin=topology/tree
> #TmpFS=/tmp
> #TrackWCKey=no
> #TreeWidth=
> #UnkillableStepProgram=
> #UsePAM=0
> #
> #
> # TIMERS
> #BatchStartTimeout=10
> #CompleteWait=0
> #EpilogMsgTime=2000
> #GetEnvTimeout=2
> #HealthCheckInterval=0
> #HealthCheckProgram=
> InactiveLimit=0
> KillWait=30
> #MessageTimeout=10
> #ResvOverRun=0
> MinJobAge=300
> #OverTimeLimit=0
> SlurmctldTimeout=120
> SlurmdTimeout=300
> #UnkillableStepTimeout=60
> #VSizeFactor=0
> Waittime=0
> #
> #
> # SCHEDULING
> #DefMemPerCPU=0
> #MaxMemPerCPU=0
> #SchedulerTimeSlice=30
> SchedulerType=sched/backfill
> SelectType=select/cons_tres
> SelectTypeParameters=CR_Core
> #
> #
> # JOB PRIORITY
> #PriorityFlags=
> #PriorityType=priority/basic
> #PriorityDecayHalfLife=
> #PriorityCalcPeriod=
> #PriorityFavorSmall=
> #PriorityMaxAge=
> #PriorityUsageResetPeriod=
> #PriorityWeightAge=
> #PriorityWeightFairshare=
> #PriorityWeightJobSize=
> #PriorityWeightPartition=
> #PriorityWeightQOS=
> #
> #
> # LOGGING AND ACCOUNTING
> #AccountingStorageEnforce=0
> #AccountingStorageHost=
> #AccountingStoragePass=
> #AccountingStoragePort=
> AccountingStorageType=accounting_storage/none
> #AccountingStorageUser=
> #AccountingStoreFlags=
> #JobCompHost=
> #JobCompLoc=
> #JobCompPass=
> #JobCompPort=
> JobCompType=jobcomp/none
> #JobCompUser=
> #JobContainerType=job_container/none
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/none
> SlurmctldDebug=info
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=info
> SlurmdLogFile=/var/log/slurmd.log
> #SlurmSchedLogFile=
> #SlurmSchedLogLevel=
> #DebugFlags=
> #
> #
> # POWER SAVE SUPPORT FOR IDLE NODES (optional)
> #SuspendProgram=
> #ResumeProgram=
> #SuspendTimeout=
> #ResumeTimeout=
> #ResumeRate=
> #SuspendExcNodes=
> #SuspendExcParts=
> #SuspendRate=
> #SuspendTime=
> #
> #
> # COMPUTE NODES
> NodeName=linux[1-32] CPUs=11 State=UNKNOWN
> PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>
>
> Best Regards,
>
> Nousheen Parvaiz