[slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory

Hermann Schwärzler hermann.schwaerzler at uibk.ac.at
Mon Jan 31 08:46:25 UTC 2022


Dear Nousheen,

I guess there is something missing in your installation - probably your 
slurm.conf?

Do you have logging enabled for slurmctld? If so, what do you see in 
that log?
Or what do you get if you run slurmctld manually in the foreground like this:

/usr/local/sbin/slurmctld -D
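
To find out where that log should be, a minimal sketch like the following may help. It assumes your slurm.conf is at /usr/local/etc/slurm.conf (a guess based on the /usr/local install prefix; set SLURM_CONF if it lives elsewhere). The helper name conf_value is only for illustration and matches keys with exact case.

```shell
#!/bin/sh
# Sketch: look up settings in slurm.conf and show the end of the controller log.
# ASSUMPTION: the default path below is a guess; override it via SLURM_CONF.
SLURM_CONF="${SLURM_CONF:-/usr/local/etc/slurm.conf}"

# conf_value KEY FILE (hypothetical helper): print the value of the last
# uncommented "KEY=value" line. Keys are matched with exact case and no
# leading whitespace, which is a simplification.
conf_value() {
    awk -F= -v key="$1" '
        $1 == key { sub(/^[^=]*=/, ""); val = $0 }
        END { if (val != "") print val }
    ' "$2"
}

if [ -r "$SLURM_CONF" ]; then
    log_file=$(conf_value SlurmctldLogFile "$SLURM_CONF")
    state_dir=$(conf_value StateSaveLocation "$SLURM_CONF")
    echo "SlurmctldLogFile=$log_file"
    echo "StateSaveLocation=$state_dir"
    # Show the most recent controller log lines, if the log is readable.
    if [ -n "$log_file" ] && [ -r "$log_file" ]; then
        tail -n 50 "$log_file"
    fi
else
    echo "cannot read $SLURM_CONF" >&2
fi
```

If the log file does not exist or is empty, "journalctl -u slurmctld" may still show what systemd captured when the service failed.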

Regards,
Hermann

On 1/31/22 6:08 AM, Nousheen wrote:
> Dear Jeffrey,
> 
> Thank you for your response. I have followed the steps as instructed. 
> After copying the files to their respective locations, the "systemctl 
> status slurmctld.service" command gives me the following error:
> 
> (base) [nousheen at exxact system]$ systemctl daemon-reload
> (base) [nousheen at exxact system]$ systemctl enable slurmctld.service
> (base) [nousheen at exxact system]$ systemctl start slurmctld.service
> (base) [nousheen at exxact system]$ systemctl status slurmctld.service
> ● slurmctld.service - Slurm controller daemon
>     Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
>     Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31 PKT; 3s ago
>    Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
>   Main PID: 18114 (code=exited, status=1/FAILURE)
> 
> Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
> Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
> Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered failed state.
> Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.
> 
> Kindly guide me. Thank you so much for your time.
> 
> Best Regards,
> Nousheen Parvaiz
> 
> On Thu, Jan 27, 2022 at 8:25 PM Jeffrey R. Lang <JRLang at uwyo.edu 
> <mailto:JRLang at uwyo.edu>> wrote:
> 
>     The missing file error has nothing to do with Slurm.  The systemctl
>     command is part of the system's service management (systemd).
> 
>     The error message indicates that you haven’t copied the
>     slurmd.service file on your compute node to /etc/systemd/system or
>     /usr/lib/systemd/system.  /etc/systemd/system is usually used when a
>     user adds a new service to a machine.
> 
>     Depending on your version of Linux you may also need to do a
>     systemctl daemon-reload to activate the slurmd.service within
>     systemd.
> 
>     Once slurmd.service is copied over, the systemctl command should
>     work just fine.
> 
>     Remember:
> 
>         slurmd.service    - Only on compute nodes
>         slurmctld.service - Only on your cluster management node
>         slurmdbd.service  - Only on your cluster management node
> 
>     *From:* slurm-users <slurm-users-bounces at lists.schedmd.com
>     <mailto:slurm-users-bounces at lists.schedmd.com>> *On Behalf Of *Nousheen
>     *Sent:* Thursday, January 27, 2022 3:54 AM
>     *To:* Slurm User Community List <slurm-users at lists.schedmd.com
>     <mailto:slurm-users at lists.schedmd.com>>
>     *Subject:* [slurm-users] systemctl enable slurmd.service Failed to
>     execute operation: No such file or directory
> 
>     Hello everyone,
> 
>     I am installing slurm on Centos 7 following tutorial:
>     https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/
> 
>     I am at the step where we start slurm but it gives me the following
>     error:
> 
>     [root at exxact slurm-21.08.5]# systemctl enable slurmd.service
>     Failed to execute operation: No such file or directory
> 
>     I have run the command to check if slurm is configured properly
> 
>     [root at exxact slurm-21.08.5]# slurmd -C
>     NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6
>     ThreadsPerCore=2 RealMemory=31889
>     UpTime=19-16:06:00
> 
>     I am new to this and unable to understand the problem. Kindly help
>     me resolve this.
> 
>     My slurm.conf file is as follows:
> 
>     # slurm.conf file generated by configurator.html.
>     # Put this file on all nodes of your cluster.
>     # See the slurm.conf man page for more information.
>     #
>     ClusterName=cluster194
>     SlurmctldHost=192.168.60.194
>     #SlurmctldHost=
>     #
>     #DisableRootJobs=NO
>     #EnforcePartLimits=NO
>     #Epilog=
>     #EpilogSlurmctld=
>     #FirstJobId=1
>     #MaxJobId=67043328
>     #GresTypes=
>     #GroupUpdateForce=0
>     #GroupUpdateTime=600
>     #JobFileAppend=0
>     #JobRequeue=1
>     #JobSubmitPlugins=lua
>     #KillOnBadExit=0
>     #LaunchType=launch/slurm
>     #Licenses=foo*4,bar
>     #MailProg=/bin/mail
>     #MaxJobCount=10000
>     #MaxStepCount=40000
>     #MaxTasksPerNode=512
>     MpiDefault=none
>     #MpiParams=ports=#-#
>     #PluginDir=
>     #PlugStackConfig=
>     #PrivateData=jobs
>     ProctrackType=proctrack/cgroup
>     #Prolog=
>     #PrologFlags=
>     #PrologSlurmctld=
>     #PropagatePrioProcess=0
>     #PropagateResourceLimits=
>     #PropagateResourceLimitsExcept=
>     #RebootProgram=
>     ReturnToService=1
>     SlurmctldPidFile=/var/run/slurmctld.pid
>     SlurmctldPort=6817
>     SlurmdPidFile=/var/run/slurmd.pid
>     SlurmdPort=6818
>     SlurmdSpoolDir=/var/spool/slurmd
>     SlurmUser=nousheen
>     #SlurmdUser=root
>     #SrunEpilog=
>     #SrunProlog=
>     StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
>     SwitchType=switch/none
>     #TaskEpilog=
>     TaskPlugin=task/affinity
>     #TaskProlog=
>     #TopologyPlugin=topology/tree
>     #TmpFS=/tmp
>     #TrackWCKey=no
>     #TreeWidth=
>     #UnkillableStepProgram=
>     #UsePAM=0
>     #
>     #
>     # TIMERS
>     #BatchStartTimeout=10
>     #CompleteWait=0
>     #EpilogMsgTime=2000
>     #GetEnvTimeout=2
>     #HealthCheckInterval=0
>     #HealthCheckProgram=
>     InactiveLimit=0
>     KillWait=30
>     #MessageTimeout=10
>     #ResvOverRun=0
>     MinJobAge=300
>     #OverTimeLimit=0
>     SlurmctldTimeout=120
>     SlurmdTimeout=300
>     #UnkillableStepTimeout=60
>     #VSizeFactor=0
>     Waittime=0
>     #
>     #
>     # SCHEDULING
>     #DefMemPerCPU=0
>     #MaxMemPerCPU=0
>     #SchedulerTimeSlice=30
>     SchedulerType=sched/backfill
>     SelectType=select/cons_tres
>     SelectTypeParameters=CR_Core
>     #
>     #
>     # JOB PRIORITY
>     #PriorityFlags=
>     #PriorityType=priority/basic
>     #PriorityDecayHalfLife=
>     #PriorityCalcPeriod=
>     #PriorityFavorSmall=
>     #PriorityMaxAge=
>     #PriorityUsageResetPeriod=
>     #PriorityWeightAge=
>     #PriorityWeightFairshare=
>     #PriorityWeightJobSize=
>     #PriorityWeightPartition=
>     #PriorityWeightQOS=
>     #
>     #
>     # LOGGING AND ACCOUNTING
>     #AccountingStorageEnforce=0
>     #AccountingStorageHost=
>     #AccountingStoragePass=
>     #AccountingStoragePort=
>     AccountingStorageType=accounting_storage/none
>     #AccountingStorageUser=
>     #AccountingStoreFlags=
>     #JobCompHost=
>     #JobCompLoc=
>     #JobCompPass=
>     #JobCompPort=
>     JobCompType=jobcomp/none
>     #JobCompUser=
>     #JobContainerType=job_container/none
>     JobAcctGatherFrequency=30
>     JobAcctGatherType=jobacct_gather/none
>     SlurmctldDebug=info
>     SlurmctldLogFile=/var/log/slurmctld.log
>     SlurmdDebug=info
>     SlurmdLogFile=/var/log/slurmd.log
>     #SlurmSchedLogFile=
>     #SlurmSchedLogLevel=
>     #DebugFlags=
>     #
>     #
>     # POWER SAVE SUPPORT FOR IDLE NODES (optional)
>     #SuspendProgram=
>     #ResumeProgram=
>     #SuspendTimeout=
>     #ResumeTimeout=
>     #ResumeRate=
>     #SuspendExcNodes=
>     #SuspendExcParts=
>     #SuspendRate=
>     #SuspendTime=
>     #
>     #
>     # COMPUTE NODES
>     NodeName=linux[1-32] CPUs=11 State=UNKNOWN
>     PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
> 
>     Best Regards,
>     Nousheen Parvaiz


