[slurm-users] [EXT] Re: systemctl enable slurmd.service Failed to execute operation: No such file or directory
Sean Crosby
scrosby at unimelb.edu.au
Tue Feb 1 07:29:15 UTC 2022
Did you build Slurm yourself from source? If so, the node you build on needs to have the munge development package installed (munge-devel on EL systems, libmunge-dev on Debian); without it, the cred/munge plugin is not built.
You then need to set up munge with a shared munge key between the nodes, and have the munge daemon running.
This is all detailed on Ole's wiki which was linked previously - https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
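A rough sketch of those steps on CentOS 7 (the package names, the EPEL repository, and the default /usr/local install prefix are assumptions; adjust to your setup):

# on every node, install MUNGE and its headers *before* (re)building Slurm
yum install epel-release
yum install munge munge-libs munge-devel

# create one key, then copy the same file to every node (owner munge, mode 0400)
/usr/sbin/create-munge-key
scp /etc/munge/munge.key <other-node>:/etc/munge/munge.key

# start the daemon everywhere and check that the nodes accept each other's credentials
systemctl enable munge
systemctl start munge
munge -n | ssh <other-node> unmunge

# then re-run Slurm's configure/make/make install; with the default prefix the
# missing plugin should show up as /usr/local/lib/slurm/cred_munge.so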
Sean
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Nousheen <nousheenparvaiz at gmail.com>
Sent: Tuesday, 1 February 2022 15:56
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: [EXT] Re: [slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory
External email: Please exercise caution
________________________________
Dear Ole and Hermann,
I have reinstalled Slurm from scratch, this time following this link:
The error remains the same. Kindly guide me on where I can find this cred/munge plugin. Please help me resolve this issue.
[root at exxact slurm]# slurmd -C
NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
UpTime=0-22:06:45
[root at exxact slurm]# systemctl enable slurmctld.service
[root at exxact slurm]# systemctl start slurmctld.service
[root at exxact slurm]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2022-02-01 09:46:20 PKT; 8s ago
Process: 27530 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
Main PID: 27530 (code=exited, status=1/FAILURE)
Feb 01 09:46:20 exxact systemd[1]: Started Slurm controller daemon.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service: main process exited, ...RE
Feb 01 09:46:20 exxact systemd[1]: Unit slurmctld.service entered failed state.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service failed.
[root at exxact slurm]# /usr/local/sbin/slurmctld -D
slurmctld: slurmctld version 21.08.5 started on cluster cluster194
slurmctld: error: Couldn't find the specified plugin name for cred/munge looking at all files
slurmctld: error: cannot find cred plugin for cred/munge
slurmctld: error: cannot create cred context for cred/munge
slurmctld: fatal: slurm_cred_creator_ctx_create((null)): Operation not permitted
Best Regards,
Nousheen Parvaiz
On Tue, Feb 1, 2022 at 9:06 AM Nousheen <nousheenparvaiz at gmail.com> wrote:
Dear Ole,
Thank you for your response.
I am doing it again using your suggested link.
Best Regards,
Nousheen Parvaiz
On Mon, Jan 31, 2022 at 2:07 PM Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> wrote:
Hi Nousheen,
I again recommend that you follow the steps for installing Slurm on a CentOS
7 cluster:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
Maybe you will need to start installation from scratch, but the steps are
guaranteed to work if followed correctly.
IHTH,
Ole
On 1/31/22 06:23, Nousheen wrote:
> The same error shows up on the compute node, as follows:
>
> [root at c103008 ~]# systemctl enable slurmd.service
> [root at c103008 ~]# systemctl start slurmd.service
> [root at c103008 ~]# systemctl status slurmd.service
> ● slurmd.service - Slurm node daemon
> Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor
> preset: disabled)
> Active: failed (Result: exit-code) since Mon 2022-01-31 00:22:42 EST;
> 2s ago
> Process: 11505 ExecStart=/usr/local/sbin/slurmd -D -s $SLURMD_OPTIONS
> (code=exited, status=203/EXEC)
> Main PID: 11505 (code=exited, status=203/EXEC)
>
> Jan 31 00:22:42 c103008 systemd[1]: Started Slurm node daemon.
> Jan 31 00:22:42 c103008 systemd[1]: slurmd.service: main process exited,
> code=exited, status=203/EXEC
> Jan 31 00:22:42 c103008 systemd[1]: Unit slurmd.service entered failed state.
> Jan 31 00:22:42 c103008 systemd[1]: slurmd.service failed.
>
>
> Best Regards,
> Nousheen Parvaiz
>
>
>
> On Mon, Jan 31, 2022 at 10:08 AM Nousheen <nousheenparvaiz at gmail.com> wrote:
>
> Dear Jeffrey,
>
> Thank you for your response. I have followed the steps as instructed.
> After copying the files to their respective locations, the "systemctl
> status slurmctld.service" command gives me an error as follows:
>
> (base) [nousheen at exxact system]$ systemctl daemon-reload
> (base) [nousheen at exxact system]$ systemctl enable slurmctld.service
> (base) [nousheen at exxact system]$ systemctl start slurmctld.service
> (base) [nousheen at exxact system]$ systemctl status slurmctld.service
> ● slurmctld.service - Slurm controller daemon
> Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled;
> vendor preset: disabled)
> Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31
> PKT; 3s ago
> Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s
> $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
> Main PID: 18114 (code=exited, status=1/FAILURE)
>
> Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
> Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process
> exited, code=exited, status=1/FAILURE
> Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered
> failed state.
> Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.
>
> Kindly guide me. Thank you so much for your time.
>
> Best Regards,
> Nousheen Parvaiz
>
>
> On Thu, Jan 27, 2022 at 8:25 PM Jeffrey R. Lang <JRLang at uwyo.edu> wrote:
>
> The missing file error has nothing to do with Slurm. The
> systemctl command is part of systemd, the system's service manager.
>
> The error message indicates that you haven’t copied the
> slurmd.service file on your compute node to /etc/systemd/system or
> /usr/lib/systemd/system. /etc/systemd/system is usually used when
> a user adds a new service to a machine.
>
> Depending on your version of Linux, you may also need to do a
> systemctl daemon-reload to activate the slurmd.service within
> systemd.
>
> Once slurmd.service is copied over, the systemctl command should
> work just fine.
>
> Remember:
>
> slurmd.service    - only on compute nodes
> slurmctld.service - only on your cluster management node
> slurmdbd.service  - only on your cluster management node
>
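> In practice, the steps above amount to something like this (a rough
> sketch; it assumes Slurm 21.08.5 was built from source and that the
> service files generated by configure sit under etc/ in the build tree):
>
> # on the compute node
> cp slurm-21.08.5/etc/slurmd.service /etc/systemd/system/
> systemctl daemon-reload
> systemctl enable slurmd
> systemctl start slurmd
>
> # on the management node
> cp slurm-21.08.5/etc/slurmctld.service /etc/systemd/system/
> systemctl daemon-reload
> systemctl enable slurmctld
> systemctl start slurmctld
>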
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Nousheen
> Sent: Thursday, January 27, 2022 3:54 AM
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: [slurm-users] systemctl enable slurmd.service Failed to
> execute operation: No such file or directory
>
> ◆ This message was sent from a non-UWYO address. Please exercise
> caution when clicking links or opening attachments from external
> sources.
>
> Hello everyone,
>
> I am installing Slurm on CentOS 7 following this tutorial:
> https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/
>
> I am at the step where we start Slurm, but it gives me the
> following error:
>
> [root at exxact slurm-21.08.5]# systemctl enable slurmd.service
> Failed to execute operation: No such file or directory
>
> I have run the command to check whether Slurm is configured properly:
>
> [root at exxact slurm-21.08.5]# slurmd -C
> NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1
> CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
> UpTime=19-16:06:00
>
> I am new to this and unable to understand the problem. Kindly help
> me resolve this.
>
> My slurm.conf file is as follows:
>
> # slurm.conf file generated by configurator.html.
> # Put this file on all nodes of your cluster.
> # See the slurm.conf man page for more information.
> #
> ClusterName=cluster194
> SlurmctldHost=192.168.60.194
> #SlurmctldHost=
> #
> #DisableRootJobs=NO
> #EnforcePartLimits=NO
> #Epilog=
> #EpilogSlurmctld=
> #FirstJobId=1
> #MaxJobId=67043328
> #GresTypes=
> #GroupUpdateForce=0
> #GroupUpdateTime=600
> #JobFileAppend=0
> #JobRequeue=1
> #JobSubmitPlugins=lua
> #KillOnBadExit=0
> #LaunchType=launch/slurm
> #Licenses=foo*4,bar
> #MailProg=/bin/mail
> #MaxJobCount=10000
> #MaxStepCount=40000
> #MaxTasksPerNode=512
> MpiDefault=none
> #MpiParams=ports=#-#
> #PluginDir=
> #PlugStackConfig=
> #PrivateData=jobs
> ProctrackType=proctrack/cgroup
> #Prolog=
> #PrologFlags=
> #PrologSlurmctld=
> #PropagatePrioProcess=0
> #PropagateResourceLimits=
> #PropagateResourceLimitsExcept=
> #RebootProgram=
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=nousheen
> #SlurmdUser=root
> #SrunEpilog=
> #SrunProlog=
> StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
> SwitchType=switch/none
> #TaskEpilog=
> TaskPlugin=task/affinity
> #TaskProlog=
> #TopologyPlugin=topology/tree
> #TmpFS=/tmp
> #TrackWCKey=no
> #TreeWidth=
> #UnkillableStepProgram=
> #UsePAM=0
> #
> #
> # TIMERS
> #BatchStartTimeout=10
> #CompleteWait=0
> #EpilogMsgTime=2000
> #GetEnvTimeout=2
> #HealthCheckInterval=0
> #HealthCheckProgram=
> InactiveLimit=0
> KillWait=30
> #MessageTimeout=10
> #ResvOverRun=0
> MinJobAge=300
> #OverTimeLimit=0
> SlurmctldTimeout=120
> SlurmdTimeout=300
> #UnkillableStepTimeout=60
> #VSizeFactor=0
> Waittime=0
> #
> #
> # SCHEDULING
> #DefMemPerCPU=0
> #MaxMemPerCPU=0
> #SchedulerTimeSlice=30
> SchedulerType=sched/backfill
> SelectType=select/cons_tres
> SelectTypeParameters=CR_Core
> #
> #
> # JOB PRIORITY
> #PriorityFlags=
> #PriorityType=priority/basic
> #PriorityDecayHalfLife=
> #PriorityCalcPeriod=
> #PriorityFavorSmall=
> #PriorityMaxAge=
> #PriorityUsageResetPeriod=
> #PriorityWeightAge=
> #PriorityWeightFairshare=
> #PriorityWeightJobSize=
> #PriorityWeightPartition=
> #PriorityWeightQOS=
> #
> #
> # LOGGING AND ACCOUNTING
> #AccountingStorageEnforce=0
> #AccountingStorageHost=
> #AccountingStoragePass=
> #AccountingStoragePort=
> AccountingStorageType=accounting_storage/none
> #AccountingStorageUser=
> #AccountingStoreFlags=
> #JobCompHost=
> #JobCompLoc=
> #JobCompPass=
> #JobCompPort=
> JobCompType=jobcomp/none
> #JobCompUser=
> #JobContainerType=job_container/none
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/none
> SlurmctldDebug=info
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=info
> SlurmdLogFile=/var/log/slurmd.log
> #SlurmSchedLogFile=
> #SlurmSchedLogLevel=
> #DebugFlags=
> #
> #
> # POWER SAVE SUPPORT FOR IDLE NODES (optional)
> #SuspendProgram=
> #ResumeProgram=
> #SuspendTimeout=
> #ResumeTimeout=
> #ResumeRate=
> #SuspendExcNodes=
> #SuspendExcParts=
> #SuspendRate=
> #SuspendTime=
> #
> #
> # COMPUTE NODES
> NodeName=linux[1-32] CPUs=11 State=UNKNOWN
>
> PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>
> Best Regards,
> Nousheen Parvaiz