[slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory
Nousheen
nousheenparvaiz at gmail.com
Tue Feb 1 04:56:55 UTC 2022
Dear Ole and Hermann,
I have now reinstalled Slurm from scratch following the link you suggested.
The error remains the same. Kindly guide me on where I can find this
cred/munge plugin. Please help me resolve this issue.
[root at exxact slurm]# slurmd -C
NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
UpTime=0-22:06:45
[root at exxact slurm]# systemctl enable slurmctld.service
[root at exxact slurm]# systemctl start slurmctld.service
[root at exxact slurm]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2022-02-01 09:46:20 PKT; 8s ago
  Process: 27530 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 27530 (code=exited, status=1/FAILURE)

Feb 01 09:46:20 exxact systemd[1]: Started Slurm controller daemon.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
Feb 01 09:46:20 exxact systemd[1]: Unit slurmctld.service entered failed state.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service failed.
[root at exxact slurm]# /usr/local/sbin/slurmctld -D
slurmctld: slurmctld version 21.08.5 started on cluster cluster194
slurmctld: error: Couldn't find the specified plugin name for cred/munge looking at all files
slurmctld: error: cannot find cred plugin for cred/munge
slurmctld: error: cannot create cred context for cred/munge
slurmctld: fatal: slurm_cred_creator_ctx_create((null)): Operation not permitted
Best Regards,
Nousheen Parvaiz
On Tue, Feb 1, 2022 at 9:06 AM Nousheen <nousheenparvaiz at gmail.com> wrote:
> Dear Ole,
>
> Thank you for your response.
> I am doing it again using your suggested link.
>
> Best Regards,
> Nousheen Parvaiz
>
>
>
> On Mon, Jan 31, 2022 at 2:07 PM Ole Holm Nielsen <
> Ole.H.Nielsen at fysik.dtu.dk> wrote:
>
>> Hi Nousheen,
>>
>> I again recommend that you follow the steps for installing Slurm on a
>> CentOS 7 cluster:
>> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
>>
>> You may need to start the installation from scratch, but the steps are
>> guaranteed to work if followed correctly.
>>
>> IHTH,
>> Ole
>>
>> On 1/31/22 06:23, Nousheen wrote:
>> > The same error shows up on the compute node, as follows:
>> >
>> > [root at c103008 ~]# systemctl enable slurmd.service
>> > [root at c103008 ~]# systemctl start slurmd.service
>> > [root at c103008 ~]# systemctl status slurmd.service
>> > ● slurmd.service - Slurm node daemon
>> >    Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor preset: disabled)
>> >    Active: failed (Result: exit-code) since Mon 2022-01-31 00:22:42 EST; 2s ago
>> >   Process: 11505 ExecStart=/usr/local/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=203/EXEC)
>> >  Main PID: 11505 (code=exited, status=203/EXEC)
>> >
>> > Jan 31 00:22:42 c103008 systemd[1]: Started Slurm node daemon.
>> > Jan 31 00:22:42 c103008 systemd[1]: slurmd.service: main process exited, code=exited, status=203/EXEC
>> > Jan 31 00:22:42 c103008 systemd[1]: Unit slurmd.service entered failed state.
>> > Jan 31 00:22:42 c103008 systemd[1]: slurmd.service failed.
>> >
>> >
>> > Best Regards,
>> > Nousheen Parvaiz
>> >
>> >
>> >
>> > On Mon, Jan 31, 2022 at 10:08 AM Nousheen <nousheenparvaiz at gmail.com> wrote:
>> >
>> > Dear Jeffrey,
>> >
>> > Thank you for your response. I have followed the steps as instructed.
>> > After copying the files to their respective locations, the "systemctl
>> > status slurmctld.service" command gives me an error as follows:
>> >
>> > (base) [nousheen at exxact system]$ systemctl daemon-reload
>> > (base) [nousheen at exxact system]$ systemctl enable slurmctld.service
>> > (base) [nousheen at exxact system]$ systemctl start slurmctld.service
>> > (base) [nousheen at exxact system]$ systemctl status slurmctld.service
>> > ● slurmctld.service - Slurm controller daemon
>> >    Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
>> >    Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31 PKT; 3s ago
>> >   Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
>> >  Main PID: 18114 (code=exited, status=1/FAILURE)
>> >
>> > Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
>> > Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
>> > Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered failed state.
>> > Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.
>> >
>> > Kindly guide me. Thank you so much for your time.
>> >
>> > Best Regards,
>> > Nousheen Parvaiz
>> >
>> >
>> > On Thu, Jan 27, 2022 at 8:25 PM Jeffrey R. Lang <JRLang at uwyo.edu> wrote:
>> >
>> > The missing file error has nothing to do with Slurm. The
>> > systemctl command is part of systemd's service management.
>> >
>> >
>> > The error message indicates that you haven't copied the
>> > slurmd.service file on your compute node to /etc/systemd/system or
>> > /usr/lib/systemd/system. /etc/systemd/system is usually used when
>> > a user adds a new service to a machine.
>> >
>> > Depending on your version of Linux, you may also need to run
>> > systemctl daemon-reload to activate slurmd.service within systemd.
>> >
>> >
>> > Once slurmd.service is copied over, the systemctl command should
>> > work just fine; see the sketch below.
>> >
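>> > A minimal sketch of those steps, assuming the unit files were
>> > generated in the etc/ directory of the Slurm source tree during
>> > the build (adjust the path if yours live elsewhere):
>> >
>> > cp etc/slurmd.service /etc/systemd/system/
>> > systemctl daemon-reload
>> > systemctl enable slurmd.service
>> > systemctl start slurmd.service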
>> >
>> > Remember:
>> >
>> > slurmd.service    - Only on compute nodes
>> > slurmctld.service - Only on your cluster management node
>> > slurmdbd.service  - Only on your cluster management node
>> >
>> >
>> > *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of* Nousheen
>> > *Sent:* Thursday, January 27, 2022 3:54 AM
>> > *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>> > *Subject:* [slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory
>> >
>> >
>> > Hello everyone,
>> >
>> > I am installing Slurm on CentOS 7 following this tutorial:
>> > https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/
>> >
>> > I am at the step where we start Slurm, but it gives me the
>> > following error:
>> >
>> > [root at exxact slurm-21.08.5]# systemctl enable slurmd.service
>> > Failed to execute operation: No such file or directory
>> >
>> > I have run the command to check if Slurm is configured properly:
>> >
>> > [root at exxact slurm-21.08.5]# slurmd -C
>> > NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
>> > UpTime=19-16:06:00
>> >
>> > I am new to this and unable to understand the problem. Kindly help
>> > me resolve this.
>> >
>> > My slurm.conf file is as follows:
>> >
>> >
>> > # slurm.conf file generated by configurator.html.
>> > # Put this file on all nodes of your cluster.
>> > # See the slurm.conf man page for more information.
>> > #
>> > ClusterName=cluster194
>> > SlurmctldHost=192.168.60.194
>> > #SlurmctldHost=
>> > #
>> > #DisableRootJobs=NO
>> > #EnforcePartLimits=NO
>> > #Epilog=
>> > #EpilogSlurmctld=
>> > #FirstJobId=1
>> > #MaxJobId=67043328
>> > #GresTypes=
>> > #GroupUpdateForce=0
>> > #GroupUpdateTime=600
>> > #JobFileAppend=0
>> > #JobRequeue=1
>> > #JobSubmitPlugins=lua
>> > #KillOnBadExit=0
>> > #LaunchType=launch/slurm
>> > #Licenses=foo*4,bar
>> > #MailProg=/bin/mail
>> > #MaxJobCount=10000
>> > #MaxStepCount=40000
>> > #MaxTasksPerNode=512
>> > MpiDefault=none
>> > #MpiParams=ports=#-#
>> > #PluginDir=
>> > #PlugStackConfig=
>> > #PrivateData=jobs
>> > ProctrackType=proctrack/cgroup
>> > #Prolog=
>> > #PrologFlags=
>> > #PrologSlurmctld=
>> > #PropagatePrioProcess=0
>> > #PropagateResourceLimits=
>> > #PropagateResourceLimitsExcept=
>> > #RebootProgram=
>> > ReturnToService=1
>> > SlurmctldPidFile=/var/run/slurmctld.pid
>> > SlurmctldPort=6817
>> > SlurmdPidFile=/var/run/slurmd.pid
>> > SlurmdPort=6818
>> > SlurmdSpoolDir=/var/spool/slurmd
>> > SlurmUser=nousheen
>> > #SlurmdUser=root
>> > #SrunEpilog=
>> > #SrunProlog=
>> >
>> StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
>> > SwitchType=switch/none
>> > #TaskEpilog=
>> > TaskPlugin=task/affinity
>> > #TaskProlog=
>> > #TopologyPlugin=topology/tree
>> > #TmpFS=/tmp
>> > #TrackWCKey=no
>> > #TreeWidth=
>> > #UnkillableStepProgram=
>> > #UsePAM=0
>> > #
>> > #
>> > # TIMERS
>> > #BatchStartTimeout=10
>> > #CompleteWait=0
>> > #EpilogMsgTime=2000
>> > #GetEnvTimeout=2
>> > #HealthCheckInterval=0
>> > #HealthCheckProgram=
>> > InactiveLimit=0
>> > KillWait=30
>> > #MessageTimeout=10
>> > #ResvOverRun=0
>> > MinJobAge=300
>> > #OverTimeLimit=0
>> > SlurmctldTimeout=120
>> > SlurmdTimeout=300
>> > #UnkillableStepTimeout=60
>> > #VSizeFactor=0
>> > Waittime=0
>> > #
>> > #
>> > # SCHEDULING
>> > #DefMemPerCPU=0
>> > #MaxMemPerCPU=0
>> > #SchedulerTimeSlice=30
>> > SchedulerType=sched/backfill
>> > SelectType=select/cons_tres
>> > SelectTypeParameters=CR_Core
>> > #
>> > #
>> > # JOB PRIORITY
>> > #PriorityFlags=
>> > #PriorityType=priority/basic
>> > #PriorityDecayHalfLife=
>> > #PriorityCalcPeriod=
>> > #PriorityFavorSmall=
>> > #PriorityMaxAge=
>> > #PriorityUsageResetPeriod=
>> > #PriorityWeightAge=
>> > #PriorityWeightFairshare=
>> > #PriorityWeightJobSize=
>> > #PriorityWeightPartition=
>> > #PriorityWeightQOS=
>> > #
>> > #
>> > # LOGGING AND ACCOUNTING
>> > #AccountingStorageEnforce=0
>> > #AccountingStorageHost=
>> > #AccountingStoragePass=
>> > #AccountingStoragePort=
>> > AccountingStorageType=accounting_storage/none
>> > #AccountingStorageUser=
>> > #AccountingStoreFlags=
>> > #JobCompHost=
>> > #JobCompLoc=
>> > #JobCompPass=
>> > #JobCompPort=
>> > JobCompType=jobcomp/none
>> > #JobCompUser=
>> > #JobContainerType=job_container/none
>> > JobAcctGatherFrequency=30
>> > JobAcctGatherType=jobacct_gather/none
>> > SlurmctldDebug=info
>> > SlurmctldLogFile=/var/log/slurmctld.log
>> > SlurmdDebug=info
>> > SlurmdLogFile=/var/log/slurmd.log
>> > #SlurmSchedLogFile=
>> > #SlurmSchedLogLevel=
>> > #DebugFlags=
>> > #
>> > #
>> > # POWER SAVE SUPPORT FOR IDLE NODES (optional)
>> > #SuspendProgram=
>> > #ResumeProgram=
>> > #SuspendTimeout=
>> > #ResumeTimeout=
>> > #ResumeRate=
>> > #SuspendExcNodes=
>> > #SuspendExcParts=
>> > #SuspendRate=
>> > #SuspendTime=
>> > #
>> > #
>> > # COMPUTE NODES
>> > NodeName=linux[1-32] CPUs=11 State=UNKNOWN
>> > PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>> >
>> > Best Regards,
>> > Nousheen Parvaiz
>>
>>