[slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory
Nousheen
nousheenparvaiz at gmail.com
Tue Feb 1 04:56:55 UTC 2022
Dear Ole and Hermann,
I have now reinstalled Slurm from scratch following the link you suggested.
The error remains the same. Kindly guide me on where I can find this
cred/munge plugin. Please help me resolve this issue.
[root at exxact slurm]# slurmd -C
NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
UpTime=0-22:06:45
[root at exxact slurm]# systemctl enable slurmctld.service
[root at exxact slurm]# systemctl start slurmctld.service
[root at exxact slurm]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2022-02-01 09:46:20 PKT; 8s ago
  Process: 27530 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 27530 (code=exited, status=1/FAILURE)

Feb 01 09:46:20 exxact systemd[1]: Started Slurm controller daemon.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
Feb 01 09:46:20 exxact systemd[1]: Unit slurmctld.service entered failed state.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service failed.
[root at exxact slurm]# /usr/local/sbin/slurmctld -D
slurmctld: slurmctld version 21.08.5 started on cluster cluster194
slurmctld: error: Couldn't find the specified plugin name for cred/munge looking at all files
slurmctld: error: cannot find cred plugin for cred/munge
slurmctld: error: cannot create cred context for cred/munge
slurmctld: fatal: slurm_cred_creator_ctx_create((null)): Operation not permitted
Best Regards,
Nousheen Parvaiz
On Tue, Feb 1, 2022 at 9:06 AM Nousheen <nousheenparvaiz at gmail.com> wrote:
> Dear Ole,
>
> Thank you for your response.
> I am doing it again using your suggested link.
>
> Best Regards,
> Nousheen Parvaiz
>
>
>
> On Mon, Jan 31, 2022 at 2:07 PM Ole Holm Nielsen <
> Ole.H.Nielsen at fysik.dtu.dk> wrote:
>
>> Hi Nousheen,
>>
>> I again recommend that you follow the steps for installing Slurm on a
>> CentOS 7 cluster:
>> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
>>
>> You may need to start the installation from scratch, but the steps are
>> guaranteed to work if followed correctly.
>>
>> IHTH,
>> Ole
>>
>> On 1/31/22 06:23, Nousheen wrote:
>> > The same error shows up on the compute node, as follows:
>> >
>> > [root at c103008 ~]# systemctl enable slurmd.service
>> > [root at c103008 ~]# systemctl start slurmd.service
>> > [root at c103008 ~]# systemctl status slurmd.service
>> > ● slurmd.service - Slurm node daemon
>> >    Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor preset: disabled)
>> >    Active: failed (Result: exit-code) since Mon 2022-01-31 00:22:42 EST; 2s ago
>> >   Process: 11505 ExecStart=/usr/local/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=203/EXEC)
>> >  Main PID: 11505 (code=exited, status=203/EXEC)
>> >
>> > Jan 31 00:22:42 c103008 systemd[1]: Started Slurm node daemon.
>> > Jan 31 00:22:42 c103008 systemd[1]: slurmd.service: main process exited, code=exited, status=203/EXEC
>> > Jan 31 00:22:42 c103008 systemd[1]: Unit slurmd.service entered failed state.
>> > Jan 31 00:22:42 c103008 systemd[1]: slurmd.service failed.
>> >
>> >
>> > Best Regards,
>> > Nousheen Parvaiz
>> >
>> >
>> >
>> > On Mon, Jan 31, 2022 at 10:08 AM Nousheen <nousheenparvaiz at gmail.com> wrote:
>> >
>> > Dear Jeffrey,
>> >
>> > Thank you for your response. I have followed the steps as instructed.
>> > After copying the files to their respective locations, the "systemctl
>> > status slurmctld.service" command gives me an error as follows:
>> >
>> > (base) [nousheen at exxact system]$ systemctl daemon-reload
>> > (base) [nousheen at exxact system]$ systemctl enable slurmctld.service
>> > (base) [nousheen at exxact system]$ systemctl start slurmctld.service
>> > (base) [nousheen at exxact system]$ systemctl status slurmctld.service
>> > ● slurmctld.service - Slurm controller daemon
>> >    Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
>> >    Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31 PKT; 3s ago
>> >   Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
>> >  Main PID: 18114 (code=exited, status=1/FAILURE)
>> >
>> > Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
>> > Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
>> > Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered failed state.
>> > Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.
>> >
>> > Kindly guide me. Thank you so much for your time.
>> >
>> > Best Regards,
>> > Nousheen Parvaiz
>> >
>> >
>> > On Thu, Jan 27, 2022 at 8:25 PM Jeffrey R. Lang <JRLang at uwyo.edu> wrote:
>> >
>> > The missing file error has nothing to do with Slurm. The
>> > systemctl command is part of systemd's service management.
>> >
>> >
>> > The error message indicates that you haven't copied the
>> > slurmd.service file on your compute node to /etc/systemd/system or
>> > /usr/lib/systemd/system. /etc/systemd/system is usually used when
>> > a user adds a new service to a machine.
>> >
>> > Depending on your version of Linux, you may also need to run
>> > systemctl daemon-reload to activate slurmd.service within systemd.
>> >
>> >
>> > Once slurmd.service is copied over, the systemctl command should
>> > work just fine; see the sketch below.
>> >
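>> > A minimal sketch of those steps, assuming the unit files were
>> > generated in the etc/ directory of the Slurm source tree during
>> > the build (adjust the path if yours live elsewhere):
>> >
>> > cp etc/slurmd.service /etc/systemd/system/
>> > systemctl daemon-reload
>> > systemctl enable slurmd.service
>> > systemctl start slurmd.service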
>> >
>> > Remember:
>> >
>> > slurmd.service    - Only on compute nodes
>> > slurmctld.service - Only on your cluster management node
>> > slurmdbd.service  - Only on your cluster management node
>> >
>> >
>> > *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of* Nousheen
>> > *Sent:* Thursday, January 27, 2022 3:54 AM
>> > *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>> > *Subject:* [slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory
>> >
>> >
>> > Hello everyone,
>> >
>> > I am installing Slurm on CentOS 7 following this tutorial:
>> > https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/
>> >
>> > I am at the step where we start Slurm, but it gives me the
>> > following error:
>> >
>> > [root at exxact slurm-21.08.5]# systemctl enable slurmd.service
>> > Failed to execute operation: No such file or directory
>> >
>> > I have run the command to check if Slurm is configured properly:
>> >
>> > [root at exxact slurm-21.08.5]# slurmd -C
>> > NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
>> > UpTime=19-16:06:00
>> >
>> > I am new to this and unable to understand the problem. Kindly help
>> > me resolve this.
>> >
>> > My slurm.conf file is as follows:
>> >
>> >
>> > # slurm.conf file generated by configurator.html.
>> > # Put this file on all nodes of your cluster.
>> > # See the slurm.conf man page for more information.
>> > #
>> > ClusterName=cluster194
>> > SlurmctldHost=192.168.60.194
>> > #SlurmctldHost=
>> > #
>> > #DisableRootJobs=NO
>> > #EnforcePartLimits=NO
>> > #Epilog=
>> > #EpilogSlurmctld=
>> > #FirstJobId=1
>> > #MaxJobId=67043328
>> > #GresTypes=
>> > #GroupUpdateForce=0
>> > #GroupUpdateTime=600
>> > #JobFileAppend=0
>> > #JobRequeue=1
>> > #JobSubmitPlugins=lua
>> > #KillOnBadExit=0
>> > #LaunchType=launch/slurm
>> > #Licenses=foo*4,bar
>> > #MailProg=/bin/mail
>> > #MaxJobCount=10000
>> > #MaxStepCount=40000
>> > #MaxTasksPerNode=512
>> > MpiDefault=none
>> > #MpiParams=ports=#-#
>> > #PluginDir=
>> > #PlugStackConfig=
>> > #PrivateData=jobs
>> > ProctrackType=proctrack/cgroup
>> > #Prolog=
>> > #PrologFlags=
>> > #PrologSlurmctld=
>> > #PropagatePrioProcess=0
>> > #PropagateResourceLimits=
>> > #PropagateResourceLimitsExcept=
>> > #RebootProgram=
>> > ReturnToService=1
>> > SlurmctldPidFile=/var/run/slurmctld.pid
>> > SlurmctldPort=6817
>> > SlurmdPidFile=/var/run/slurmd.pid
>> > SlurmdPort=6818
>> > SlurmdSpoolDir=/var/spool/slurmd
>> > SlurmUser=nousheen
>> > #SlurmdUser=root
>> > #SrunEpilog=
>> > #SrunProlog=
>> >
>> StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
>> > SwitchType=switch/none
>> > #TaskEpilog=
>> > TaskPlugin=task/affinity
>> > #TaskProlog=
>> > #TopologyPlugin=topology/tree
>> > #TmpFS=/tmp
>> > #TrackWCKey=no
>> > #TreeWidth=
>> > #UnkillableStepProgram=
>> > #UsePAM=0
>> > #
>> > #
>> > # TIMERS
>> > #BatchStartTimeout=10
>> > #CompleteWait=0
>> > #EpilogMsgTime=2000
>> > #GetEnvTimeout=2
>> > #HealthCheckInterval=0
>> > #HealthCheckProgram=
>> > InactiveLimit=0
>> > KillWait=30
>> > #MessageTimeout=10
>> > #ResvOverRun=0
>> > MinJobAge=300
>> > #OverTimeLimit=0
>> > SlurmctldTimeout=120
>> > SlurmdTimeout=300
>> > #UnkillableStepTimeout=60
>> > #VSizeFactor=0
>> > Waittime=0
>> > #
>> > #
>> > # SCHEDULING
>> > #DefMemPerCPU=0
>> > #MaxMemPerCPU=0
>> > #SchedulerTimeSlice=30
>> > SchedulerType=sched/backfill
>> > SelectType=select/cons_tres
>> > SelectTypeParameters=CR_Core
>> > #
>> > #
>> > # JOB PRIORITY
>> > #PriorityFlags=
>> > #PriorityType=priority/basic
>> > #PriorityDecayHalfLife=
>> > #PriorityCalcPeriod=
>> > #PriorityFavorSmall=
>> > #PriorityMaxAge=
>> > #PriorityUsageResetPeriod=
>> > #PriorityWeightAge=
>> > #PriorityWeightFairshare=
>> > #PriorityWeightJobSize=
>> > #PriorityWeightPartition=
>> > #PriorityWeightQOS=
>> > #
>> > #
>> > # LOGGING AND ACCOUNTING
>> > #AccountingStorageEnforce=0
>> > #AccountingStorageHost=
>> > #AccountingStoragePass=
>> > #AccountingStoragePort=
>> > AccountingStorageType=accounting_storage/none
>> > #AccountingStorageUser=
>> > #AccountingStoreFlags=
>> > #JobCompHost=
>> > #JobCompLoc=
>> > #JobCompPass=
>> > #JobCompPort=
>> > JobCompType=jobcomp/none
>> > #JobCompUser=
>> > #JobContainerType=job_container/none
>> > JobAcctGatherFrequency=30
>> > JobAcctGatherType=jobacct_gather/none
>> > SlurmctldDebug=info
>> > SlurmctldLogFile=/var/log/slurmctld.log
>> > SlurmdDebug=info
>> > SlurmdLogFile=/var/log/slurmd.log
>> > #SlurmSchedLogFile=
>> > #SlurmSchedLogLevel=
>> > #DebugFlags=
>> > #
>> > #
>> > # POWER SAVE SUPPORT FOR IDLE NODES (optional)
>> > #SuspendProgram=
>> > #ResumeProgram=
>> > #SuspendTimeout=
>> > #ResumeTimeout=
>> > #ResumeRate=
>> > #SuspendExcNodes=
>> > #SuspendExcParts=
>> > #SuspendRate=
>> > #SuspendTime=
>> > #
>> > #
>> > # COMPUTE NODES
>> > NodeName=linux[1-32] CPUs=11 State=UNKNOWN
>> > PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>> >
>> > Best Regards,
>> > Nousheen Parvaiz
>>
>>