slurmctld runs as the user slurm, whereas slurmd runs as root.
Make sure the permissions on /app/slurm-24.0.8/lib/slurm allow the user slurm to read the files.
For example, you could run (as root):
sudo -u slurm ls /app/slurm-24.0.8/lib/slurm
and see whether the slurm user can read the directory (as well as the libraries within it).
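
If that listing fails, the cause may lie in a parent directory rather than the plugin directory itself: a user needs search (x) permission on every directory along the path to reach a file. Below is a minimal sketch for spotting that, assuming GNU stat (as shipped on Ubuntu). The function name show_path_perms is made up for illustration and is not part of Slurm.

```shell
# Hypothetical helper (not part of Slurm): print mode, owner:group and name
# for the given path and for every parent directory above it, so that a
# missing r/x bit for the slurm user stands out.
# Assumes GNU coreutils stat (the -c format option).
show_path_perms() {
    path=$1
    # Walk upwards until we reach the filesystem root (or "." for a
    # relative path).
    while [ -n "$path" ] && [ "$path" != "/" ] && [ "$path" != "." ]; do
        stat -c '%A %U:%G %n' "$path"
        path=$(dirname "$path")
    done
}
```

You would call it as show_path_perms /app/slurm-24.0.8/lib/slurm/cred_munge.so and look for any directory where the slurm user (via owner, group, or other bits) lacks x, or lacks r on the plugin directory and the .so file.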
Sean
Hello Slurm Folks,
I have a weird issue where, on the same server, which acts as both a controller and a node, slurmctld can't find cred_munge.so:
slurmctld: debug3: Trying to load plugin /app/slurm-24.0.8/lib/slurm/cred_munge.so
slurmctld: debug4: /app/slurm-24.0.8/lib/slurm/cred_munge.so: Does not exist or not a regular file.
slurmctld: error: Couldn't find the specified plugin name for cred/munge looking at all files
slurmctld: error: cannot open plugin directory /app/slurm-24.0.8/lib/slurm
slurmctld: error: cannot find cred plugin for cred/munge
slurmctld: error: cannot create cred context for cred/munge
slurmctld: fatal: failed to initialize cred plugin
But slurmd can:
slurmd: debug3: Trying to load plugin /app/slurm-24.0.8/lib/slurm/cred_munge.so
slurmd: debug3: plugin_load_from_file->_verify_syms: found Slurm plugin name:Munge credential signature plugin type:cred/munge version:0x180800
slurmd: cred/munge: init: Munge credential signature plugin loaded
slurmd: debug3: Success.
This is on Ubuntu 20.04 and happens with both Slurm 20.11.09 and 24.0.8.
Thank you,
Jesse
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=prod-cluster
SlurmctldHost=controller
#
#MailProg=/bin/mail
#MpiDefault=
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurmctld
#SwitchType=
TaskPlugin=task/affinity,task/cgroup
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageType=
#JobAcctGatherFrequency=30
#JobAcctGatherType=
#SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
#SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#
#
# COMPUTE NODES
NodeName=controller CPUs=1 State=UNKNOWN
NodeName=node CPUs=1 State=UNKNOWN
PartitionName=prod-part Nodes=ALL Default=YES MaxTime=INFINITE State=UP