Hello Slurm Folks,
I have a weird issue: on a single server that acts as both the controller and a compute node, slurmctld can’t find cred_munge.so:
slurmctld: debug3: Trying to load plugin /app/slurm-24.0.8/lib/slurm/cred_munge.so
slurmctld: debug4: /app/slurm-24.0.8/lib/slurm/cred_munge.so: Does not exist or not a regular file.
slurmctld: error: Couldn't find the specified plugin name for cred/munge looking at all files
slurmctld: error: cannot open plugin directory /app/slurm-24.0.8/lib/slurm
slurmctld: error: cannot find cred plugin for cred/munge
slurmctld: error: cannot create cred context for cred/munge
slurmctld: fatal: failed to initialize cred plugin
But slurmd, on the same machine, loads the same file without trouble:
slurmd: debug3: Trying to load plugin /app/slurm-24.0.8/lib/slurm/cred_munge.so
slurmd: debug3: plugin_load_from_file->_verify_syms: found Slurm plugin name:Munge credential signature plugin type:cred/munge version:0x180800
slurmd: cred/munge: init: Munge credential signature plugin loaded
slurmd: debug3: Success.
This is on Ubuntu 20.04, and it happens with both Slurm 20.11.09 and 24.0.8.
Thank you,
Jesse
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=prod-cluster
SlurmctldHost=controller
#
#MailProg=/bin/mail
#MpiDefault=
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurmctld
#SwitchType=
TaskPlugin=task/affinity,task/cgroup
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageType=
#JobAcctGatherFrequency=30
#JobAcctGatherType=
#SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
#SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#
#
# COMPUTE NODES
NodeName=controller CPUs=1 State=UNKNOWN
NodeName=node CPUs=1 State=UNKNOWN
PartitionName=prod-part Nodes=ALL Default=YES MaxTime=INFINITE State=UP