error: Couldn't find the specified plugin name for cred/munge looking at all files
Hello Slurm Folks,

I have a weird issue where on the same server, which acts as both a controller and a node, slurmctld can’t find cred_munge.so

slurmctld: debug3: Trying to load plugin /app/slurm-24.0.8/lib/slurm/cred_munge.so
slurmctld: debug4: /app/slurm-24.0.8/lib/slurm/cred_munge.so: Does not exist or not a regular file.
slurmctld: error: Couldn't find the specified plugin name for cred/munge looking at all files
slurmctld: error: cannot open plugin directory /app/slurm-24.0.8/lib/slurm
slurmctld: error: cannot find cred plugin for cred/munge
slurmctld: error: cannot create cred context for cred/munge
slurmctld: fatal: failed to initialize cred plugin

But slurmd can:

slurmd: debug3: Trying to load plugin /app/slurm-24.0.8/lib/slurm/cred_munge.so
slurmd: debug3: plugin_load_from_file->_verify_syms: found Slurm plugin name:Munge credential signature plugin type:cred/munge version:0x180800
slurmd: cred/munge: init: Munge credential signature plugin loaded
slurmd: debug3: Success.

This is on Ubuntu 20.04 and happens both with Slurm 20.11.09 and 24.0.8

Thank you,
Jesse

# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=prod-cluster
SlurmctldHost=controller
#
#MailProg=/bin/mail
#MpiDefault=
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurmctld
#SwitchType=
TaskPlugin=task/affinity,task/cgroup
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageType=
#JobAcctGatherFrequency=30
#JobAcctGatherType=
#SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
#SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#
#
# COMPUTE NODES
NodeName=controller CPUs=1 State=UNKNOWN
NodeName=node CPUs=1 State=UNKNOWN
PartitionName=prod-part Nodes=ALL Default=YES MaxTime=INFINITE State=UP
slurmctld runs as the user slurm, whereas slurmd runs as root.

Make sure the permissions on /app/slurm-24.0.8/lib/slurm allow the user slurm to read the files

e.g. you could do (as root)

sudo -u slurm ls /app/slurm-24.0.8/lib/slurm

and see if the slurm user can read the directory (as well as the libraries within it)

Sean
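The check suggested above can be extended to every directory on the way to the plugin, since a parent directory without search permission produces the same "Does not exist" message. The following is only a minimal sketch under that assumption: the function name check_plugin_path is hypothetical and not part of Slurm, and it tests the access of whichever user runs it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): walk from / down to the given path
# and report the first component the *current* user cannot reach or read.
check_plugin_path() {
    local target="$1" path="" part
    local IFS='/'                     # split the path on slashes
    for part in $target; do
        [ -z "$part" ] && continue    # skip the empty field before the leading /
        path="$path/$part"
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            return 1
        fi
        # A directory needs search (x) permission to be traversed.
        if [ -d "$path" ] && [ ! -x "$path" ]; then
            echo "no search (x) permission: $path"
            return 1
        fi
    done
    if [ ! -r "$target" ]; then
        echo "not readable: $target"
        return 1
    fi
    echo "ok: $target"
}

# When run as a script with an argument, check that path.
if [ "$#" -gt 0 ]; then
    check_plugin_path "$1"
fi
```

Saved as a script (the file name is up to you) and run as the slurm user, for example with sudo -u slurm bash ./check_plugin_path.sh /app/slurm-24.0.8/lib/slurm/cred_munge.so, it should point at the first directory or file the slurm user cannot access. Run as root it will normally report ok, because root bypasses these permission checks.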
Hi Sean,

Thank you! It was a permissions issue and it’s not complaining anymore about cred/munge. I appreciate your help.

Thanks,
Jesse
On Jan 23, 2024, at 3:34 PM, Sean Crosby <scrosby@unimelb.edu.au> wrote:
slurmctld runs as the user slurm, whereas slurmd runs as root.
Make sure the permissions on /app/slurm-24.0.8/lib/slurm allow the user slurm to read the files
e.g. you could do (as root)
sudo -u slurm ls /app/slurm-24.0.8/lib/slurm
and see if the slurm user can read the directory (as well as the libraries within it)
Sean
On Jan 23, 2024, at 18:14, Jesse Aiton <jesse@clarkeconsulting.com> wrote:
This is on Ubuntu 20.04 and happens both with Slurm 20.11.09 and 24.0.8
Thank you,
Jesse

I’m not sure what version you’re actually running, but I don’t believe there is a 24.0.8. The latest version I’m aware of is 23.11.2.

--
#BlackLivesMatter
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novosirj@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
     `'
Yeah, 24.0.8 is the bleeding edge version. I wanted to try the latest in case it was a bug in 20.x.x. I’m happy to go back to any older Slurm version but I don’t think that will matter much if the issue occurs on both Slurm 20 and Slurm 24.

git clone https://github.com/SchedMD/slurm.git

Thanks,
Jesse
On Jan 23, 2024, at 4:07 PM, Ryan Novosielski <novosirj@rutgers.edu> wrote:
I’m not sure what version you’re actually running, but I don’t believe there is a 24.0.8. The latest version I’m aware of is 23.11.2.
Ah, I see. No, it’s 24.08. That’s why I didn’t find any reference to it. Carry on! :-D

On Jan 23, 2024, at 19:13, Jesse Aiton <jesse@clarkeconsulting.com> wrote:
Yeah, 24.0.8 is the bleeding edge version. I wanted to try the latest in case it was a bug in 20.x.x. I’m happy to go back to any older Slurm version but I don’t think that will matter much if the issue occurs on both Slurm 20 and Slurm 24.
git clone https://github.com/SchedMD/slurm.git
Thanks,
Jesse
participants (3)
- Jesse Aiton
- Ryan Novosielski
- Sean Crosby