Hi all,
slurmd fails to start with a cgroup plugin error. Any suggestions on where to start troubleshooting?
x8000c0s0b0n0:~ # slurmd -V
slurm 24.11.0

x8000c0s0b0n0:~ # slurmd -D -vvv
slurmd: debug: Log file re-opened
slurmd: debug2: hwloc_topology_init
slurmd: debug2: hwloc_topology_load
slurmd: debug2: hwloc_topology_export_xml
slurmd: debug: CPUs:288 Boards:1 Sockets:4 CoresPerSocket:72 ThreadsPerCore:1
slurmd: error: Couldn't find the specified plugin name for cgroup/v2 looking at all files
slurmd: error: cannot find cgroup plugin for cgroup/v2
slurmd: error: cannot create cgroup context for cgroup/v2
slurmd: error: Unable to initialize cgroup plugin
slurmd: error: slurmd initialization failed

x8000c0s0b0n0:~ # mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)

x8000c0s0b0n0:~ # grep cgroup /etc/slurm/slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
Thanks! Jordan
On further inspection I found:
slurmd: debug3: Trying to load plugin /usr/lib64/slurm/cgroup_v2.so
slurmd: debug4: /usr/lib64/slurm/cgroup_v2.so: Does not exist or not a regular file.
That file doesn't exist on this node, so I created a cgroup.conf that selects the v1 plugin instead:
x8000c0s0b0n0:/etc/slurm # cat cgroup.conf
CgroupPlugin=cgroup/v1
ConstrainCores=yes
ConstrainRAMSpace=yes
AllowedRAMSpace=95
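For anyone reading this later: as I understand the cgroup.conf man page, AllowedRAMSpace is a percentage of the job's allocated memory, so 95 caps a job slightly below what it requested. A rough sketch of the arithmetic follows. The 4000 MB request and the ram_limit_mb helper are made up for illustration, and the integer math only approximates whatever Slurm computes internally.

```shell
#!/bin/sh
# Hypothetical job allocation in MB (not from the thread).
allocated_mb=4000
# AllowedRAMSpace value from the cgroup.conf above.
allowed_pct=95

# ram_limit_mb ALLOCATED_MB PERCENT
# Prints the approximate RAM limit in MB using integer arithmetic.
ram_limit_mb() {
    echo $(( $1 * $2 / 100 ))
}

ram_limit_mb "$allocated_mb" "$allowed_pct"
```

With these numbers the job would be limited to roughly 3800 MB.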
Then I created and mounted a v1 freezer hierarchy:
mkdir -p /sys/fs/cgroup/freezer
mount -t cgroup -o freezer cgroup /sys/fs/cgroup/freezer
Now slurmd starts.
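In case it helps someone hitting the same error, here is a small sketch of the two checks that pointed me at the cause: which cgroup plugins the install actually ships, and which cgroup mode the node is running. The function names are my own, and /usr/lib64/slurm is the plugin path from the debug3 line above, so adjust it if your PluginDir differs. The stat call assumes GNU coreutils. As far as I know, cgroup_v2.so is only built when the dbus and bpf development files are present at build time, so a missing file may point to how the package was built rather than to the node's configuration.

```shell
#!/bin/sh
# list_cgroup_plugins [DIR]
# Prints the cgroup plugin files found in DIR (default: path from the thread).
list_cgroup_plugins() {
    dir="${1:-/usr/lib64/slurm}"
    for f in "$dir"/cgroup_*.so; do
        # The glob stays literal when nothing matches, so test each entry.
        [ -f "$f" ] && basename "$f"
    done
    return 0
}

# cgroup_mode FSTYPE
# Maps the filesystem type of /sys/fs/cgroup to a cgroup mode.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2" ;;            # unified hierarchy
        tmpfs)     echo "v1-or-hybrid" ;;  # legacy or hybrid layout
        *)         echo "unknown" ;;
    esac
}

list_cgroup_plugins /usr/lib64/slurm
cgroup_mode "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
```

If the first command prints cgroup_v1.so but not cgroup_v2.so while the second prints v2, you are in the situation described in this thread.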
- Jordan
From: Webb, Jordan via slurm-users slurm-users@lists.schedmd.com
Date: Monday, January 20, 2025 at 3:59 PM
To: slurm-users@schedmd.com
Subject: [EXTERNAL] [slurm-users] Slurmd cannot find cgroup plugin