There needs to be a "slurmstepd infinity" process running before slurmd starts. The cgroup v2 documentation goes into it: https://slurm.schedmd.com/cgroup_v2.html
There is probably a better way to do this, but this is what we do to deal with it:
::::::::::::::
files/slurm-cgrepair.service
::::::::::::::
[Unit]
Before=slurmd.service slurmctld.service
After=nas-longleaf.mount remote-fs.target system.slice

[Service]
Type=oneshot
ExecStart=/callback/slurm-cgrepair.sh

[Install]
WantedBy=default.target
::::::::::::::
files/slurm-cgrepair.sh
::::::::::::::
#!/bin/bash
/usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/cgroup.subtree_control && \
/usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/system.slice/cgroup.subtree_control

/usr/sbin/slurmstepd infinity &
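
To confirm that the controllers the script above tries to enable are actually present before slurmd starts, a check along these lines may help. This is only a sketch: the function name missing_controllers is hypothetical and not part of Slurm or systemd, and it assumes the subtree_control file holds a single space-separated line of controller names.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): print each required controller
# that is NOT listed in the given cgroup.subtree_control file, one per line.
missing_controllers() {
    local file=$1; shift
    local enabled ctrl
    # Pad with spaces so that "cpu" does not match inside "cpuset".
    enabled=" $(tr '\n' ' ' < "$file") "
    for ctrl in "$@"; do
        case "$enabled" in
            *" $ctrl "*) ;;                # controller already enabled
            *) printf '%s\n' "$ctrl" ;;    # controller missing
        esac
    done
}

# Possible use on a compute node (path taken from the script above):
# missing_controllers /sys/fs/cgroup/system.slice/cgroup.subtree_control cpu cpuset memory
```

An empty result would mean all of the listed controllers are already enabled in that file.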
From: Josef Dvoracek via slurm-users <slurm-users@lists.schedmd.com>
Sent: Thursday, April 11, 2024 11:14 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: Slurmd enabled crash with CgroupV2
I observe the same behavior on Slurm 23.11.5, Rocky Linux 8.9.
[root@compute ~]# cat /sys/fs/cgroup/cgroup.subtree_control
memory pids
[root@compute ~]# systemctl disable slurmd
Removed /etc/systemd/system/multi-user.target.wants/slurmd.service.
[root@compute ~]# cat /sys/fs/cgroup/cgroup.subtree_control
cpuset cpu io memory pids
[root@compute ~]# systemctl enable slurmd
Created symlink /etc/systemd/system/multi-user.target.wants/slurmd.service → /usr/lib/systemd/system/slurmd.service.
[root@compute ~]# cat /sys/fs/cgroup/cgroup.subtree_control
cpuset cpu io memory pids
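
The change visible in the session above can be made explicit by comparing the controller list before and after the systemctl call. The following is a minimal sketch; added_controllers is a hypothetical name, and the two lists are copied from the cat output above.

```shell
#!/bin/bash
# Hypothetical helper: print the controllers that appear in the "after" list
# but not in the "before" list, space-separated, in "after" order.
added_controllers() {
    local before=" $1 " ctrl out=""
    for ctrl in $2; do
        case "$before" in
            *" $ctrl "*) ;;                     # was already enabled
            *) out="${out:+$out }$ctrl" ;;      # newly enabled
        esac
    done
    printf '%s\n' "$out"
}

# Values copied from the session above (before and after "systemctl disable slurmd").
added_controllers "memory pids" "cpuset cpu io memory pids"
# prints: cpuset cpu io
```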
I see this thread is ~1 year old. Is there a better / new understanding of this by now?
cheers
josef
On 23. 05. 23 12:46, Alan Orth wrote:
I notice the exact same behavior as Tristan. My CentOS Stream 8 system is in full unified cgroupv2 mode, the slurmd.service has a "Delegate=Yes" override added to it, and all cgroup stuff is added to slurm.conf and cgroup.conf, yet slurmd does not start after reboot. I don't understand what is happening, but I see the exact same behavior regarding the cgroup subtree_control with disabling / re-enabling slurmd.