[slurm-users] Slurmd enabled crash with CgroupV2
Brian Andrus
toomuchit at gmail.com
Fri Mar 10 19:08:27 UTC 2023
I'm not sure which specific item to look at, but this seems like a race
condition.
Likely you need to add an override to your slurmd startup
(/etc/systemd/system/slurmd.service.d/override.conf) and put a dependency
there so slurmd won't start until whatever it depends on has come up.
I have mine wait for a few things:
[Unit]
After=autofs.service getty.target sssd.service
That makes it wait for all of those before trying to start.
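
For what it's worth, a minimal drop-in along those lines might look like the
following (a sketch only; the drop-in path follows the usual systemd
convention, and the units listed are just examples, so wait for whatever your
nodes actually need):

# /etc/systemd/system/slurmd.service.d/override.conf
# create it with "systemctl edit slurmd" (or write the file and run
# "systemctl daemon-reload")
[Unit]
After=autofs.service getty.target sssd.service

"systemctl edit slurmd" creates exactly that file and reloads systemd, so at
boot slurmd is ordered after those units instead of racing them.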
Brian Andrus
On 3/10/2023 7:41 AM, Tristan LEFEBVRE wrote:
>
> Hello to all,
>
> I'm trying to install Slurm with cgroup v2 enabled.
>
> But I'm facing an odd thing: when slurmd is enabled, it crashes at the
> next reboot and never starts again unless I disable it.
>
> Here is a full example of the situation:
>
>
> [root@compute ~]# systemctl start slurmd
> [root@compute ~]# systemctl status slurmd
> ● slurmd.service - Slurm node daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmd.service; disabled; vendor preset: disabled)
>    Active: active (running) since Fri 2023-03-10 15:57:00 CET; 967ms ago
>  Main PID: 8053 (slurmd)
>     Tasks: 1
>    Memory: 3.1M
>    CGroup: /system.slice/slurmd.service
>            └─8053 /opt/slurm_bin/sbin/slurmd -D --conf-server XXXXX:6817 -s
> mars 10 15:57:00 compute.cluster.lab systemd[1]: Started Slurm node daemon.
> mars 10 15:57:00 compute.cluster.lab slurmd[8053]: slurmd: slurmd version 23.02.0 started
> mars 10 15:57:00 compute.cluster.lab slurmd[8053]: slurmd: slurmd started on Fri, 10 Mar 2023 15:57:00 +0100
> mars 10 15:57:00 compute.cluster.lab slurmd[8053]: slurmd: CPUs=48 Boards=1 Sockets=2 Cores=24 Threads=1 Memory=385311 TmpDisk=19990 Uptime=12>
> [root@compute ~]# systemctl enable slurmd
> Created symlink /etc/systemd/system/multi-user.target.wants/slurmd.service → /usr/lib/systemd/system/slurmd.service.
> [root@compute ~]# reboot now
>
> [ reboot of the node ]
>
> [adm@compute ~]$ sudo systemctl status slurmd
> ● slurmd.service - Slurm node daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
>    Active: failed (Result: exit-code) since Fri 2023-03-10 16:00:33 CET; 1min 0s ago
>   Process: 2659 ExecStart=/opt/slurm_bin/sbin/slurmd -D --conf-server XXXX:6817 -s $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
>  Main PID: 2659 (code=exited, status=1/FAILURE)
> mars 10 16:00:33 compute.cluster.lab slurmd[2659]: slurmd: slurmd version 23.02.0 started
> mars 10 16:00:33 compute.cluster.lab slurmd[2659]: slurmd: error: Controller cpuset is not enabled!
> mars 10 16:00:33 compute.cluster.lab slurmd[2659]: slurmd: error: Controller cpu is not enabled!
> mars 10 16:00:33 compute.cluster.lab slurmd[2659]: slurmd: error: cpu cgroup controller is not available.
> mars 10 16:00:33 compute.cluster.lab slurmd[2659]: slurmd: error: There's an issue initializing memory or cpu controller
> mars 10 16:00:33 compute.cluster.lab slurmd[2659]: slurmd: error: Couldn't load specified plugin name for jobacct_gather/cgroup: Plugin init()>
> mars 10 16:00:33 compute.cluster.lab slurmd[2659]: slurmd: error: cannot create jobacct_gather context for jobacct_gather/cgroup
> mars 10 16:00:33 compute.cluster.lab slurmd[2659]: slurmd: fatal: Unable to initialize jobacct_gather
> mars 10 16:00:33 compute.cluster.lab systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
> mars 10 16:00:33 compute.cluster.lab systemd[1]: slurmd.service: Failed with result 'exit-code'.
> [adm@compute ~]$ sudo systemctl start slurmd
> [adm@compute ~]$ sudo systemctl status slurmd
> ● slurmd.service - Slurm node daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
>    Active: failed (Result: exit-code) since Fri 2023-03-10 16:01:37 CET; 1s ago
>   Process: 3321 ExecStart=/opt/slurm_bin/sbin/slurmd -D --conf-server XXXX:6817 -s $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
>  Main PID: 3321 (code=exited, status=1/FAILURE)
> mars 10 16:01:37 compute.cluster.lab slurmd[3321]: slurmd: slurmd version 23.02.0 started
> mars 10 16:01:37 compute.cluster.lab slurmd[3321]: slurmd: error: Controller cpuset is not enabled!
> mars 10 16:01:37 compute.cluster.lab slurmd[3321]: slurmd: error: Controller cpu is not enabled!
> mars 10 16:01:37 compute.cluster.lab slurmd[3321]: slurmd: error: cpu cgroup controller is not available.
> mars 10 16:01:37 compute.cluster.lab slurmd[3321]: slurmd: error: There's an issue initializing memory or cpu controller
> mars 10 16:01:37 compute.cluster.lab slurmd[3321]: slurmd: error: Couldn't load specified plugin name for jobacct_gather/cgroup: Plugin init()>
> mars 10 16:01:37 compute.cluster.lab slurmd[3321]: slurmd: error: cannot create jobacct_gather context for jobacct_gather/cgroup
> mars 10 16:01:37 compute.cluster.lab slurmd[3321]: slurmd: fatal: Unable to initialize jobacct_gather
> mars 10 16:01:37 compute.cluster.lab systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
> mars 10 16:01:37 compute.cluster.lab systemd[1]: slurmd.service: Failed with result 'exit-code'.
> [adm@compute ~]$ sudo systemctl disable slurmd
> Removed /etc/systemd/system/multi-user.target.wants/slurmd.service.
> [adm@compute ~]$ sudo systemctl start slurmd
> [adm@compute ~]$ sudo systemctl status slurmd
> ● slurmd.service - Slurm node daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmd.service; disabled; vendor preset: disabled)
>    Active: active (running) since Fri 2023-03-10 16:01:45 CET; 1s ago
>  Main PID: 3358 (slurmd)
>     Tasks: 1
>    Memory: 6.1M
>    CGroup: /system.slice/slurmd.service
>            └─3358 /opt/slurm_bin/sbin/slurmd -D --conf-server XXXX:6817 -s
> mars 10 16:01:45 compute.cluster.lab systemd[1]: Started Slurm node daemon.
> mars 10 16:01:45 compute.cluster.lab slurmd[3358]: slurmd: slurmd version 23.02.0 started
> mars 10 16:01:45 compute.cluster.lab slurmd[3358]: slurmd: slurmd started on Fri, 10 Mar 2023 16:01:45 +0100
> mars 10 16:01:45 compute.cluster.lab slurmd[3358]: slurmd: CPUs=48 Boards=1 Sockets=2 Cores=24 Threads=1 Memory=385311 TmpDisk=19990 Uptime=84>
>
> As you can see, slurmd successfully starts after a reboot only when it is
> not enabled.
>
> - I'm using Rocky Linux 8, and I've configured cgroup v2 with grubby:
>
> grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1 systemd.legacy_systemd_cgroup_controller=0 cgroup_no_v1=all"
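>
> (As a quick sanity check — these are standard cgroup v2 interface files,
> shown purely as an illustration — after the reboot one can confirm that the
> kernel arguments took effect and which controllers are available and
> enabled at the root:)
>
> cat /proc/cmdline
> cat /sys/fs/cgroup/cgroup.controllers
> cat /sys/fs/cgroup/cgroup.subtree_control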
>
> - Slurm 23.02 is built with rpmbuild, and slurmd on the compute node is
> installed with rpm.
>
> - Here is my cgroup.conf:
>
> CgroupPlugin=cgroup/v2
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
> ConstrainDevices=no
>
> And my slurm.conf has:
>
> ProctrackType=proctrack/cgroup
> TaskPlugin=task/cgroup,task/affinity
> JobAcctGatherType=jobacct_gather/cgroup
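>
> (For reference — purely an illustrative check — the values slurmd actually
> picks up can be confirmed with scontrol once the cluster is up:)
>
> scontrol show config | grep -Ei 'proctracktype|taskplugin|jobacctgather'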
>
>
> - If I run "systemctl start slurmd" on a compute node, it succeeds.
>
> - If I run "systemctl enable slurmd" and then "systemctl restart slurmd",
> it is still fine.
>
> - If I enable slurmd and reboot, slurmd reports these errors:
>
> slurmd: error: Controller cpuset is not enabled!
> slurmd: error: Controller cpu is not enabled!
> slurmd: error: cpu cgroup controller is not available.
> slurmd: error: There's an issue initializing memory or cpu controller
>
> - I've done some research and read about cgroup.subtree_control. So if I
> run:
>
> cat /sys/fs/cgroup/cgroup.subtree_control
> memory pids
>
> So I've tried to follow the Red Hat documentation and their example (the
> link to the Red Hat page is here
> <https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/using-cgroups-v2-to-control-distribution-of-cpu-time-for-applications_managing-monitoring-and-updating-the-kernel>):
>
> echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control
> echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control
> cat /sys/fs/cgroup/cgroup.subtree_control
> cpuset cpu memory pids
>
> And indeed, I can then restart slurmd.
>
> But at the next boot it fails again, and
> /sys/fs/cgroup/cgroup.subtree_control is back to "memory pids" only.
>
> And, strangely, I found that if slurmd is enabled and I then disable it,
> the value of /sys/fs/cgroup/cgroup.subtree_control changes:
>
> [root@compute ~]# cat /sys/fs/cgroup/cgroup.subtree_control
> memory pids
> [root@compute ~]# systemctl disable slurmd
> Removed /etc/systemd/system/multi-user.target.wants/slurmd.service.
> [root@compute ~]# cat /sys/fs/cgroup/cgroup.subtree_control
> cpuset cpu io memory pids
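>
> (One way to check whether the slurmd unit asks systemd to delegate cgroup
> controllers — which could explain systemd adding or dropping entries in
> cgroup.subtree_control as the unit is enabled or disabled — is to query the
> unit properties; shown here only as a diagnostic idea:)
>
> systemctl show slurmd -p Delegate -p DelegateControllers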
>
>
> As a dirty fix, I've made a script that runs at launch time via
> ExecStartPre in slurmd.service:
>
> ExecStartPre=/opt/slurm_bin/dirty_fix_slurmd.sh
>
> with dirty_fix_slurmd.sh:
>
> #!/bin/bash
> # Enable the cpu and cpuset controllers on the root cgroup and on system.slice
> echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control
> echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control
> echo "+cpu" >> /sys/fs/cgroup/system.slice/cgroup.subtree_control
> echo "+cpuset" >> /sys/fs/cgroup/system.slice/cgroup.subtree_control
>
> (And I'm not sure whether this is a good thing to do?)
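>
> (As a sketch of one variant only — untested, and assuming the usual
> override path shown below — the same writes could go directly into a
> systemd drop-in, so no separate script is needed; cgroup v2 accepts
> enabling several controllers in a single write:)
>
> # /etc/systemd/system/slurmd.service.d/override.conf (hypothetical)
> [Service]
> ExecStartPre=/bin/sh -c 'echo "+cpu +cpuset" > /sys/fs/cgroup/cgroup.subtree_control'
> ExecStartPre=/bin/sh -c 'echo "+cpu +cpuset" > /sys/fs/cgroup/system.slice/cgroup.subtree_control'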
>
>
> If you have any idea how to correct this situation, please let me know.
>
> Have a nice day
>
> Thank you
>
> Tristan LEFEBVRE
>