[slurm-users] Problem with cgroup plugin in Ubuntu22.04 and slurm 21.08.5

Angel de Vicente angel.de.vicente at iac.es
Thu Sep 7 18:09:23 UTC 2023


Hello Cristobal,

Cristóbal Navarro <cristobal.navarro.g at gmail.com> writes:

> Hello Angel and Community,

> I am facing a similar problem with a DGX A100 with DGX OS 6 (Based on
> Ubuntu 22.04 LTS) and Slurm 23.02.
> When I start the `slurmd` service, its status shows failed with the
> information below.
> As of today, what is the best solution to this problem? I am really
> not sure if the DGX A100 could fail or not by disabling cgroups v1.
> Any suggestions are welcome.

Did you manage to find a solution to this without disabling cgroups v1?

In our case:

,----
| slurm 23.02.3
| Ubuntu 22.04.3 LTS
| 
| # cat /proc/cmdline 
| BOOT_IMAGE=/boot/vmlinuz-5.15.0-83-generic root=UUID=... ro quiet splash cgroup_no_v1=all vt.handoff=7
`----

disabling cgroups v1 has been working reliably, but it would be nice to
find a solution that doesn't require modifying the kernel parameters.
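
For anyone comparing setups, here is a minimal sketch of how one might
check whether a node was booted with cgroups v1 fully disabled. The
function names are hypothetical helpers, not part of Slurm, and the
first one only recognises the exact `cgroup_no_v1=all` form shown
above (not a comma-separated list of controllers). The second one
assumes GNU coreutils `stat`.

```shell
#!/bin/sh
# Hypothetical helper: print "yes" and return 0 if the given kernel
# command line contains the exact parameter cgroup_no_v1=all,
# otherwise print "no" and return 1.
cgroup_v1_disabled() {
    # Pad with spaces so the parameter only matches as a whole word.
    case " $1 " in
        *" cgroup_no_v1=all "*)
            echo yes
            return 0
            ;;
    esac
    echo no
    return 1
}

# Hypothetical helper: print the filesystem type mounted at the cgroup
# root. Assumption: GNU stat; "cgroup2fs" indicates the unified (v2)
# hierarchy, while "tmpfs" indicates a v1 or hybrid layout.
cgroup_fs_type() {
    stat -fc %T "${1:-/sys/fs/cgroup}"
}
```

On a node this could be run as `cgroup_v1_disabled "$(cat /proc/cmdline)"`
followed by `cgroup_fs_type`. With the command line shown above, the
first call should report `yes`.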

Cheers,
-- 
Ángel de Vicente
 Research Software Engineer (Supercomputing and BigData)
 Tel.: +34 922-605-747
 Web.: http://research.iac.es/proyecto/polmag/

 GPG: 0x8BDC390B69033F52