[slurm-users] task/cgroup plugin causes "srun: error: task 0 launch failed: Plugin initialization failed" error on Ubuntu 22.04
abel pinto
abelpc_uff at yahoo.com.br
Fri Jun 16 01:28:26 UTC 2023
Indeed, the issue seems to be that Ubuntu 22.04 uses cgroups v2 by default rather than cgroups v1. Does Slurm support cgroups v2? It seems so: https://slurm.schedmd.com/cgroup_v2.html
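
A quick way to confirm which cgroup hierarchy a compute node is actually running is to look at the filesystem type mounted at /sys/fs/cgroup. The sketch below assumes GNU coreutils `stat` and the conventional mount point; the helper name cgroup_version_from_fstype is made up for illustration and is not part of Slurm.

```shell
#!/bin/sh
# Classify the filesystem type reported for the cgroup mount point.
# cgroup2fs -> unified hierarchy (cgroups v2)
# tmpfs     -> legacy or hybrid hierarchy (cgroups v1 controllers mounted below it)
# NOTE: cgroup_version_from_fstype is a hypothetical helper, not a Slurm tool.
cgroup_version_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Assumption: cgroups are mounted at /sys/fs/cgroup and stat is GNU coreutils.
# If the query fails, fall back to "none" so the script still reports something.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "none")
echo "cgroup hierarchy on $(uname -n): $(cgroup_version_from_fstype "$fstype") ($fstype)"
```

Running this on the affected node (cn02 in the report below) should show whether slurmd sees a v1 or a v2 hierarchy.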
/Abel
> On Jun 15, 2023, at 20:20, Reed Dier <reed.dier at focusvq.com> wrote:
>
> I don’t have any direct advice off-hand, but I figure I will try to help steer the conversation in the right direction for figuring it out.
>
> I’m going to assume that, since you mention 21.08.5, you are using the slurm-wlm packages from the Ubuntu repos rather than building Slurm yourself?
>
> And have all the components (slurmctld(s), slurmdbd, slurmd(s)) been upgraded as well?
>
> The only thing that immediately comes to mind is that I remember reading a good bit about Ubuntu 22.04’s use of cgroups v2, which as I understand it are very different from cgroups v1, and plenty of people have had issues with v1/v2 mismatches with slurm and other applications.
>
> https://www.reddit.com/r/SLURM/comments/vjquih/error_cannot_find_cgroup_plugin_for_cgroupv2/
> https://groups.google.com/g/slurm-users/c/0dJhe5r6_2Q?pli=1
> https://discuss.linuxcontainers.org/t/after-updated-to-more-recent-ubuntu-version-with-cgroups-v2-ubuntu-16-04-container-is-not-working-properly/14022
>
> Hope that at least steers the conversation in a good direction.
>
> Reed
>
>> On Jun 15, 2023, at 5:04 PM, Tim Schneider <tim.schneider1 at tu-darmstadt.de> wrote:
>>
>> Hi,
>> I am maintaining the SLURM cluster of my research group. Recently I updated to Ubuntu 22.04 and Slurm 21.08.5 and ever since, I am unable to launch jobs. When launching a job, I receive the following error:
>>
>> $ srun --nodes=1 --ntasks-per-node=1 -c 1 --mem-per-cpu 1G --time=01:00:00 --pty -p amd -w cn02 --pty bash -i
>> srun: error: task 0 launch failed: Plugin initialization failed
>>
>> Strangely, I cannot find any indication of this problem in the logs (find the logs attached). The problem must be related to the task/cgroup plugin, as it does not occur when I disable it.
>>
>> After reading the documentation, I tried adding the kernel parameters cgroup_enable=memory swapaccount=1, but the problem persisted.
>>
>> I would be very grateful for any advice on where to look, since I have no idea how to investigate this issue further.
>>
>> Thanks a lot in advance.
>>
>> Best,
>>
>> Tim
>>
>>
>>
>> <cgroup.conf><slurmd.log><slurmctld.log>
>