[slurm-users] Problems with cgroupsv2

Alan Orth alan.orth at gmail.com
Mon Sep 5 06:18:48 UTC 2022


For what it's worth, I've rolled back to cgroups v1 on CentOS Stream 8. I
will be watching future SLURM release notes carefully to see if anything
changes here, and following other people's experiences on the list.
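
For anyone who wants to dig further before rolling back, here is a minimal
sketch of one more check. The root cgroup.controllers file quoted further
down only lists what the kernel offers; under cgroups v2 a controller also
has to be enabled in the parent's cgroup.subtree_control before a child
cgroup can use it. Whether that was the cause on these nodes is an
assumption, not something confirmed in this thread. The function name
missing_controllers and the system.slice path (where systemd normally
places slurmd.service) are illustrative.

```shell
#!/bin/sh
# Sketch: report which wanted cgroup v2 controllers are absent from a given
# cgroup.controllers or cgroup.subtree_control file.
# Assumption: slurmd needs the controllers enabled in its own part of the
# hierarchy, not only listed at the root. Names and paths are illustrative.

missing_controllers() {
    file=$1; shift
    # Pad with spaces so "cpu" cannot match inside "cpuset".
    have=" $(cat "$file" 2>/dev/null | tr '\n' ' ') "
    missing=""
    for c in "$@"; do
        case "$have" in
            *" $c "*) ;;
            *) missing="$missing $c" ;;
        esac
    done
    # Empty output means nothing is missing.
    echo "${missing# }"
}

root=/sys/fs/cgroup
for f in "$root/cgroup.controllers" \
         "$root/cgroup.subtree_control" \
         "$root/system.slice/cgroup.controllers"; do
    [ -r "$f" ] || continue
    echo "$f: missing [$(missing_controllers "$f" cpuset cpu memory)]"
done
```

Empty brackets mean all three controllers are present in that file. A
non-empty list for cgroup.subtree_control or the system.slice file would
point at controller delegation rather than at SLURM itself.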

Regards,



On Wed, Aug 17, 2022 at 12:36 AM Alan Orth <alan.orth at gmail.com> wrote:

> Thanks for the advice. I checked munge's log on the system that was most
> recently affected and found a few hundred of these:
>
> 2022-08-16 23:30:56 +0300 Info:      Unauthorized credential for client UID=0 GID=0
>
> Not sure if relevant. NTP on the system is synced. I'll keep an eye on
> munge in the future...
>
> Thanks again,
>
> On Tue, Aug 16, 2022 at 1:45 PM Timony, Mick <
> Michael_Timony at hms.harvard.edu> wrote:
>
>> When I see odd behaviour I've found it sometimes related to either NTP
>> issues (the time is off) or munge errors:
>>
>>    - Is NTP running and is the time accurate?
>>    - Look for munge errors:
>>       - /var/log/munge/munged.log
>>       - sudo systemctl status munge
>>
>> If it's a munge error, usually restarting munge does the trick:
>>
>> sudo systemctl restart munge
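
The checks suggested above could be collected into one small script. This is
only a sketch: the function name count_unauthorized is made up for
illustration, the log path is the one given above, and it assumes
timedatectl (systemd) and the munge client tools are installed on the node.

```shell
#!/bin/sh
# Sketch of the NTP and munge checks. Names are illustrative.

# Count "Unauthorized credential" entries in a munged log file.
count_unauthorized() {
    grep -c 'Unauthorized credential' "$1" 2>/dev/null || true
}

log=/var/log/munge/munged.log
if [ -r "$log" ]; then
    echo "unauthorized credentials logged: $(count_unauthorized "$log")"
fi

# Clock sync state as reported by systemd (assumes timedatectl is present).
if command -v timedatectl >/dev/null 2>&1; then
    echo "NTP synchronized: $(timedatectl show -p NTPSynchronized --value 2>/dev/null)"
fi

# Local encode/decode round trip (assumes munge and unmunge are installed).
if command -v munge >/dev/null 2>&1; then
    munge -n | unmunge >/dev/null 2>&1 && echo "munge: ok" || echo "munge: FAILED"
fi
```

Reading the log usually needs root. A failed round trip or a growing count
of unauthorized credentials would be a reason to restart munge as described
above.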
>>
>> Regards
>> --Mick
>> ------------------------------
>> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
>> Alan Orth <alan.orth at gmail.com>
>> *Sent:* Tuesday, August 16, 2022 4:36 PM
>> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>> *Subject:* Re: [slurm-users] Problems with cgroupsv2
>>
>> I re-installed SLURM 22.05.3 and then restarted slurmd and now it's
>> working:
>>
>> # dnf reinstall slurm slurm-slurmd slurm-devel slurm-pam_slurm
>> # systemctl restart slurmd
>>
>> The dnf.log shows that the versions were the same, so there was no
>> mismatch or anything:
>>
>> 2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-22.05.3-1.el8.x86_64
>> 2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-devel-22.05.3-1.el8.x86_64
>> 2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-pam_slurm-22.05.3-1.el8.x86_64
>> 2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-slurmd-22.05.3-1.el8.x86_64
>>
>> So I'm not sure what's going on... anyways, at least it's working now!
>>
>> Regards,
>>
>> On Tue, Aug 16, 2022 at 12:53 PM Alan Orth <alan.orth at gmail.com> wrote:
>>
>> Dear list,
>>
>> I've been using cgroupsv2 with SLURM 22.05 on CentOS Stream 8
>> successfully for a few months now. Recently a few of my nodes have started
>> having problems starting slurmd. The log shows:
>>
>> [2022-08-16T20:52:58.439] slurmd version 22.05.3 started
>> [2022-08-16T20:52:58.439] error: Controller cpuset is not enabled!
>> [2022-08-16T20:52:58.439] error: Controller cpu is not enabled!
>> [2022-08-16T20:52:58.439] error: cpu cgroup controller is not available.
>> [2022-08-16T20:52:58.439] error: There's an issue initializing memory or cpu controller
>> [2022-08-16T20:52:58.439] error: Couldn't load specified plugin name for jobacct_gather/cgroup: Plugin init() callback failed
>> [2022-08-16T20:52:58.439] error: cannot create jobacct_gather context for jobacct_gather/cgroup
>> [2022-08-16T20:52:58.439] fatal: Unable to initialize jobacct_gather
>>
>> The system has cgroupsv2 enabled as far as I can tell:
>>
>> # cat /sys/fs/cgroup/cgroup.controllers
>> cpuset cpu io memory hugetlb pids rdma
>> # [ $(stat -fc %T /sys/fs/cgroup/) = "cgroup2fs" ] && echo "unified" || ( [ -e /sys/fs/cgroup/unified/ ] && echo "hybrid" || echo "legacy")
>> unified
>>
>> And my slurm.conf has:
>>
>> ProctrackType=proctrack/cgroup
>> TaskPlugin=task/affinity,task/cgroup
>>
>> And cgroup.conf:
>>
>> CgroupAutomount=yes
>> CgroupPlugin=autodetect
>>
>> What else should I look for before giving up and reverting to cgroupsv1?
>> My current version is 22.05.3, but it was happening in 22.05.2 as well.
>>
>> Thank you for any advice.
>> --
>> Alan Orth
>> alan.orth at gmail.com
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
>>
>


-- 
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch