<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof">
When I see odd behaviour like this, I've found it is sometimes caused by either NTP issues (the clock is off) or munge errors:</div>
<div style="font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof">
<ul>
<li><span>Is NTP running, and is the time accurate?</span></li><li><span>Look for munge errors:</span></li><ul style="list-style-type: circle;">
<li>/var/log/munge/munged.log</li><li>sudo systemctl status munge<br>
</li></ul>
</ul>
<div>If it's a munge error, usually restarting munge does the trick:<br>
<br>
</div>
<div>sudo systemctl restart munge<br>
</div>
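<div>The two checks above can be sketched as a pair of small shell helpers. This is only a rough sketch, assuming a systemd host where timedatectl is available; the function names are made up for illustration and are not part of Slurm or munge:</div>
<pre>
```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Slurm or munge.

# Read timedatectl-style text on stdin and print "yes" or "no" depending on
# whether it reports a synchronized system clock.
# Assumption: the status text contains "System clock synchronized: yes";
# older systemd releases may word this line differently.
clock_synced_from_status() {
    if grep -q 'System clock synchronized: yes'; then
        echo yes
    else
        echo no
    fi
}

# Round-trip an empty credential through munge on the local node.
# Returns 0 when munge can encode it and unmunge can decode it.
munge_roundtrip_ok() {
    munge -n | unmunge > /dev/null
}

# Example use on a node:
#   timedatectl status | clock_synced_from_status
#   munge_roundtrip_ok || sudo systemctl restart munge
```
</pre>
<div>A local round trip only shows that munged is healthy on that one node; a clock or key mismatch between nodes can still show up in munged.log.</div>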
<div><br>
</div>
<div>Regards</div>
<div>--Mick</div>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Alan Orth &lt;alan.orth@gmail.com&gt;<br>
<b>Sent:</b> Tuesday, August 16, 2022 4:36 PM<br>
<b>To:</b> Slurm User Community List &lt;slurm-users@lists.schedmd.com&gt;<br>
<b>Subject:</b> Re: [slurm-users] Problems with cgroupsv2</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>I reinstalled SLURM 22.05.3, restarted slurmd, and now it's working:</div>
<div><br>
</div>
<div># dnf reinstall slurm slurm-slurmd slurm-devel slurm-pam_slurm <br>
</div>
<div># systemctl restart slurmd</div>
<div><br>
</div>
<div>The dnf.log shows that the reinstalled versions were identical, so there was no version mismatch between the packages:<br>
</div>
<div><br>
</div>
<div>2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-22.05.3-1.el8.x86_64<br>
2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-devel-22.05.3-1.el8.x86_64<br>
2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-pam_slurm-22.05.3-1.el8.x86_64<br>
2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-slurmd-22.05.3-1.el8.x86_64</div>
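<div>For anyone who wants to repeat that comparison, here is a minimal sketch that pulls the version-release strings out of the "Reinstalled:" lines. The helper name is made up, the pattern assumes versions shaped like 22.05.3-1, and /var/log/dnf.log is assumed to be the log location:</div>
<pre>
```shell
#!/bin/sh
# Sketch only: reinstalled_versions is a hypothetical helper name.

# Read dnf log lines on stdin and print each distinct version-release
# string (for example 22.05.3-1) found on "Reinstalled:" lines.
# A single line of output means every reinstalled package had the same version.
reinstalled_versions() {
    grep 'Reinstalled:' \
        | grep -o '[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*-[0-9][0-9]*' \
        | sort -u
}

# Example use (log path is an assumption):
#   grep 'slurm' /var/log/dnf.log | reinstalled_versions
```
</pre>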
<div><br>
</div>
<div>So I'm not sure what was going on, but at least it's working now!</div>
<div><br>
</div>
<div>Regards,<br>
</div>
</div>
<br>
<div class="x_gmail_quote">
<div dir="ltr" class="x_gmail_attr">On Tue, Aug 16, 2022 at 12:53 PM Alan Orth &lt;<a href="mailto:alan.orth@gmail.com">alan.orth@gmail.com</a>&gt; wrote:<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div>Dear list,</div>
<div><br>
</div>
<div>I've been using cgroupsv2 with SLURM 22.05 on CentOS Stream 8 successfully for a few months now. Recently a few of my nodes have started having problems starting slurmd. The log shows:</div>
<div><br>
</div>
<div>[2022-08-16T20:52:58.439] slurmd version 22.05.3 started<br>
[2022-08-16T20:52:58.439] error: Controller cpuset is not enabled!<br>
[2022-08-16T20:52:58.439] error: Controller cpu is not enabled!<br>
[2022-08-16T20:52:58.439] error: cpu cgroup controller is not available.<br>
[2022-08-16T20:52:58.439] error: There's an issue initializing memory or cpu controller<br>
[2022-08-16T20:52:58.439] error: Couldn't load specified plugin name for jobacct_gather/cgroup: Plugin init() callback failed<br>
[2022-08-16T20:52:58.439] error: cannot create jobacct_gather context for jobacct_gather/cgroup<br>
[2022-08-16T20:52:58.439] fatal: Unable to initialize jobacct_gather</div>
<div><br>
</div>
<div>The system has cgroupsv2 enabled as far as I can tell:</div>
<div><br>
</div>
<div># cat /sys/fs/cgroup/cgroup.controllers<br>
cpuset cpu io memory hugetlb pids rdma<br>
# [ $(stat -fc %T /sys/fs/cgroup/) = "cgroup2fs" ] && echo "unified" || ( [ -e /sys/fs/cgroup/unified/ ] && echo "hybrid" || echo "legacy")<br>
unified</div>
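<div>One more thing that may be worth checking: the root cgroup.controllers file only lists what the kernel offers, while a child cgroup sees just the controllers its parent enabled in cgroup.subtree_control. A minimal sketch for spotting missing controllers follows; the helper name is made up, and the system.slice path in the usage comment is only a guess at where slurmd runs:</div>
<pre>
```shell
#!/bin/sh
# Sketch only: missing_controllers is a hypothetical helper name.

# Print each controller named in the first argument that is absent from the
# space-separated controller list in the second argument, one per line.
missing_controllers() {
    wanted=$1
    available=$2
    for c in $wanted; do
        # Pad with spaces so "cpu" does not match inside "cpuset".
        case " $available " in
            *" $c "*) ;;
            *) echo "$c" ;;
        esac
    done
}

# Example use (the system.slice path is an assumption):
#   missing_controllers "cpuset cpu memory" \
#       "$(cat /sys/fs/cgroup/system.slice/cgroup.controllers)"
```
</pre>
<div>No output means every wanted controller is visible at that level of the hierarchy.</div>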
<div><br>
</div>
<div>And my slurm.conf has:</div>
<div><br>
</div>
<div>ProctrackType=proctrack/cgroup</div>
<div>TaskPlugin=task/affinity,task/cgroup</div>
<div><br>
</div>
<div>And cgroup.conf:</div>
<div><br>
</div>
<div>CgroupAutomount=yes<br>
CgroupPlugin=autodetect</div>
<div><br>
</div>
<div>What else should I look for before giving up and reverting to cgroupsv1? My current version is 22.05.3, but it was happening in 22.05.2 as well.<br>
</div>
<div><br>
</div>
<div>Thank you for any advice.<br>
</div>
<div>-- <br>
<div dir="ltr">
<div dir="ltr">
<div>Alan Orth<br>
<a href="mailto:alan.orth@gmail.com" target="_blank">alan.orth@gmail.com</a><br>
<a href="https://picturingjordan.com" target="_blank">https://picturingjordan.com</a><br>
<a href="https://englishbulgaria.net" target="_blank">https://englishbulgaria.net</a><br>
<a href="https://mjanja.ch" target="_blank">https://mjanja.ch</a></div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<br>
-- <br>
<div dir="ltr" class="x_gmail_signature">
<div dir="ltr">
<div>Alan Orth<br>
<a href="mailto:alan.orth@gmail.com" target="_blank">alan.orth@gmail.com</a><br>
<a href="https://picturingjordan.com" target="_blank">https://picturingjordan.com</a><br>
<a href="https://englishbulgaria.net" target="_blank">https://englishbulgaria.net</a><br>
<a href="https://mjanja.ch" target="_blank">https://mjanja.ch</a></div>
</div>
</div>
</div>
</body>
</html>