<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
On the worker node, check whether cgroups are mounted:</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
grep cgroup /proc/mounts </div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
(normally they are mounted under /sys/fs/cgroup)</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
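<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
The same mount table also tells you which cgroup hierarchy the node is running. A minimal sketch, assuming the usual /proc/mounts layout where the third field is the filesystem type (cgroup for V1 controllers, cgroup2 for the unified V2 hierarchy); the function name cgroup_mode is made up for illustration:</div>
<pre>
```shell
# Report which cgroup hierarchy is mounted, based on the filesystem-type
# column (field 3) of a mounts table. cgroup_mode is a hypothetical helper.
cgroup_mode() {
  awk '
    $3 == "cgroup"  { v1 = 1 }   # a V1 controller mount, e.g. .../memory
    $3 == "cgroup2" { v2 = 1 }   # the unified V2 hierarchy
    END {
      if (v1) {
        if (v2) { print "hybrid" } else { print "v1" }
      } else if (v2) {
        print "v2"
      } else {
        print "none"
      }
    }
  ' "${1:-/proc/mounts}"
}

# On a worker node this prints one of: v1, v2, hybrid, none.
if [ -r /proc/mounts ]; then cgroup_mode; fi
```
</pre>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
A result of "none" would mean no cgroup filesystem is mounted at all, in which case Slurm has nothing to attach its limits to.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>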
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Then check whether Slurm is setting up cgroups for running jobs:</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
find /sys/fs/cgroup | grep slurm</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
e.g.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<blockquote style="margin-top:0;margin-bottom:0">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
[root@spartan-gpgpu164 ~]# find /sys/fs/cgroup/memory | grep slurm
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.kmem.tcp.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.kmem.tcp.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.kmem.tcp.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.kmem.tcp.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.memsw.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.memsw.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.memsw.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.memsw.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.kmem.slabinfo</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.kmem.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.kmem.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.kmem.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.kmem.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.numa_stat</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.pressure_level</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.oom_control</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.move_charge_at_immigrate</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.swappiness</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.use_hierarchy</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.force_empty</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.stat</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.soft_limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/memory.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/cgroup.clone_children</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/cgroup.event_control</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/notify_on_release</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/cgroup.procs</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_0/tasks</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.kmem.tcp.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.kmem.tcp.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.kmem.tcp.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.kmem.tcp.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.memsw.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.memsw.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.memsw.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.memsw.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.kmem.slabinfo</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.kmem.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.kmem.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.kmem.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.kmem.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.numa_stat</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.pressure_level</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.oom_control</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.move_charge_at_immigrate</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.swappiness</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.use_hierarchy</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.force_empty</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.stat</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.soft_limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/memory.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/cgroup.clone_children</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/cgroup.event_control</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/notify_on_release</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/cgroup.procs</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_batch/tasks</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.kmem.tcp.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.kmem.tcp.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.kmem.tcp.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.kmem.tcp.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.memsw.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.memsw.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.memsw.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.memsw.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.kmem.slabinfo</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.kmem.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.kmem.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.kmem.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.kmem.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.numa_stat</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.pressure_level</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.oom_control</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.move_charge_at_immigrate</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.swappiness</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.use_hierarchy</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.force_empty</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.stat</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.soft_limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/memory.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/cgroup.clone_children</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/cgroup.event_control</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/notify_on_release</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/cgroup.procs</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/step_extern/tasks</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.kmem.tcp.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.kmem.tcp.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.kmem.tcp.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.kmem.tcp.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.memsw.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.memsw.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.memsw.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.memsw.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.kmem.slabinfo</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.kmem.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.kmem.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.kmem.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.kmem.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.numa_stat</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.pressure_level</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.oom_control</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.move_charge_at_immigrate</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.swappiness</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.use_hierarchy</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.force_empty</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.stat</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.failcnt</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.soft_limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.limit_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.max_usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/memory.usage_in_bytes</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/cgroup.clone_children</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/cgroup.event_control</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/notify_on_release</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/cgroup.procs</div>
<div class="ContentPasted0">/sys/fs/cgroup/memory/slurm/uid_14633/job_48210004/tasks</div>
</div>
</blockquote>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
<br>
</div>
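<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
Seeing the job directories only shows that Slurm created them; the limit itself is in memory.limit_in_bytes, one of the files in the listing above. A rough sketch that prints it for every job cgroup. The helper name show_job_limits is hypothetical, and the default root is simply the path from the listing:</div>
<pre>
```shell
# Print the memory limit (in bytes) of each Slurm job cgroup under a V1
# memory hierarchy. show_job_limits is a hypothetical helper name.
show_job_limits() {
  root=${1:-/sys/fs/cgroup/memory/slurm}
  [ -d "$root" ] || return 0          # no Slurm tree on this node
  find "$root" -type d -name 'job_*' | sort | while read -r dir; do
    if [ -r "$dir/memory.limit_in_bytes" ]; then
      # Output format: job_JOBID LIMIT
      printf '%s %s\n' "${dir##*/}" "$(cat "$dir/memory.limit_in_bytes")"
    fi
  done
}

show_job_limits
```
</pre>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
For a job submitted with --mem=5G you would expect something close to 5368709120 here. A far larger number suggests the memory constraint was not applied to that job.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
<br>
</div>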
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
Note that the paths above apply to cgroups V1. Newer OSes (e.g. Ubuntu 22.04) default to cgroups V2, so there you have to use cgroups V2, and the layout under /sys/fs/cgroup will look different.</div>
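<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
On a cgroups V2 node the memory controller files are named differently: the hard limit is memory.max rather than memory.limit_in_bytes, and an unlimited group reads as the literal string "max". A hedged sketch that handles either layout for one cgroup directory; read_mem_limit is a made-up helper name, and the directory you pass is whatever job directory the find command turns up on your node:</div>
<pre>
```shell
# Print the hard memory limit of one cgroup directory, whichever cgroup
# version it belongs to. read_mem_limit is a hypothetical helper.
read_mem_limit() {
  dir=$1
  if [ -r "$dir/memory.max" ]; then               # cgroups V2
    cat "$dir/memory.max"
  elif [ -r "$dir/memory.limit_in_bytes" ]; then  # cgroups V1
    cat "$dir/memory.limit_in_bytes"
  else
    return 1                                      # no limit file found
  fi
}
```
</pre>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
For the V1 example above that would be: read_mem_limit /sys/fs/cgroup/memory/slurm/uid_14633/job_48210004</div>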
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
<br>
Sean</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Boris Yazlovitsky &lt;borisyaz@gmail.com&gt;<br>
<b>Sent:</b> Friday, 23 June 2023 12:49<br>
<b>To:</b> Slurm User Community List &lt;slurm-users@lists.schedmd.com&gt;<br>
<b>Subject:</b> Re: [slurm-users] [EXT] --mem is not limiting the job's memory</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div class="x_gmail_default" style="font-family:courier new,monospace">
it's still not constraining memory...</div>
<div class="x_gmail_default" style="font-family:courier new,monospace"><br>
</div>
<div class="x_gmail_default" style="font-family:courier new,monospace">a memhog job continues to memhog:</div>
<div class="x_gmail_default" style="font-family:courier new,monospace"><br>
</div>
<div class="x_gmail_default" style="font-family:courier new,monospace">boris@rod:~/scripts$ sacct --starttime=2023-05-01 --format=jobid,user,start,elapsed,reqmem,maxrss,maxvmsize,nodelist,state,exit -j 199<br>
<pre style="font-family:courier new,monospace; margin:0">JobID             User               Start    Elapsed     ReqMem     MaxRSS  MaxVMSize        NodeList      State ExitCode
------------ --------- ------------------- ---------- ---------- ---------- ---------- --------------- ---------- --------
199              boris 2023-06-23T02:42:30   00:01:21         1M                              milhouse  COMPLETED      0:0
199.batch              2023-06-23T02:42:30   00:01:21            104857988K 104858064K        milhouse  COMPLETED      0:0</pre>
</div>
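<div class="x_gmail_default" style="font-family:courier new,monospace"><br>
</div>
<div class="x_gmail_default" style="font-family:courier new,monospace">To put those numbers in perspective, the sizes can be normalised to KiB and compared. This is only a sketch: to_kib and times_over are hypothetical helpers, they assume sacct's K/M/G/T suffixes are powers of 1024, and they handle whole numbers only (a value such as 1.5G would need extra parsing).</div>
<pre>
```shell
# Convert a sacct size such as 1M or 104857988K to KiB (integers only).
to_kib() {
  n=${1%[KMGT]}                     # strip the one-letter unit suffix
  case $1 in
    *K) echo "$n" ;;
    *M) echo $((n * 1024)) ;;
    *G) echo $((n * 1024 * 1024)) ;;
    *T) echo $((n * 1024 * 1024 * 1024)) ;;
    *)  return 1 ;;                 # unknown or missing suffix
  esac
}

# How many times larger the first size is than the second (integer division).
times_over() {
  echo $(( $(to_kib "$1") / $(to_kib "$2") ))
}

times_over 104857988K 1M   # MaxRSS and ReqMem of job 199; prints 102400
```
</pre>
<div class="x_gmail_default" style="font-family:courier new,monospace">In other words the batch step peaked at roughly 100 GiB resident against a 1M request, so no memory limit was enforced on it.</div>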
<div class="x_gmail_default" style="font-family:courier new,monospace"><br>
</div>
<div class="x_gmail_default" style="font-family:courier new,monospace">One thing I noticed is that the machines I'm working on do not have libcgroup and libcgroup-dev installed. Doesn't Slurm have its own cgroup implementation, though? The slurmd processes do load the
/usr/lib/slurm/*cgroup.so objects. I will try recompiling Slurm with those libcgroup packages present.</div>
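<div class="x_gmail_default" style="font-family:courier new,monospace"><br>
</div>
<div class="x_gmail_default" style="font-family:courier new,monospace">Before recompiling, it may be worth ruling out the slurm.conf side: the Constrain* settings in cgroup.conf belong to the task/cgroup plugin, so they only apply when TaskPlugin in slurm.conf includes task/cgroup. This thread does not show slurm.conf, so whether that is the problem here is an assumption to verify. A small sketch, with has_task_cgroup as a made-up helper and /etc/slurm/slurm.conf as an assumed path:</div>
<pre>
```shell
# Succeed if the given slurm.conf enables the task/cgroup plugin on an
# uncommented TaskPlugin line. has_task_cgroup is a hypothetical helper.
has_task_cgroup() {
  grep -Ei '^[[:space:]]*TaskPlugin[[:space:]]*=' "$1" | grep -qi 'task/cgroup'
}

conf=/etc/slurm/slurm.conf          # assumed location; adjust for your install
if [ -r "$conf" ]; then
  if has_task_cgroup "$conf"; then
    echo "task/cgroup is enabled"
  else
    echo "task/cgroup is NOT enabled"
  fi
fi
```
</pre>
<div class="x_gmail_default" style="font-family:courier new,monospace">To see what the running daemons actually use, rather than what the file says: scontrol show config | grep -i TaskPlugin</div>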
</div>
<br>
<div class="x_gmail_quote">
<div dir="ltr" class="x_gmail_attr">On Thu, Jun 22, 2023 at 6:04 PM Ozeryan, Vladimir &lt;<a href="mailto:Vladimir.Ozeryan@jhuapl.edu">Vladimir.Ozeryan@jhuapl.edu</a>&gt; wrote:<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div class="x_msg-9217506820023448158">
<div lang="EN-US">
<div class="x_m_-9217506820023448158WordSection1">
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">No worries,<u></u><u></u></span></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">No, we don’t have any OS level settings, only “allowed_devices.conf” which just has /dev/random, /dev/tty and stuff like that.<u></u><u></u></span></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">But I think this could be the culprit; check the man page for cgroup.conf:<br>
</span><span style="font-family:'Courier New'">AllowedRAMSpace=100</span></p>
<p class="x_MsoNormal"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">I would just leave these four:<u></u><u></u></span></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">CgroupAutomount=yes<br>
ConstrainCores=yes<br>
ConstrainDevices=yes<br>
ConstrainRAMSpace=yes<u></u><u></u></span></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">Vlad.<u></u><u></u></span></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="x_MsoNormal"><b><span style="font-size:11pt; font-family:Calibri,sans-serif">From:</span></b><span style="font-size:11pt; font-family:Calibri,sans-serif"> slurm-users &lt;<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank">slurm-users-bounces@lists.schedmd.com</a>&gt;
<b>On Behalf Of </b>Boris Yazlovitsky<br>
<b>Sent:</b> Thursday, June 22, 2023 5:40 PM<br>
<b>To:</b> Slurm User Community List &lt;<a href="mailto:slurm-users@lists.schedmd.com" target="_blank">slurm-users@lists.schedmd.com</a>&gt;<br>
<b>Subject:</b> Re: [slurm-users] [EXT] --mem is not limiting the job's memory<u></u><u></u></span></p>
<p class="x_MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">thank you Vlad - looks like we have the same yes's<u></u><u></u></span></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">Do you remember if you had to make any settings on the OS level or in the kernel to make it work?<u></u><u></u></span></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">-b<u></u><u></u></span></p>
</div>
</div>
<p class="x_MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="x_MsoNormal">On Thu, Jun 22, 2023 at 5:31 PM Ozeryan, Vladimir &lt;<a href="mailto:Vladimir.Ozeryan@jhuapl.edu" target="_blank">Vladimir.Ozeryan@jhuapl.edu</a>&gt; wrote:</p>
</div>
<blockquote style="border-top:none; border-right:none; border-bottom:none; border-left:1pt solid rgb(204,204,204); padding:0in 0in 0in 6pt; margin-left:4.8pt; margin-right:0in">
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">Hello,</span><u></u><u></u></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">We have the following configured and it seems to be working ok.</span><u></u><u></u></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="x_MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">CgroupAutomount=yes<br>
ConstrainCores=yes<br>
ConstrainDevices=yes<br>
ConstrainRAMSpace=yes</span><u></u><u></u></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">Vlad.</span><u></u><u></u></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="x_MsoNormal"><b><span style="font-size:11pt; font-family:Calibri,sans-serif">From:</span></b><span style="font-size:11pt; font-family:Calibri,sans-serif"> slurm-users &lt;<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank">slurm-users-bounces@lists.schedmd.com</a>&gt;
<b>On Behalf Of </b>Boris Yazlovitsky<br>
<b>Sent:</b> Thursday, June 22, 2023 4:50 PM<br>
<b>To:</b> Slurm User Community List &lt;<a href="mailto:slurm-users@lists.schedmd.com" target="_blank">slurm-users@lists.schedmd.com</a>&gt;<br>
<b>Subject:</b> Re: [slurm-users] [EXT] --mem is not limiting the job's memory</span><u></u><u></u></p>
<p class="x_MsoNormal"> <u></u><u></u></p>
<div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">Hello Vladimir, thank you for your response.</span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New""> </span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:'Courier New'">this is the cgroup.conf file:</span></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">CgroupAutomount=yes<br>
ConstrainCores=yes<br>
ConstrainDevices=yes<br>
ConstrainRAMSpace=yes<br>
ConstrainSwapSpace=yes<br>
MaxRAMPercent=90<br>
AllowedSwapSpace=0<br>
AllowedRAMSpace=100<br>
MemorySwappiness=0<br>
MaxSwapPercent=0</span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New""> </span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">/etc/default/grub:</span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">GRUB_DEFAULT=0<br>
GRUB_TIMEOUT_STYLE=hidden<br>
GRUB_TIMEOUT=0<br>
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`<br>
GRUB_CMDLINE_LINUX_DEFAULT=""<br>
GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 cgroup_enable=memory swapaccount=1"</span><u></u><u></u></p>
</div>
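<div>
<p class="x_MsoNormal"><span style="font-family:'Courier New'"> </span></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:'Courier New'">The GRUB_CMDLINE_LINUX line only takes effect once the GRUB configuration has been regenerated (update-grub on Ubuntu) and the node rebooted. A minimal sketch for checking what the running kernel was actually booted with; cmdline_has is a hypothetical helper, and /proc/cmdline is the standard place where the kernel exposes its boot command line:</span></p>
</div>
<pre>
```shell
# Succeed if a kernel command line file contains the given parameter as a
# whole word. cmdline_has is a hypothetical helper name.
cmdline_has() {
  grep -qwF -- "$1" "${2:-/proc/cmdline}"
}

# Check the two parameters from the GRUB_CMDLINE_LINUX line above.
if [ -r /proc/cmdline ]; then
  for param in cgroup_enable=memory swapaccount=1; do
    if cmdline_has "$param"; then
      echo "$param: present"
    else
      echo "$param: missing from the running kernel command line"
    fi
  done
fi
```
</pre>
<div>
<p class="x_MsoNormal"><span style="font-family:'Courier New'">If either parameter is reported missing, the settings in /etc/default/grub have not reached the running kernel yet.</span></p>
</div>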
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New""> </span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">what other cgroup settings need to be set?</span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New""> </span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">&& thank you!</span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">-b</span><u></u><u></u></p>
</div>
</div>
<p class="x_MsoNormal"> <u></u><u></u></p>
<div>
<div>
<p class="x_MsoNormal">On Thu, Jun 22, 2023 at 4:02 PM Ozeryan, Vladimir &lt;<a href="mailto:Vladimir.Ozeryan@jhuapl.edu" target="_blank">Vladimir.Ozeryan@jhuapl.edu</a>&gt; wrote:</p>
</div>
<blockquote style="border-top:none; border-right:none; border-bottom:none; border-left:1pt solid rgb(204,204,204); padding:0in 0in 0in 6pt; margin:5pt 0in 5pt 4.8pt">
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">--mem=5G should allocate 5G of memory per node.</span></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)">Are your cgroups configured?</span><u></u><u></u></p>
<p class="x_MsoNormal"><span style="font-size:11pt; font-family:Calibri,sans-serif; color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="x_MsoNormal"><b><span style="font-size:11pt; font-family:Calibri,sans-serif">From:</span></b><span style="font-size:11pt; font-family:Calibri,sans-serif"> slurm-users &lt;<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank">slurm-users-bounces@lists.schedmd.com</a>&gt;
<b>On Behalf Of </b>Boris Yazlovitsky<br>
<b>Sent:</b> Thursday, June 22, 2023 3:28 PM<br>
<b>To:</b> <a href="mailto:slurm-users@lists.schedmd.com" target="_blank">slurm-users@lists.schedmd.com</a><br>
<b>Subject:</b> [EXT] [slurm-users] --mem is not limiting the job's memory</span><u></u><u></u></p>
<p class="x_MsoNormal"> <u></u><u></u></p>
<div>
<div>
<p class="x_MsoNormal"><span style="font-family:'Courier New'">Running Slurm 22.03.02 on Ubuntu 22.04 server.</span></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">Jobs submitted with --mem=5g are able to allocate an unlimited amount of memory.</span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New""> </span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:'Courier New'">How can I limit, at the job submission level, how much memory a job can grab?</span></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New""> </span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New"">thanks, and best regards!<br>
Boris</span><u></u><u></u></p>
</div>
<div>
<p class="x_MsoNormal"><span style="font-family:"Courier New""> </span><u></u><u></u></p>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</body>
</html>