<div dir="ltr">Slurm does not reject the job; it allocates more CPUs to cover the memory requirement. Use sacct's ReqTRES and AllocTRES fields to compare requested vs. allocated resources:<div><br></div><div><font face="monospace">$ scontrol show part normal_q | grep MaxMem<br>   DefMemPerCPU=1920 MaxMemPerCPU=1920<br><br></font></div><div><font face="monospace">$ srun -n 1 --mem-per-cpu=4000 --partition=normal_q --account=arcadm hostname<br>srun: job 1577313 queued and waiting for resources<br>srun: job 1577313 has been allocated resources<br>tc095</font></div><div><font face="monospace"><br>$ sacct -j 1577313 -o jobid,reqtres%35,alloctres%35<br>       JobID                             ReqTRES                           AllocTRES<br>------------ ----------------------------------- -----------------------------------<br>1577313         billing=1,cpu=1,mem=4000M,node=1    billing=3,cpu=3,mem=4002M,node=1<br>1577313.ext+                                        billing=3,cpu=3,mem=4002M,node=1<br>1577313.0                                                     cpu=3,mem=4002M,node=1</font><br></div><div><font face="monospace"><br></font></div>From the Slurm manuals (e.g. <font face="monospace">man srun</font>):<div><font face="monospace"> --mem-per-cpu=&lt;size&gt;[units] </font></div><div><font face="monospace">Minimum memory required per allocated CPU. ... Note that if the job's --mem-per-cpu value exceeds the configured MaxMemPerCPU, then the user's limit will be treated as a memory limit per task</font></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jul 24, 2023 at 9:32 AM Groner, Rob &lt;<a href="mailto:rug262@psu.edu">rug262@psu.edu</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div class="msg6568320614354487473">

<div dir="ltr">

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

I'm not sure I can help with the rest, but the EnforcePartLimits setting only rejects, at submission time, a job that exceeds

<b>partition</b> limits, not overall cluster limits.  I don't see anything, offhand, in the interactive partition definition that is exceeded by your request for 4 GB/CPU.</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

<br>

</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

Rob</div>

<div>

<div><br>

</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

<br>

<hr style="display:inline-block;width:98%">

<b>From:</b> slurm-users on behalf of Angel de Vicente<br>

<b>Sent:</b> Monday, July 24, 2023 7:20 AM<br>

<b>To:</b> Slurm User Community List<br>

<b>Subject:</b> [slurm-users] MaxMemPerCPU not enforced?

<div><br>

</div>

</div>

<div><font size="2"><span style="font-size:11pt">

<div>Hello,<br>

<br>

I'm trying to get Slurm to control the memory used per CPU, but it does<br>

not seem to enforce the MaxMemPerCPU option in slurm.conf<br>

<br>

This is running on Ubuntu 22.04 (cgroups v2), Slurm 23.02.3.<br>

<br>

Relevant configuration options:<br>

<br>

,----cgroup.conf<br>

| AllowedRAMSpace=100<br>

| ConstrainCores=yes<br>

| ConstrainRAMSpace=yes<br>

| ConstrainSwapSpace=yes<br>

| AllowedSwapSpace=0<br>

`----<br>

<br>

,----slurm.conf<br>

| TaskPlugin=task/affinity,task/cgroup<br>

| PrologFlags=X11<br>

| <br>

| SelectType=select/cons_res<br>

| SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK<br>

| MaxMemPerCPU=500<br>

| DefMemPerCPU=200<br>

| <br>

| JobAcctGatherType=jobacct_gather/linux<br>

| <br>

| EnforcePartLimits=ALL<br>

| <br>

| NodeName=xxx RealMemory=257756 Sockets=4 CoresPerSocket=8 ThreadsPerCore=1 Weight=1<br>

| <br>

| PartitionName=batch       Nodes=duna State=UP Default=YES MaxTime=2-00:00:00 MaxCPUsPerNode=32 OverSubscribe=FORCE:1<br>

| PartitionName=interactive Nodes=duna State=UP Default=NO  MaxTime=08:00:00   MaxCPUsPerNode=32 OverSubscribe=FORCE:2<br>

`----<br>

<br>

<br>

I can ask for an interactive session with 4GB/CPU (I would have thought<br>

that "EnforcePartLimits=ALL" would stop me from doing that), and once<br>

I'm in the interactive session I can execute a 3GB test code without any<br>

issues (I can see with htop that the process does indeed use a RES size<br>

of 3GB at 100% CPU use). Any idea what could be the problem or how to<br>

start debugging this?<br>

<br>

,----<br>

| [angelv@xxx test]$ sinter -n 1 --mem-per-cpu=4000<br>

| salloc: Granted job allocation 127544<br>

| salloc: Nodes xxx are ready for job<br>

| <br>

| (sinter) [angelv@xxx test]$ stress -m 1 -t 600 --vm-keep --vm-bytes 3G<br>

| stress -m 1 -t 600 --vm-keep --vm-bytes 3G<br>

| stress: info: [1772392] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd<br>

`----<br>

<br>

Many thanks,<br>

-- <br>

Ángel de Vicente<br>

 Research Software Engineer (Supercomputing and BigData)<br>

 Tel.: +34 922-605-747<br>

 Web.: <a href="http://research.iac.es/proyecto/polmag/" rel="noopener noreferrer" target="_blank">

http://research.iac.es/proyecto/polmag/</a><br>

<br>

 GPG: 0x8BDC390B69033F52<br>

</div>

</span></font></div>

</div>

</div>

</div></blockquote></div>
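<div><br></div><div>For reference, the numbers in the sacct output at the top of this message (cpu=3, mem=4002M for a 4000M request with MaxMemPerCPU=1920) can be reproduced with simple ceiling arithmetic. The sketch below is only an illustration that matches that output; the function name <font face="monospace">adjust_mem_per_cpu</font> is hypothetical, and the rounding rule is an assumption inferred from the observed values, not taken from the Slurm source.</div>

```shell
#!/usr/bin/env bash
# Illustrative only: approximates how a --mem-per-cpu request above
# MaxMemPerCPU appears to be converted into more CPUs with less memory each.
# Usage: adjust_mem_per_cpu REQUESTED_MB MAX_MB
# Prints: "<cpus> <mem_per_cpu_MB> <total_mem_MB>"
adjust_mem_per_cpu() {
    local req=$1 max=$2
    # Request already within the limit: nothing to adjust.
    if (( req <= max )); then
        echo "1 $req $req"
        return
    fi
    # Assumed rule: smallest CPU count whose per-CPU share fits under max.
    local cpus=$(( (req + max - 1) / max ))       # ceil(req / max)
    # Assumed rule: per-CPU memory rounded up so the total covers the request.
    local per_cpu=$(( (req + cpus - 1) / cpus ))  # ceil(req / cpus)
    echo "$cpus $per_cpu $(( cpus * per_cpu ))"
}

# Values from the normal_q example: request 4000M, MaxMemPerCPU=1920.
adjust_mem_per_cpu 4000 1920
```

<div>With the values from the normal_q example this prints <font face="monospace">3 1334 4002</font>, which agrees with the AllocTRES column. Under the same assumption, the MaxMemPerCPU=500 from the original post would turn a 4000M request into 8 CPUs at 500M each, so comparing ReqTRES and AllocTRES for job 127544 would show whether that is what happened there.</div>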