[slurm-users] MaxMemPerCPU not enforced?

Groner, Rob rug262 at psu.edu
Mon Jul 24 13:30:37 UTC 2023


I'm not sure I can help with the rest, but the EnforcePartLimits setting only rejects a job at submission time if it exceeds partition limits, not cluster-wide limits such as MaxMemPerCPU.  Offhand, I don't see anything in your interactive partition definition that is exceeded by a request for 4 GB/CPU.
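
As a quick sanity check (just a sketch; adjust the job ID and grep patterns to your setup), you could compare the limits EnforcePartLimits actually checks against what the allocation ended up with:

,----
| # Cluster-wide settings (not checked by EnforcePartLimits)
| scontrol show config | grep -Ei 'MaxMemPerCPU|DefMemPerCPU|EnforcePartLimits'
|
| # Per-partition limits (these are what EnforcePartLimits rejects against)
| scontrol show partition interactive
|
| # What the job was actually granted (127544 is the allocation from your example)
| scontrol show job 127544 | grep -Ei 'mem|numcpus'
`----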

Rob


________________________________
From: slurm-users on behalf of Angel de Vicente
Sent: Monday, July 24, 2023 7:20 AM
To: Slurm User Community List
Subject: [slurm-users] MaxMemPerCPU not enforced?

Hello,

I'm trying to get Slurm to control the memory used per CPU, but it does
not seem to enforce the MaxMemPerCPU option set in slurm.conf.

This is running on Ubuntu 22.04 (cgroup v2) with Slurm 23.02.3.

Relevant configuration options:

,----cgroup.conf
| AllowedRAMSpace=100
| ConstrainCores=yes
| ConstrainRAMSpace=yes
| ConstrainSwapSpace=yes
| AllowedSwapSpace=0
`----

,----slurm.conf
| TaskPlugin=task/affinity,task/cgroup
| PrologFlags=X11
|
| SelectType=select/cons_res
| SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK
| MaxMemPerCPU=500
| DefMemPerCPU=200
|
| JobAcctGatherType=jobacct_gather/linux
|
| EnforcePartLimits=ALL
|
| NodeName=xxx RealMemory=257756 Sockets=4 CoresPerSocket=8 ThreadsPerCore=1 Weight=1
|
| PartitionName=batch       Nodes=duna State=UP Default=YES MaxTime=2-00:00:00 MaxCPUsPerNode=32 OverSubscribe=FORCE:1
| PartitionName=interactive Nodes=duna State=UP Default=NO  MaxTime=08:00:00   MaxCPUsPerNode=32 OverSubscribe=FORCE:2
`----


I can ask for an interactive session with 4 GB/CPU (I would have thought
that "EnforcePartLimits=ALL" would stop me from doing that), and once
I'm in the interactive session I can run a test code that keeps 3 GB
resident without any issues (htop shows the process at 100% CPU with a
RES size of 3 GB). Any idea what the problem could be, or how to start
debugging this?

,----
| [angelv at xxx test]$ sinter -n 1 --mem-per-cpu=4000
| salloc: Granted job allocation 127544
| salloc: Nodes xxx are ready for job
|
| (sinter) [angelv at xxx test]$ stress -m 1 -t 600 --vm-keep --vm-bytes 3G
| stress -m 1 -t 600 --vm-keep --vm-bytes 3G
| stress: info: [1772392] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
`----
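
For reference, here is a minimal sketch of the checks I can run from inside the allocation to see which memory limit, if any, is actually being applied (the cgroup path below is an assumption based on a generic cgroup v2 layout and may differ depending on how slurmd/systemd set up the hierarchy):

,----
| # What the allocation was actually granted
| scontrol show job $SLURM_JOB_ID | grep -Ei 'mem|numcpus'
|
| # cgroup v2 memory limit for the current cgroup, if ConstrainRAMSpace took effect
| # (the limit may also be set on a parent cgroup; path is an assumption)
| cat /sys/fs/cgroup$(awk -F: '{print $3}' /proc/self/cgroup)/memory.max
|
| # Requested vs. measured memory in accounting, after the job finishes
| sacct -j $SLURM_JOB_ID --format=JobID,ReqMem,MaxRSS,State
`----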

Many thanks,
--
Ángel de Vicente
 Research Software Engineer (Supercomputing and BigData)
 Tel.: +34 922-605-747
 Web.: http://research.iac.es/proyecto/polmag/

 GPG: 0x8BDC390B69033F52