[slurm-users] MaxMemPerCPU not enforced?
Matthew Brown
brownm12 at vt.edu
Mon Jul 24 14:21:14 UTC 2023
Slurm will allocate more CPUs to cover the memory requirement. Use
sacct's output fields to compare requested vs. allocated resources (ReqTRES vs. AllocTRES):
$ scontrol show part normal_q | grep MaxMem
DefMemPerCPU=1920 MaxMemPerCPU=1920
$ srun -n 1 --mem-per-cpu=4000 --partition=normal_q --account=arcadm hostname
srun: job 1577313 queued and waiting for resources
srun: job 1577313 has been allocated resources
tc095
$ sacct -j 1577313 -o jobid,reqtres%35,alloctres%35
JobID        ReqTRES                             AllocTRES
------------ ----------------------------------- -----------------------------------
1577313      billing=1,cpu=1,mem=4000M,node=1    billing=3,cpu=3,mem=4002M,node=1
1577313.ext+                                     billing=3,cpu=3,mem=4002M,node=1
1577313.0                                        cpu=3,mem=4002M,node=1
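The arithmetic behind those numbers (my reading, given MaxMemPerCPU=1920
on this partition): Slurm bumps the CPU count until the per-CPU limit is
satisfied, then spreads the requested memory over those CPUs:

  ceil(4000 / 1920) = 3 CPUs
  ceil(4000 / 3)    = 1334 MB per CPU  ->  3 * 1334 = 4002M

which matches the AllocTRES shown above.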
From the Slurm manuals (e.g. man srun):
--mem-per-cpu=<size>[units]
Minimum memory required per allocated CPU. ... Note that if the job's
--mem-per-cpu value exceeds the configured MaxMemPerCPU, then the user's
limit will be treated as a memory limit per task
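If you want to confirm the cluster-wide values the controller is actually
using (as opposed to the per-partition settings above), something like
this should show them:

$ scontrol show config | grep -i MemPerCPU

which should print the DefMemPerCPU and MaxMemPerCPU lines from the
running configuration.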
On Mon, Jul 24, 2023 at 9:32 AM Groner, Rob <rug262 at psu.edu> wrote:
> I'm not sure I can help with the rest, but the EnforcePartLimits setting
> will only reject a job at submission time that exceeds *partition*
> limits, not overall cluster limits. I don't see anything, offhand, in the
> interactive partition definition that is exceeded by your request for 4
> GB/CPU.
>
> Rob
>
>
> ------------------------------
> *From:* slurm-users on behalf of Angel de Vicente
> *Sent:* Monday, July 24, 2023 7:20 AM
> *To:* Slurm User Community List
> *Subject:* [slurm-users] MaxMemPerCPU not enforced?
>
> Hello,
>
> I'm trying to get Slurm to control the memory used per CPU, but it does
> not seem to enforce the MaxMemPerCPU option in slurm.conf.
>
> This is running on Ubuntu 22.04 (cgroups v2) with Slurm 23.02.3.
>
> Relevant configuration options:
>
> ,----cgroup.conf
> | AllowedRAMSpace=100
> | ConstrainCores=yes
> | ConstrainRAMSpace=yes
> | ConstrainSwapSpace=yes
> | AllowedSwapSpace=0
> `----
>
> ,----slurm.conf
> | TaskPlugin=task/affinity,task/cgroup
> | PrologFlags=X11
> |
> | SelectType=select/cons_res
> | SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK
> | MaxMemPerCPU=500
> | DefMemPerCPU=200
> |
> | JobAcctGatherType=jobacct_gather/linux
> |
> | EnforcePartLimits=ALL
> |
> | NodeName=xxx RealMemory=257756 Sockets=4 CoresPerSocket=8 ThreadsPerCore=1 Weight=1
> |
> | PartitionName=batch Nodes=duna State=UP Default=YES MaxTime=2-00:00:00 MaxCPUsPerNode=32 OverSubscribe=FORCE:1
> | PartitionName=interactive Nodes=duna State=UP Default=NO MaxTime=08:00:00 MaxCPUsPerNode=32 OverSubscribe=FORCE:2
> `----
>
>
> I can ask for an interactive session with 4GB/CPU (I would have thought
> that "EnforcePartLimits=ALL" would stop me from doing that), and once
> I'm in the interactive session I can execute a 3GB test code without any
> issues (I can see with htop that the process does indeed use a RES size
> of 3GB at 100% CPU use). Any idea what could be the problem or how to
> start debugging this?
>
> ,----
> | [angelv at xxx test]$ sinter -n 1 --mem-per-cpu=4000
> | salloc: Granted job allocation 127544
> | salloc: Nodes xxx are ready for job
> |
> | (sinter) [angelv at xxx test]$ stress -m 1 -t 600 --vm-keep --vm-bytes 3G
> | stress -m 1 -t 600 --vm-keep --vm-bytes 3G
> | stress: info: [1772392] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
> `----
>
> Many thanks,
> --
> Ángel de Vicente
> Research Software Engineer (Supercomputing and BigData)
> Tel.: +34 922-605-747
> Web.: http://research.iac.es/proyecto/polmag/
>
> GPG: 0x8BDC390B69033F52
>