[slurm-users] Gentle memory limits in Slurm using cgroup?

Alexander Åhman alexander at ydesign.se
Thu May 2 14:53:57 UTC 2019


Hi,
Is it possible to configure Slurm/cgroups in such a way that jobs that 
are using more memory than they asked for are not killed if there is 
still free memory available on the compute node? When free memory gets 
low, these jobs can be killed as usual.

Today, when a job exceeds its memory limit it is killed immediately. 
Since the applications only require their maximum memory for a short 
period of time, we often cannot run as many concurrent jobs as we want.

Maybe I can rephrase the question a bit: How can you configure memory 
limits for a job when the job only needs its maximum memory for a short 
time? Example: Job1 needs 80G RAM, but only for 15% of the execution 
time; for the remaining 85% it only needs 30G.
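
To put rough numbers on the Job1 example, here is a small 
back-of-the-envelope sketch. The 80G, 30G and 15% figures are the ones 
above; the 256G node size is a made-up value used only for illustration, 
and the function names are just for this sketch.

```python
# Back-of-the-envelope arithmetic for the Job1 example.
# PEAK_GB, BASE_GB and PEAK_FRACTION come from the example above.
# NODE_GB is a hypothetical node size, assumed only for illustration.

PEAK_GB = 80.0        # memory needed during the short peak
BASE_GB = 30.0        # memory needed the rest of the time
PEAK_FRACTION = 0.15  # share of run time spent at peak
NODE_GB = 256.0       # hypothetical memory available to jobs on one node


def time_averaged_gb(peak_gb, base_gb, peak_fraction):
    """Memory use averaged over the whole run time of one job."""
    return peak_fraction * peak_gb + (1.0 - peak_fraction) * base_gb


def jobs_that_fit(node_gb, per_job_gb):
    """How many jobs fit if each one reserves per_job_gb."""
    return int(node_gb // per_job_gb)


avg_gb = time_averaged_gb(PEAK_GB, BASE_GB, PEAK_FRACTION)
by_peak = jobs_that_fit(NODE_GB, PEAK_GB)   # every job reserves its peak
by_base = jobs_that_fit(NODE_GB, BASE_GB)   # every job reserves its baseline

print(f"time-averaged use per job: {avg_gb:.1f}G")
print(f"jobs per node when reserving the peak: {by_peak}")
print(f"jobs per node when reserving the baseline: {by_base}")
```

The time-averaged use comes out at about 37.5G, less than half of the 
80G that has to be requested to survive the peak. The baseline count is 
only an upper bound, since it ignores what happens when several jobs 
peak at the same time, which is exactly the case where killing jobs 
would still be needed.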

I guess the obvious thing is to use "CR_Core" instead of the 
"CR_Core_Memory" we use today. But we have to constrain memory in some 
way, because the nodes are also running daemons for the distributed file 
system, and those must not be affected by running jobs.

Any ideas?

Regards,
Alexander



