[slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

Renfro, Michael Renfro at tntech.edu
Mon Oct 7 20:42:55 UTC 2019


Our cgroup settings are quite a bit different, and we don’t allow jobs to swap, but the following works to limit memory here (I know, because I get frequent emails from users who don’t change their jobs from the default 2 GB per CPU that we use):

CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=no
CgroupReleaseAgentDir="/etc/slurm/cgroup"
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
ConstrainCores=yes    # Not the Slurm default
TaskAffinity=no       # Slurm default
ConstrainRAMSpace=no  # Slurm default
ConstrainSwapSpace=no # Slurm default
ConstrainDevices=no   # Slurm default
AllowedRamSpace=100   # Slurm default
AllowedSwapSpace=0    # Slurm default
MaxRAMPercent=100     # Slurm default
MaxSwapPercent=100    # Slurm default
MinRAMSpace=30        # Slurm default
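
A quick way to verify that the limit is actually enforced is to submit a job that deliberately over-allocates. A minimal sketch, assuming python3 is available on the compute nodes (the 4 GB figure is just double our 2 GB per-CPU default; adjust to taste):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:05:00

# Deliberately allocate ~4 GB, double the requested 2 GB. With working
# enforcement the job should be killed (typically ending in OUT_OF_MEMORY
# or FAILED state) rather than lingering in D state.
python3 -c 'data = bytearray(4 * 1024**3)'

Checking afterwards with "sacct -j <jobid> -o JobID,State,MaxRSS" should show the job killed at roughly the requested limit.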

> On Oct 7, 2019, at 11:55 AM, Jean-mathieu CHANTREIN <jean-mathieu.chantrein at univ-angers.fr> wrote:
> 
> Hello,
> 
> I tried using, in slurm.conf
> TaskPlugin=task/affinity,task/cgroup
> SelectTypeParameters=CR_CPU_Memory
> MemLimitEnforce=yes
> 
> and in cgroup.conf:
> CgroupAutomount=yes
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
> MaxSwapPercent=10
> TaskAffinity=no
> 
> But when the job reaches its limit, it alternates between the R and D states without being killed, even when it exceeds the 10% of swap space allowed.
> 
> Do you have any idea how to do this?
> 
> Regards,
> 
> Jean-Mathieu
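
For what it's worth, the R/D flapping described above usually means the process is swapping rather than being OOM-killed. Besides the cgroup limits, Slurm can also poll each job's memory use through job accounting and kill jobs that exceed their request. A sketch of the relevant slurm.conf lines, assuming Slurm 19.05 or later (where JobAcctGatherParams=OverMemoryKill supersedes the deprecated MemLimitEnforce; the 30-second sampling interval is just an example):

JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=task=30
JobAcctGatherParams=OverMemoryKill

Note that polling-based enforcement can miss allocations that happen between samples, so cgroup-based ConstrainRAMSpace=yes remains the more reliable mechanism where it works.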


