[slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?
Renfro, Michael
Renfro at tntech.edu
Mon Oct 7 20:42:55 UTC 2019
Our cgroup settings are quite a bit different, and we don’t allow jobs to swap, but the following works to limit memory here (I know, because I get frequent emails from users who don’t change their jobs from the default 2 GB per CPU that we use):
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=no
CgroupReleaseAgentDir="/etc/slurm/cgroup"
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
ConstrainCores=yes # Not the Slurm default
TaskAffinity=no # Slurm default
ConstrainRAMSpace=yes # Not the Slurm default
ConstrainSwapSpace=yes # Not the Slurm default
ConstrainDevices=no # Slurm default
AllowedRamSpace=100 # Slurm default
AllowedSwapSpace=0 # Slurm default
MaxRAMPercent=100 # Slurm default
MaxSwapPercent=100 # Slurm default
MinRAMSpace=30 # Slurm default
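For completeness: cgroup.conf only takes effect if slurm.conf actually hands task management to the cgroup plugin. A minimal slurm.conf pairing that would match the settings above is sketched below (not our exact production file; plugin choices vary by site):

TaskPlugin=task/cgroup
ProctrackType=proctrack/cgroup
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory

A quick way to test enforcement is to submit a job that deliberately overruns its allocation, e.g. (assuming Python is available on the compute node):

sbatch --mem-per-cpu=1G --wrap "python3 -c 'x = bytearray(4 * 1024**3)'"

The job should be killed and show an OUT_OF_MEMORY state in sacct.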
> On Oct 7, 2019, at 11:55 AM, Jean-mathieu CHANTREIN <jean-mathieu.chantrein at univ-angers.fr> wrote:
>
> Hello,
>
> I tried using, in slurm.conf
> TaskPlugin=task/affinity,task/cgroup
> SelectTypeParameters=CR_CPU_Memory
> MemLimitEnforce=yes
>
> and in cgroup.conf:
> CgroupAutomount=yes
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
> MaxSwapPercent=10
> TaskAffinity=no
>
> But when the job reaches its limit, it alternates between the R and D states without being killed, even once it exceeds the allowed 10% of the swap partition.
>
> Do you have an idea of how to achieve this?
>
> Regards,
>
> Jean-Mathieu
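On the R/D flip-flopping specifically: that pattern usually means the kernel is blocking the process on swap I/O instead of OOM-killing it. Once RAM and swap are both constrained in the cgroup, the kernel’s OOM killer fires and the job dies cleanly. If you want to verify the limit actually landed on a node, inspect the job’s memory cgroup while it runs; on a cgroup v1 node the path typically looks like the following, where <uid> and <jobid> are placeholders:

cat /sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid>/memory.limit_in_bytes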