[slurm-users] Strange memory limit behavior with --mem-per-gpu

Bjørn-Helge Mevik b.h.mevik at usit.uio.no
Fri Apr 8 08:02:19 UTC 2022


Paul Raines <raines at nmr.mgh.harvard.edu> writes:

> Basically, it appears using --mem-per-gpu instead of just --mem gives
> you unlimited memory for your job.
>
> $ srun --account=sysadm -p rtx8000 -N 1 --time=1-10:00:00 \
>     --ntasks-per-node=1 --cpus-per-task=1 --gpus=1 --mem-per-gpu=8G \
>     --mail-type=FAIL --pty /bin/bash
> rtx-07[0]:~$ find /sys/fs/cgroup/memory/ -name job_$SLURM_JOBID
> /sys/fs/cgroup/memory/slurm/uid_5829/job_1134067
> rtx-07[0]:~$ cat /sys/fs/cgroup/memory/slurm/uid_5829/job_1134067/memory.limit_in_bytes
> 1621419360256
>
> That is a limit of 1.5TB which is all the memory on rtx-07, not
> the 8G I effectively asked for at 1 GPU and 8G per GPU.

Which version of Slurm is this?  We noticed similar behaviour on Slurm
20.11.8, but could not reproduce it when we tested on 21.08.1.  (We also
noticed an issue with --gpus-per-task that appears to have been fixed in
21.08.)
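
For anyone who wants to verify this on their own cluster, a minimal check
could look something like the following (a sketch only, assuming cgroup v1
and the same cgroup hierarchy layout as in Paul's example; partition,
account and GPU names will differ per site):

  $ srun -p rtx8000 --gpus=1 --mem-per-gpu=8G --pty /bin/bash
  $ cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/memory.limit_in_bytes

If the limit comes back as 8589934592 (8 GiB), the request is being
enforced; if it is the node's total RAM, you are seeing the behaviour Paul
describes.  Running the same test with --mem=8G instead of
--mem-per-gpu=8G gives a useful comparison.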

-- 
B/H