Paul Raines raines at nmr.mgh.harvard.edu
Wed Apr 6 19:30:14 UTC 2022

I have a user who submitted an interactive srun job using:

srun --mem-per-gpu 64 --gpus 1 --nodes 1 ....

From sacct for this job we see:

         ReqTRES : billing=4,cpu=1,gres/gpu=1,mem=10G,node=1
       AllocTRES : billing=4,cpu=1,gres/gpu=1,mem=64M,node=1

(I assume the 10G comes from the DefMemPerCPU=10240 set in slurm.conf)

Now I think the user made a mistake here: 64M should be far too
little for the job, yet it is running fine.  They may have forgotten
the 'G' and meant to request 64G.
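
To make the unit question concrete: as far as I know, Slurm treats a bare
number in memory options such as --mem-per-gpu as megabytes, with a K, M, G,
or T suffix selecting another unit. Below is a minimal Python sketch of that
interpretation. The function name mem_spec_to_bytes is hypothetical, this is
not Slurm's own parser, and accepting lowercase suffixes is an assumption of
the sketch.

```python
# Sketch only: mimics the documented unit convention for Slurm memory
# options (bare number = megabytes; K/M/G/T suffix picks another unit).
# Not Slurm's actual parsing code.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def mem_spec_to_bytes(spec: str) -> int:
    """Convert a memory spec such as '64' or '64G' to bytes."""
    spec = spec.strip().upper()  # lowercase suffixes accepted by assumption
    if spec and spec[-1] in _UNITS:
        return int(spec[:-1]) * _UNITS[spec[-1]]
    return int(spec) * _UNITS["M"]  # no suffix: megabytes


if __name__ == "__main__":
    print(mem_spec_to_bytes("64"))   # what was actually requested
    print(mem_spec_to_bytes("64G"))  # what was presumably intended
```

Under this reading, "--mem-per-gpu 64" with one GPU matches the mem=64M
shown in AllocTRES.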

The user submitted two jobs just like this, and both are running on the 
same node where I see:

5496 nms88  20   0  521.1g 453.2g 175852 S 100.0  30.0   1110:37 python
5555 nms88  20   0  484.7g 413.3g 182456 S  93.8  27.4   1065:22 python

and if I cd to /sys/fs/cgroup/memory/slurm/uid_5143603/job_1120342
for one of the jobs I see:

# cat memory.limit_in_bytes
# cat memory.usage_in_bytes

(the node itself has 1.5TB of RAM total)
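
The limit and usage files in that cgroup directory can also be read with a
short script. This is a minimal sketch that assumes the cgroup v1 memory
controller layout shown in the path above; read_cgroup_memory is a
hypothetical helper, not part of Slurm.

```python
from pathlib import Path


def read_cgroup_memory(job_dir):
    """Return (limit, usage) in bytes from a cgroup v1 memory directory."""
    job_dir = Path(job_dir)
    limit = int((job_dir / "memory.limit_in_bytes").read_text().strip())
    usage = int((job_dir / "memory.usage_in_bytes").read_text().strip())
    return limit, usage


if __name__ == "__main__":
    # Path taken from the job discussed above; it only exists on that node.
    job_dir = Path("/sys/fs/cgroup/memory/slurm/uid_5143603/job_1120342")
    if job_dir.is_dir():
        limit, usage = read_cgroup_memory(job_dir)
        print(f"limit={limit / 2**30:.2f} GiB  usage={usage / 2**30:.2f} GiB")
    else:
        print(f"{job_dir} not found; run this on the compute node")
```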

So my question is: why did SLURM end up running the job this way?  Why
was the cgroup limit not 64MB, which would have made the job fail
with OOM pretty quickly?

On someone else's job submitted with

srun -N 1 --ntasks-per-node=1 --gpus=1 --mem=128G --cpus-per-task=3 ...

on the node, in the memory cgroup, I see the expected

# cat memory.limit_in_bytes

But I worry it could fail since those other two jobs are essentially
consuming all the memory.

