[slurm-users] Jobs can grow in RAM usage surpassing MaxMemPerNode

Thu Jan 12 01:47:53 UTC 2023

Hi Slurm community,
Recently we found a small problem triggered by one of our jobs. We have a
*MaxMemPerNode*=*532000* setting in our compute node in slurm.conf file,
however we found out that a job that started with mem=65536, and after
hours of execution it was able to grow its memory usage during execution up
to ~650GB. We expected that *MaxMemPerNode* would stop any job exceeding
the limit of 532000, did we miss something in the slurm.conf file? We were
trying to avoid going into setting QOS for each group of users.
any help is welcome

Here is the node definition in the conf file
## Nodes list
## use native GPUs
NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1
RealMemory=1024000 MemSpecLimit=65556 State=UNKNOWN Gres=gpu:A100:8
Feature=gpu

And here is the full slurm.conf file
# node health check
HealthCheckProgram=/usr/sbin/nhc
HealthCheckInterval=300

## Timeouts
SlurmctldTimeout=600
SlurmdTimeout=600

GresTypes=gpu
AccountingStorageTRES=gres/gpu
DebugFlags=CPU_Bind,gres

## We don't want a node to go back in pool without sys admin acknowledgement
ReturnToService=0

## Basic scheduling
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
SchedulerType=sched/backfill

## Accounting
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
AccountingStorageHost=10.10.0.1
AccountingStorageEnforce=limits

JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux

TaskPlugin=task/cgroup
ProctrackType=proctrack/cgroup

## scripts
Epilog=/etc/slurm/epilog
Prolog=/etc/slurm/prolog
PrologFlags=Alloc

## MPI
MpiDefault=pmi2

## Nodes list
## use native GPUs
NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1
RealMemory=1024000 MemSpecLimit=65556 State=UNKNOWN Gres=gpu:A100:8
Feature=gpu

## Partitions list
PartitionName=gpu OverSubscribe=No MaxCPUsPerNode=64 DefMemPerNode=65556
DefCpuPerGPU=8 DefMemPerGPU=65556 MaxMemPerNode=532000 MaxTime=3-12:00:00
State=UP Nodes=nodeGPU01 Default=YES
PartitionName=cpu OverSubscribe=No MaxCPUsPerNode=64 DefMemPerNode=16384
MaxMemPerNode=420000 MaxTime=3-12:00:00 State=UP Nodes=nodeGPU01

-- 
Cristóbal A. Navarro
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20230111/387eef30/attachment.htm>