<div dir="ltr"><div>Hi Cristóbal,</div><div><br></div><div>I would guess you need to set up a cgroup.conf file</div><div><br></div><div>###<br># Slurm cgroup support configuration file<br>###<br>ConstrainRAMSpace=yes<br>ConstrainSwapSpace=yes<br>AllowedRAMSpace=100<br>AllowedSwapSpace=0<br>MaxRAMPercent=100<br>MaxSwapPercent=0<br>#ConstrainDevices=yes<br>MemorySwappiness=0<br>TaskAffinity=no<br>CgroupAutomount=yes<br>ConstrainCores=yes<br>#</div><div><br></div><div>Best,</div><div>Rodrigo<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 11, 2023 at 10:50 PM Cristóbal Navarro <<a href="mailto:cristobal.navarro.g@gmail.com">cristobal.navarro.g@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi Slurm community,</div><div>Recently we found a small problem triggered by one of our jobs. We have a <b>MaxMemPerNode</b>=<b>532000</b> setting in our compute node in slurm.conf file, however we found out that a job that started with mem=65536, and after hours of execution it was able to grow its memory usage during execution up to ~650GB. We expected that <b>MaxMemPerNode</b> would stop any job exceeding the limit of 532000, did we miss something in the slurm.conf file? We were trying to avoid going into setting QOS for each group of users.<br></div><div>any help is welcome<br></div><div><br></div><div>Here is the node definition in the conf file</div><div><span style="font-family:monospace">## Nodes list<br>## use native GPUs<br>NodeName=nodeGPU01
SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=1024000
MemSpecLimit=65556 State=UNKNOWN Gres=gpu:A100:8 Feature=gpu</span></div><div><span style="font-family:monospace"><br></span></div><div><span style="font-family:monospace"><br></span></div><div><span style="font-family:monospace"><font face="arial,sans-serif">And here is the full slurm.conf file</font><br></span></div><div><span style="font-family:monospace"># node health check<br>HealthCheckProgram=/usr/sbin/nhc<br>HealthCheckInterval=300<br><br>## Timeouts<br>SlurmctldTimeout=600<br>SlurmdTimeout=600<br><br>GresTypes=gpu<br>AccountingStorageTRES=gres/gpu<br>DebugFlags=CPU_Bind,gres<br><br>## We don't want a node to go back in pool without sys admin acknowledgement<br>ReturnToService=0<br><br>## Basic scheduling<br>SelectType=select/cons_tres<br>SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE<br>SchedulerType=sched/backfill<br><br>## Accounting <br>AccountingStorageType=accounting_storage/slurmdbd<br>AccountingStoreJobComment=YES<br>AccountingStorageHost=10.10.0.1<br>AccountingStorageEnforce=limits<br><br>JobAcctGatherFrequency=30<br>JobAcctGatherType=jobacct_gather/linux<br><br>TaskPlugin=task/cgroup<br>ProctrackType=proctrack/cgroup<br><br>## scripts<br>Epilog=/etc/slurm/epilog<br>Prolog=/etc/slurm/prolog<br>PrologFlags=Alloc<br><br>## MPI<br>MpiDefault=pmi2<br><br>## Nodes list<br>## use native GPUs<br>NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=1024000 MemSpecLimit=65556 State=UNKNOWN Gres=gpu:A100:8 Feature=gpu<br><br>## Partitions list<br>PartitionName=gpu OverSubscribe=No MaxCPUsPerNode=64 DefMemPerNode=65556 DefCpuPerGPU=8 DefMemPerGPU=65556 MaxMemPerNode=532000 MaxTime=3-12:00:00 State=UP Nodes=nodeGPU01 Default=YES <br>PartitionName=cpu OverSubscribe=No MaxCPUsPerNode=64 DefMemPerNode=16384 MaxMemPerNode=420000 MaxTime=3-12:00:00 State=UP Nodes=nodeGPU01</span> <br></div><div><br clear="all"><br>-- <br><div dir="ltr"><div dir="ltr"><div>Cristóbal A. Navarro</div></div></div></div></div>
</blockquote></div>