[slurm-users] Jobs can grow in RAM usage surpassing MaxMemPerNode

Rodrigo Santibáñez rsantibanez.uchile at gmail.com
Thu Jan 12 05:22:06 UTC 2023


Hi Cristóbal,

I would guess you need to set up a cgroup.conf file. MaxMemPerNode only
limits what Slurm will allocate at scheduling time; enforcing the limit
while the job is actually running is done by the cgroup plugins:

###
# Slurm cgroup support configuration file
###
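# Constrain each job step to the RAM it was allocated; processes that
# exceed the limit get OOM-killed by the kernel instead of growing freely.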
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedRAMSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=0
#ConstrainDevices=yes
MemorySwappiness=0
TaskAffinity=no
CgroupAutomount=yes
ConstrainCores=yes
#
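Since you already have TaskPlugin=task/cgroup and ProctrackType=proctrack/cgroup
in slurm.conf, it should be enough to drop this file next to slurm.conf and
restart slurmd on the node. A rough way to check that enforcement works
(hypothetical session; assumes python3 is available on the node):

# confirm the cgroup plugins are active
scontrol show config | grep -iE 'TaskPlugin|ProctrackType'

# a step that tries to touch ~2GB against a 1GB request should be killed
srun --mem=1024 python3 -c 'x = bytearray(2 * 1024**3)'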

Best,
Rodrigo

On Wed, Jan 11, 2023 at 10:50 PM Cristóbal Navarro <
cristobal.navarro.g at gmail.com> wrote:

> Hi Slurm community,
> Recently we found a small problem triggered by one of our jobs. We have
> *MaxMemPerNode*=*532000* set for our compute node in the slurm.conf file;
> however, a job that started with mem=65536 was able to grow its memory
> usage over several hours of execution up to ~650GB. We expected
> *MaxMemPerNode* to stop any job from exceeding the 532000 limit. Did we
> miss something in the slurm.conf file? We were trying to avoid setting up
> a QOS for each group of users.
> Any help is welcome.
>
> Here is the node definition in the conf file
> ## Nodes list
> ## use native GPUs
> NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1
> RealMemory=1024000 MemSpecLimit=65556 State=UNKNOWN Gres=gpu:A100:8
> Feature=gpu
>
>
> And here is the full slurm.conf file
> # node health check
> HealthCheckProgram=/usr/sbin/nhc
> HealthCheckInterval=300
>
> ## Timeouts
> SlurmctldTimeout=600
> SlurmdTimeout=600
>
> GresTypes=gpu
> AccountingStorageTRES=gres/gpu
> DebugFlags=CPU_Bind,gres
>
> ## We don't want a node to go back into the pool without sysadmin acknowledgement
> ReturnToService=0
>
> ## Basic scheduling
> SelectType=select/cons_tres
> SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
> SchedulerType=sched/backfill
>
> ## Accounting
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStoreJobComment=YES
> AccountingStorageHost=10.10.0.1
> AccountingStorageEnforce=limits
>
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/linux
>
> TaskPlugin=task/cgroup
> ProctrackType=proctrack/cgroup
>
> ## scripts
> Epilog=/etc/slurm/epilog
> Prolog=/etc/slurm/prolog
> PrologFlags=Alloc
>
> ## MPI
> MpiDefault=pmi2
>
> ## Nodes list
> ## use native GPUs
> NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1
> RealMemory=1024000 MemSpecLimit=65556 State=UNKNOWN Gres=gpu:A100:8
> Feature=gpu
>
> ## Partitions list
> PartitionName=gpu OverSubscribe=No MaxCPUsPerNode=64 DefMemPerNode=65556
> DefCpuPerGPU=8 DefMemPerGPU=65556 MaxMemPerNode=532000 MaxTime=3-12:00:00
> State=UP Nodes=nodeGPU01 Default=YES
> PartitionName=cpu OverSubscribe=No MaxCPUsPerNode=64 DefMemPerNode=16384
> MaxMemPerNode=420000 MaxTime=3-12:00:00 State=UP Nodes=nodeGPU01
>
>
> --
> Cristóbal A. Navarro
>