[slurm-users] RAM "overbooking"

Marcelo Z. Silva marcelo at zx64.org
Wed May 27 22:46:46 UTC 2020


Hello all

We have a simple single-node Slurm installation with the following hardware
configuration:

NodeName=node01 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=102 CPUErr=0 CPUTot=160 CPULoad=67.09
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=biotec01 NodeHostName=biotec01 Version=16.05
   OS=Linux RealMemory=1200000 AllocMem=1093632 FreeMem=36066 Sockets=160 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2020-04-19T17:22:31 SlurmdStartTime=2020-04-20T13:54:34
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Slurm version is 16.05 (we are about to upgrade to Debian 10 and Slurm 18.08 from
the repo).

Everything is working as expected, but we have the following
"problem":

Users submit their jobs with sbatch but usually reserve far more RAM than the job
actually needs, so other jobs sit queued waiting for memory even though the real
RAM usage on the node is very low.
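To illustrate the pattern (the script name, job name, and numbers below are made
up, not a real job from our queue), a typical submission looks roughly like this:

#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --cpus-per-task=8
#SBATCH --mem=200G
# The user asks for 200 GB "to be safe", but the job peaks at only a
# fraction of that, so the rest of the reservation sits idle while
# other jobs wait in the queue for memory.
srun ./analysis.sh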

Is there a recommended solution for this problem? Is there a way to tell
Slurm to start jobs while "overbooking" RAM by, say, 20%?
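For instance (purely a hypothetical sketch of what I have in mind, not something
we have tested), would overstating RealMemory in slurm.conf, so that the scheduler
sees roughly 120% of the physical 1.2 TB, be a sane way to do this?

NodeName=biotec01 CPUs=160 RealMemory=1440000 State=UNKNOWN

Or is there a cleaner, supported mechanism for this kind of memory
oversubscription?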

Thanks for any recommendation.

slurm.conf:

ControlMachine=node01
MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/cgroup
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
AccountingStorageType=accounting_storage/slurmdbd
ClusterName=cluster
JobAcctGatherType=jobacct_gather/linux
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
DebugFlags=NO_CONF_HASH
NodeName=biotec01 CPUs=160 RealMemory=1200000 State=UNKNOWN
PartitionName=short Nodes=node01 Default=YES MaxTime=24:00:00 State=UP Priority=30
PartitionName=long Nodes=node01 MaxTime=30-00:00:00 State=UP Priority=20
PartitionName=test Nodes=node01 MaxTime=1 State=UP MaxCPUsPerNode=3 Priority=30



