[slurm-users] Cgroups not constraining memory & cores
Sean Maxwell
stm at case.edu
Tue Nov 8 13:33:38 UTC 2022
Hi Sean,
I don't see PrologFlags=Contain in your slurm.conf. It is one of the
entries required to activate cgroup containment:
https://slurm.schedmd.com/cgroup.conf.html#OPT_/etc/slurm/slurm.conf
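
For reference, a minimal sketch of what that looks like in slurm.conf
(illustrative, to be merged with your existing settings rather than taken
as a drop-in config):

    # Create a job container (an "extern" step cgroup) at allocation time,
    # so that processes not launched through Slurm can be contained as well.
    PrologFlags=Contain
    # You already have these two:
    TaskPlugin=task/cgroup
    ProctrackType=proctrack/cgroup

Note also that an ssh session to the node is only adopted into the job's
cgroup if pam_slurm_adopt is configured on the compute nodes; without it,
the shell you ssh into sits outside the job's limits regardless of the
cgroup settings.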
Best,
-Sean
On Tue, Nov 8, 2022 at 8:16 AM Sean McGrath <smcgrat at tchpc.tcd.ie> wrote:
> Hi,
>
> I can't get cgroups to constrain memory or cores. If anyone can point
> out what I am doing wrong, I would be very grateful.
>
> Testing:
>
> Request a core and 2G of memory, log into the allocated node, and compile
> a binary that just allocates memory quickly:
>
> $ salloc -n 1 --mem=2G
> $ ssh $SLURM_NODELIST
> $ cat stoopid-memory-overallocation.c
> /*
>  *
>  * Sometimes you need to over allocate the memory available to you.
>  * This does so splendidly. I just hope you have limits set to kill it!
>  *
>  */
>
> #include <stdlib.h> /* malloc */
> #include <string.h> /* memset */
>
> int main()
> {
>     while (1)
>     {
>         /* allocate and touch 1 MiB per iteration, forever */
>         void *m = malloc(1024*1024);
>         memset(m, 0, 1024*1024);
>     }
>     return 0;
> }
> $ gcc -o stoopid-memory-overallocation.x stoopid-memory-overallocation.c
>
> Checking memory usage first, as a baseline:
>
> $ free -g
>               total        used        free      shared  buff/cache   available
> Mem:            251           1         246           0           3         248
> Swap:             7           0           7
>
> Launch the memory over-allocation and check memory use again; 34G has
> been allocated, when I expect it to be constrained to 2G:
>
> $ ./stoopid-memory-overallocation.x &
> $ sleep 10 && free -g
>               total        used        free      shared  buff/cache   available
> Mem:            251          34         213           0           3         215
> Swap:             7           0           7
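>
> (A quick sanity check, assuming cgroup v1 as Ubuntu 20.04 uses by default:
> list which cgroups the ssh'd shell actually belongs to. If containment were
> applied, slurm-owned paths would appear in the output.)
>
> $ cat /proc/self/cgroup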
>
> Run a second copy of the process to check the CPU constraints:
>
> $ ./stoopid-memory-overallocation.x &
>
> Checking with top, I can see the two processes each running at 100% CPU
> simultaneously, even though only one core was requested:
>
> $ top
> top - 13:04:44 up 13 days, 23:39, 2 users, load average: 0.63, 0.27, 0.11
> Tasks: 525 total, 3 running, 522 sleeping, 0 stopped, 0 zombie
> %Cpu(s):  0.7 us,  5.5 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> MiB Mem : 257404.1 total, 181300.3 free,  72588.6 used,   3515.2 buff/cache
> MiB Swap:   8192.0 total,   8192.0 free,      0.0 used. 183300.3 avail Mem
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  120978 smcgrat   20   0   57.6g  57.6g    968 R 100.0  22.9   0:22.63 stoopid-memory-
>  120981 smcgrat   20   0   11.6g  11.6g    952 R 100.0   4.6   0:04.57 stoopid-memory-
> ...
>
> Is this actually a valid test case or am I doing something else wrong?
>
> Thanks
>
> Sean
>
> Setup details:
>
> Ubuntu 20.04.5 LTS (Focal Fossa).
> slurm 21.08.8-2.
> cgroup-tools version 0.41-10 installed.
>
> The following was set in /etc/default/grub and update-grub was run:
>
> GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
>
> Relevant parts of scontrol show conf:
>
> JobAcctGatherType = jobacct_gather/none
> ProctrackType = proctrack/cgroup
> TaskPlugin = task/cgroup
> TaskPluginParam = (null type)
>
>
> The contents of the full slurm.conf:
>
> ClusterName=neuro
> SlurmctldHost=neuro01(192.168.49.254)
> AuthType=auth/munge
> CommunicationParameters=block_null_hash
> CryptoType=crypto/munge
> Epilog=/home/support/slurm/etc/slurm.epilog.local
> EpilogSlurmctld=/home/support/slurm/etc/slurm.epilogslurmctld
> JobRequeue=0
> MaxJobCount=30000
> MpiDefault=none
> Prolog=/home/support/slurm/etc/prolog
> ReturnToService=2
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmUser=root
> StateSaveLocation=/var/slurm_state/neuro
> SwitchType=switch/none
> TaskPlugin=task/cgroup
> ProctrackType=proctrack/cgroup
> RebootProgram=/sbin/reboot
> InactiveLimit=0
> KillWait=30
> MinJobAge=300
> SlurmctldTimeout=300
> SlurmdTimeout=300
> Waittime=0
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> AccountingStorageHost=service01
> AccountingStorageType=accounting_storage/slurmdbd
> JobCompType=jobcomp/none
> JobAcctGatherFrequency=30
> SlurmctldDebug=3
> SlurmctldLogFile=/var/log/slurm.log
> SlurmdDebug=3
> SlurmdLogFile=/var/log/slurm.log
> DefMemPerNode=257300
> MaxMemPerNode=257300
> NodeName=neuro-n01-mgt RealMemory=257300 Sockets=2 CoresPerSocket=16 State=UNKNOWN
> NodeName=neuro-n02-mgt RealMemory=257300 Sockets=2 CoresPerSocket=16 State=UNKNOWN
> PartitionName=compute Nodes=ALL Default=YES MaxTime=5760 State=UP Shared=YES
>
>
> cgroup.conf file contents:
>
> CgroupAutomount=yes
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> TaskAffinity=no
>
>