<div dir="ltr"><div class="gmail_quote"><div dir="ltr">Hi supers.<div><br></div><div>I am configuring a server with Slurm and cgroups. This server will be the only Slurm node, so it is both the head node and the compute node. To force users to submit Slurm jobs instead of running processes directly on the server, I would like to use cgroups to confine users' interactive processes to a cpuset containing the last 8 CPUs (216-223), which would act as the head node. The remaining CPUs should be available to any Slurm job.<br></div><div><br></div><div>I followed instructions from many sites on the internet, but the final configuration still does not do what I want. Processes started directly by normal users are indeed confined to the last 8 CPUs, but so are the Slurm jobs they submit.<br></div><div><br></div><div>It seems that, because the jobs run as normal users (not as the slurm user), they are also limited by the cgroups configuration.</div><div><br></div><div>Is it possible to achieve what I want?</div><div><br></div><div>Here are my configuration files:</div><div><br></div><div>========================================================================</div><div><span style="font-family:monospace"><span style="color:rgb(0,0,0)">/etc/cgrules.conf
</span><br></span><div>========================================================================</div><span style="font-family:monospace"># &lt;user&gt; &lt;controllers&gt; &lt;destination&gt;
<br>root cpu,cpuset,memory /
<br>slurm cpu,cpuset,memory /
<br>* cpu,cpuset,memory interactive<br></span></div><div><div><br></div><div><br></div><div>========================================================================</div><div><span style="font-family:monospace"><span style="color:rgb(0,0,0)">/etc/cgconfig.conf
</span><br></span><div>========================================================================</div><span style="font-family:monospace">group interactive {
<br> cpu {
<br> cpu.shares = 100;
<br> }
<br> cpuset {
<br> cpuset.cpus = 216-223;
<br> cpuset.cpu_exclusive = 1;
<br> cpuset.mem_exclusive = 1;
<br> cpuset.mem_hardwall = 1;
<br> cpuset.memory_migrate = 0;
<br> cpuset.memory_spread_page = 0;
<br> cpuset.memory_spread_slab = 0;
<br> cpuset.mems = 0;
<br> cpuset.sched_load_balance = 0;
<br> cpuset.sched_relax_domain_level = -1;
<br> }
<br> memory {
<br> memory.limit_in_bytes = 8G;
<br> memory.swappiness = 41;
<br> memory.memsw.limit_in_bytes = 8G;
<br> }
<br>}
<br><br></span></div><div><br></div><div>========================================================================</div><span style="font-family:monospace"><span style="color:rgb(0,0,0)">slurm.conf </span><br></span><div>========================================================================</div><span style="font-family:monospace">ControlMachine=vital
<br>ControlAddr=172.25.2.25
<br>AuthType=auth/munge
<br>CryptoType=crypto/munge
<br>GresTypes=gpu
<br>MaxTasksPerNode=216
<br>MpiDefault=none
<br>ProctrackType=proctrack/cgroup
<br>ReturnToService=1
<br>SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
<br>SlurmctldPort=6817
<br>SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
<br>SlurmdPort=6818
<br>SlurmdSpoolDir=/var/spool/slurmd
<br>SlurmUser=slurm
<br>StateSaveLocation=/var/spool/slurm-llnl
<br>SwitchType=switch/none
<br>TaskPlugin=task/cgroup
<br>TaskPluginParam=sched
<br>InactiveLimit=0
<br>KillWait=30
<br>MinJobAge=300
<br>SlurmctldTimeout=120
<br>SlurmdTimeout=300
<br>Waittime=0
<br>DefMemPerNode=998749
<br>FastSchedule=1
<br>SchedulerType=sched/backfill
<br>SelectType=select/cons_res
<br>SelectTypeParameters=CR_CPU_Memory
<br>AccountingStorageHost=vital
<br>AccountingStorageLoc=slurm_acct_db
<br>AccountingStoragePass=/var/run/munge/munge.socket.2
<br>AccountingStoragePort=6819
<br>AccountingStorageType=accounting_storage/slurmdbd
<br>AccountingStorageUser=slurm
<br>AccountingStoreJobComment=YES
<br>ClusterName=bioinfo
<br>JobCompHost=vital
<br>JobCompLoc=slurm_acct_db
<br>JobCompPass=aikeeCu4S
<br>JobCompPort=6819
<br>JobCompType=jobcomp/slurmdbd
<br>JobCompUser=slurm
<br>JobAcctGatherFrequency=30
<br>JobAcctGatherType=jobacct_gather/cgroup
<br>SlurmctldDebug=verbose
<br>SlurmdDebug=verbose
<br>BurstBufferType=burst_buffer/generic
<br>NodeName=vital NodeAddr=172.25.2.25 CPUs=224 RealMemory=1031517 Sockets=4 CoresPerSocket=28 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1 MemSpecLimit=32768
<br>PartitionName=batch Nodes=vital OverSubscribe=YES Default=YES MaxTime=INFINITE State=UP<br></span><div><br></div><div><br></div><div>========================================================================</div><span style="font-family:monospace"><span style="color:rgb(0,0,0)">cgroup.conf </span><br></span><div>========================================================================</div><span style="font-family:monospace">CgroupMountpoint="/sys/fs/cgroup"<br>CgroupAutomount=yes
<br>AllowedRAMSpace=100
<br>AllowedSwapSpace=0
<br>ConstrainCores=no
<br>ConstrainDevices=yes
<br>ConstrainKmemSpace=no
<br>ConstrainRAMSpace=no
<br>ConstrainSwapSpace=no
<br>MaxRAMPercent=100
<br>MaxSwapPercent=100
<br>TaskAffinity=no</span></div><br><br>Thanks in advance for any help.<br><br>--<br>David da Silva Pires</div>
</div></div>
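<div>To make the symptom above easier to reproduce, here is a minimal sketch (Python, Linux only) of how the CPU placement could be checked. It compares the CPUs the current process is allowed to run on with the interactive cpuset from /etc/cgconfig.conf (216-223). The names <span style="font-family:monospace">parse_cpu_list</span>, <span style="font-family:monospace">classify</span>, <span style="font-family:monospace">INTERACTIVE_CPUS</span> and the script name used below are illustrative assumptions, not part of Slurm or libcgroup.</div>

```python
import os


def parse_cpu_list(spec):
    """Expand a cpuset-style list such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Assumption: mirrors "cpuset.cpus = 216-223" from the interactive group.
INTERACTIVE_CPUS = parse_cpu_list("216-223")


def classify(allowed):
    """Describe a set of allowed CPUs relative to the interactive cpuset."""
    if allowed <= INTERACTIVE_CPUS:
        return "confined to interactive CPUs"
    if allowed & INTERACTIVE_CPUS:
        return "overlaps interactive CPUs"
    return "outside interactive CPUs"


if __name__ == "__main__":
    # os.sched_getaffinity exists only on Linux; pid 0 is the calling process.
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)
        print(sorted(allowed), "->", classify(allowed))
```

<div>Saved under a hypothetical name such as <span style="font-family:monospace">check_cpus.py</span>, it could be run once from a login shell (<span style="font-family:monospace">python3 check_cpus.py</span>) and once as a job (<span style="font-family:monospace">srun python3 check_cpus.py</span>). With the configuration above, both runs would presumably report the interactive CPUs, whereas the goal is for only the login shell to do so.</div>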