Hi Alex.

Thank you very much for sending the cgroup-related settings of your cluster.

I implemented a solution to the problem, based on your advice and on the suggestions I found at the following URL:

http://rolk.github.io/2015/04/20/slurm-cluster

Now, user processes started without sbatch, srun or salloc are confined to the last 8 threads of the server (numbered 217-224). If one of those three commands is used, the job can be allocated on any of the 224 threads. (A quick way to check this is shown after the configuration files below.)

This is almost the ideal situation. I would still like to keep Slurm jobs off those last 8 threads, so that jobs and ordinary processes do not share the same threads, which could hurt the performance of both.

While I work on that, here is my current configuration:


=============================================================================
/etc/slurm-llnl/slurm.conf
=============================================================================
ControlAddr=172.25.2.25
AuthType=auth/munge
CacheGroups=0
CryptoType=crypto/munge

GresTypes=gpu
MaxTasksPerNode=216

SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd%n.pid
SwitchType=switch/none
ProctrackType=proctrack/cgroup
MpiDefault=none
RebootProgram=/sbin/reboot

ReturnToService=2

TaskPluginParam=sched
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=1800
Waittime=0
SchedulerType=sched/backfill
PreemptMode=suspend,gang
PreemptType=preempt/partition_prio
DefMemPerNode=998749
FastSchedule=1

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
TaskPlugin=task/affinity,task/cgroup
PriorityType=priority/multifactor
PriorityDecayHalfLife=3-0
PriorityFavorSmall=YES
PriorityMaxAge=7-0
PriorityWeightAge=1000
PriorityWeightFairshare=0
PriorityWeightJobSize=125
PriorityWeightPartition=1000
PriorityWeightQOS=0
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log

AccountingStorageType=accounting_storage/filetxt

CheckpointType=checkpoint/none

AccountingStorageHost=vital
AccountingStorageLoc=/var/log/slurm-llnl/accounting
AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStoragePort=6819
AccountingStorageUser=slurm
AccountingStoreJobComment=YES

ClusterName=bioinfo
ControlMachine=vital

JobCompHost=vital
JobCompLoc=/var/log/slurm-llnl/job_completions
JobCompPass=<xxxxxxxx>
JobCompPort=6819
JobCompType=jobcomp/filetxt
JobCompUser=slurm
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
SlurmctldDebug=verbose
SlurmdDebug=verbose
BurstBufferType=burst_buffer/generic
NodeName=vital NodeAddr=172.25.2.25 CPUs=224 RealMemory=1031517 Sockets=4 CoresPerSocket=28 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1 MemSpecLimit=32768

PartitionName=batch Nodes=vital Shared=FORCE:1 OverSubscribe=YES Default=YES MaxTime=INFINITE State=UP
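
In case it helps anyone reproducing this: before settling on the NodeName line above I compare it with what slurmd itself detects on the machine. These are just the standard sanity checks, nothing here is part of the configuration:

   slurmd -C                  # print the node configuration slurmd detects (CPUs, sockets, cores, threads, memory)
   scontrol show node vital   # what slurmctld currently reports for the node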

=============================================================================
/etc/slurm-llnl/cgroup.conf
=============================================================================
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm-llnl/cgroup"
ConstrainCores=yes
ConstrainRAMSpace=yes


=============================================================================
/etc/cgconfig.conf
=============================================================================
group interactive {
   cpu {
   }
   cpuset {
      cpuset.cpus = 216-223;
      cpuset.cpu_exclusive = 1;
      cpuset.mem_exclusive = 1;
      cpuset.mem_hardwall = 1;
      cpuset.memory_migrate = 0;
      cpuset.memory_spread_page = 0;
      cpuset.memory_spread_slab = 0;
      cpuset.mems = 0;
      cpuset.sched_load_balance = 0;
      cpuset.sched_relax_domain_level = -1;
   }
   memory {
      memory.limit_in_bytes = "8G";
      memory.memsw.limit_in_bytes = "8G";
   }
}
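
As a next step (mentioned again at the end of this message), I am considering adding a second group here for the slurm user, so that jobs stay off the interactive threads. This is only a rough, untested sketch; the group name "slurm", the cpuset range and the mems value are my assumptions for now:

group slurm {
   cpuset {
      # keep Slurm-launched work on the first 216 threads, away from the interactive set
      cpuset.cpus = 0-215;
      # memory nodes; must be adjusted to the real NUMA layout of the machine
      cpuset.mems = 0-3;
   }
}

plus a matching rule for the cpuset controller in /etc/cgrules.conf below, along the lines of:

slurm   cpuset   /slurm

I still have to work out how this interacts with the existing slurm line in cgrules.conf and with the cgroups that slurmd creates on its own.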

=============================================================================
/etc/cgrules.conf
=============================================================================
root  cpu,memory /
slurm cpu,memory /
* cpuset,memory /interactive
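
For reference, this is the quick check I use to confirm the behaviour described at the beginning of the message (taskset is the standard util-linux tool; the exact CPU list for a job depends on its allocation):

   taskset -cp $$                   # from a normal login shell: reports the interactive cpuset, 216-223
   srun bash -c 'taskset -cp $$'    # from within a job step: reports CPUs taken from the job's allocation

Looking at /proc/self/cgroup from both contexts also shows whether the process landed in the interactive cgroup or in the cgroups created by slurmd.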

The idea is to create a cgroup with a cpuset for slurm, from 1 to 216, roughly as sketched in the cgconfig.conf section above. Let's see if it works.

Best.

--
David da Silva Pires