[slurm-users] Cores shared between jobs even with OverSubscribe=NO with 17.02.6

Lech Nieroda lech.nieroda at uni-koeln.de
Tue Aug 14 03:01:42 MDT 2018


Dear Slurm Users,

We've observed a strange issue with oversubscription: cores are being
shared by multiple jobs.

We are using the CR_CPU_Memory resource selection option, which, unlike
CR_Memory, doesn't imply oversubscription. A quick check of the
partitions confirms that it is disabled:

$ scontrol show part | grep -o 'OverSubscribe=.*' | sort -u
OverSubscribe=NO

However, oversubscription still occurs, as seen in this example where a
single core is used by two jobs from two different users (user data
anonymized):

/cgroup/cpuset/slurm/uid_123/job_10022564/cpus
8
/cgroup/cpuset/slurm/uid_456/job_10009002/cpus
8

As a consequence, each job can use only about 50% of the core, which
hurts performance ('top' output):
  PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  P COMMAND
 1913 userx  20   0  125m  31m 4100 R 49.9  0.1 725:50.53  8 AppX
15480 usery  20   0  815m 163m  17m R 49.9  0.7  40:51.05  8 AppY
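
To check a whole node for this kind of overlap, the comparison of the
cpuset files above could be scripted. The following is only a rough
sketch under stated assumptions: the function names parse_cpu_list and
find_shared_cores are made up for illustration, the file name 'cpus' is
taken from the paths shown above (on many cgroup v1 systems the file is
called cpuset.cpus, hence the parameter), and stride notation in CPU
lists is not handled.

```python
from collections import defaultdict
from pathlib import Path


def parse_cpu_list(text):
    """Expand a CPU list such as '0-3,8' into a set of core ids."""
    cores = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty cpuset or stray comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return cores


def find_shared_cores(root, filename="cpus"):
    """Return {core id: [job dirs]} for cores listed by more than one job.

    Assumes the layout <root>/uid_*/job_*/<filename> seen in the post.
    """
    owners = defaultdict(list)
    for path in sorted(Path(root).glob(f"uid_*/job_*/{filename}")):
        job = path.parent.name  # e.g. 'job_10022564'
        for core in parse_cpu_list(path.read_text()):
            owners[core].append(job)
    return {core: jobs for core, jobs in owners.items() if len(jobs) > 1}
```

Calling find_shared_cores("/cgroup/cpuset/slurm") on the node in
question would then be expected to list core 8 together with both job
directories; an empty result would mean that no core is claimed by more
than one job.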

When checking the jobs with squeue, the 'OVER_SUBSCRIBE' attribute
says 'OK', which according to the manual should mean a dedicated
allocation:

$ squeue -j 10022564,10009002 -O jobid,oversubscribe
JOBID               OVER_SUBSCRIBE
10009002            OK
10022564            OK

Any ideas why the cores are shared rather than dedicated to each job?
We are using cgroup plugins where applicable:

...
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
JobAcctGatherType=jobacct_gather/cgroup
...

There is no preemption, and the cgroup.conf looks like this:

CgroupAutomount=yes
CgroupMountpoint=/cgroup
CgroupReleaseAgentDir="/etc/slurm/cgroup"

ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
ConstrainKmemSpace=yes
AllowedSwapSpace=0

Kind regards,
Lech



-- 
Lech Nieroda
Zentrum für Angewandte Informatik (ZAIK/RRZK)
Universität zu Köln
Robert-Koch-Str. 10
Gebäude 55 (RRZK-R2), Raum 210 (3. Etage)
D-50931 Köln
Deutschland

Tel.: +49 (221) 478-7021
Fax: +49 (221) 478-5568
E-Mail: nieroda.lech at uni-koeln.de



