[slurm-users] Cores shared between jobs even with OverSubscribe=NO with 17.02.6
Lech Nieroda
lech.nieroda at uni-koeln.de
Tue Aug 14 03:01:42 MDT 2018
Dear Slurm Users,
we have observed a strange issue with oversubscription: cores are being
shared by multiple jobs.
We are using the CR_CPU_Memory resource selection plugin which, unlike
CR_Memory, does not enforce oversubscription; a short partition check
confirms this:
$ scontrol show part | grep -o 'OverSubscribe=.*' | sort -u
OverSubscribe=NO
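For completeness, CR_CPU_Memory goes together with SelectType=select/cons_res,
and the runtime values can be double-checked like this (the output shown is
what we expect to see, not a verbatim paste from our cluster):

# illustrative check of the select plugin settings
$ scontrol show config | grep -i '^Select'
SelectType              = select/cons_res
SelectTypeParameters    = CR_CPU_MEMORY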
However, oversubscription does occur, as seen in this example where a
single core is used by two jobs from two different users (user data
anonymized):
/cgroup/cpuset/slurm/uid_123/job_10022564/cpus
8
/cgroup/cpuset/slurm/uid_456/job_10009002/cpus
8
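To get an overview of how widespread this is on a given node, one can dump the
cpuset of every job cgroup in one go; a minimal sketch, reusing the mountpoint
and path layout from the excerpt above:

# paths mirror the cgroup excerpt above (CgroupMountpoint=/cgroup)
$ for f in /cgroup/cpuset/slurm/uid_*/job_*/cpus; do echo "$f: $(cat "$f")"; done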
As a consequence, each job can use the core only up to 50%, which
hurts performance ('top' output):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
1913 userx 20 0 125m 31m 4100 R 49.9 0.1 725:50.53 8 AppX
15480 usery 20 0 815m 163m 17m R 49.9 0.7 40:51.05 8 AppY
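The pinning can also be confirmed from the process side, e.g. with taskset on
the two PIDs from the top output (output illustrative):

$ taskset -cp 1913
pid 1913's current affinity list: 8
$ taskset -cp 15480
pid 15480's current affinity list: 8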
When checking the jobs with squeue, the 'OVER_SUBSCRIBE' attribute
reads 'OK', which according to the manual should mean a dedicated
allocation:
$ squeue -j 10022564,10009002 -O jobid,oversubscribe
JOBID OVER_SUBSCRIBE
10009002 OK
10022564 OK
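For what it's worth, the detailed job view also reports which CPU IDs Slurm
itself believes it allocated; a sketch (node name and output illustrative):

$ scontrol -d show job 10022564 | grep CPU_IDs
   Nodes=nodeXYZ CPU_IDs=8 Mem=...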
Any ideas why the cores are shared rather than dedicated to each job?
We are using cgroup plugins where applicable:
...
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
JobAcctGatherType=jobacct_gather/cgroup
...
There is no preemption configured, and the cgroup.conf looks like this:
CgroupAutomount=yes
CgroupMountpoint=/cgroup
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
ConstrainKmemSpace=yes
AllowedSwapSpace=0
Kind regards,
Lech
--
Lech Nieroda
Zentrum für Angewandte Informatik (ZAIK/RRZK)
Universität zu Köln
Robert-Koch-Str. 10
Gebäude 55 (RRZK-R2), Raum 210 (3. Etage)
D-50931 Köln
Deutschland
Tel.: +49 (221) 478-7021
Fax: +49 (221) 478-5568
E-Mail: nieroda.lech at uni-koeln.de