[slurm-users] Node OverSubscribe even if set to no

Stéphane Larose Stephane.Larose at ibis.ulaval.ca
Mon Apr 16 13:26:26 MDT 2018


Hello,

I have Slurm 17.11 installed on a 64-core server. My 9 partitions are all set with OverSubscribe=NO. I would expect that once all 64 cores are assigned to jobs, Slurm would simply leave new jobs in the PENDING state. Instead, it keeps starting new jobs, so more than 64 cores end up assigned. Looking at the slurmctld log, we can see that cores 21, 22 and 24 to 38 are currently in use by more than one partition:
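
For context, the node and partition definitions in slurm.conf follow this general shape; the node line mirrors what the log reports below, but the partition lines are schematic rather than copied verbatim from my config:

    NodeName=katak CPUs=64 Sockets=8 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=968986
    PartitionName=ibismini  Nodes=katak OverSubscribe=NO State=UP
    PartitionName=ibisinter Nodes=katak OverSubscribe=NO State=UP
    PartitionName=ibismax   Nodes=katak OverSubscribe=NO State=UP
    (the remaining partitions are defined the same way, all with OverSubscribe=NO)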

[2018-04-16T15:00:00.439] node:katak cpus:64 c:8 s:8 t:1 mem:968986 a_mem:231488 state:11
[2018-04-16T15:00:00.439] part:ibismini rows:1 prio:10
[2018-04-16T15:00:00.439]   row0: num_jobs 6: bitmap: 4,6-12,16-33,48-55
[2018-04-16T15:00:00.439] part:ibisinter rows:1 prio:10
[2018-04-16T15:00:00.439]   row0: num_jobs 1: bitmap: 24-41
[2018-04-16T15:00:00.439] part:ibismax rows:1 prio:10
[2018-04-16T15:00:00.439]   row0: num_jobs 3: bitmap: 21-22,24-38,42-47,56-63
[2018-04-16T15:00:00.439] part:rclevesq rows:1 prio:10
[2018-04-16T15:00:00.439] part:ibis1 rows:1 prio:10
[2018-04-16T15:00:00.439] part:ibis2 rows:1 prio:10
[2018-04-16T15:00:00.439]   row0: num_jobs 1: bitmap: 32-37

So some jobs are now sharing the same cores, but I don't understand why, since OverSubscribe is set to NO on every partition.
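
In case it helps, I am double-checking the setting and the per-job core placement with commands along these lines (the job id is just a placeholder):

    # confirm OverSubscribe is really NO on every partition
    scontrol show partition | grep -E 'PartitionName|OverSubscribe'

    # list running jobs with partition, CPU count and node
    squeue -t RUNNING -o "%.10i %.12P %.4C %N"

    # show the exact CPU_IDs allocated to one job
    scontrol -d show job <jobid>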

Thanks for your help!

---
Stéphane Larose
IT Analyst
Institut de Biologie Intégrative et des Systèmes (IBIS)
Pavillon Charles-Eugène-Marchand
Université Laval
