[slurm-users] Node OverSubscribe even if set to no
Stéphane Larose
Stephane.Larose at ibis.ulaval.ca
Tue Apr 17 13:40:58 MDT 2018
Hi all,
I found a way to avoid oversubscribing. I had to comment out this configuration:
PreemptMode=Suspend,Gang
PreemptType=preempt/partition_prio
In my current configuration, all the partitions have the same priority. At times, I increase the priority of one partition, and jobs in the other partitions are suspended. That works fine. But I still do not understand why oversubscription occurs when preemption is enabled. I would like to keep preemption by suspension without getting oversubscription. If anyone has an idea of how to do this, please let me know.
Thank you!
Stéphane
-----Original Message-----
From: Stéphane Larose
Sent: April 17, 2018 10:02
To: 'Slurm User Community List' <slurm-users at lists.schedmd.com>
Subject: RE: [slurm-users] Node OverSubscribe even if set to no
Hi Chris,
> You might want to double check the config is acting as expected with:
>
> scontrol show part | fgrep OverSubscribe
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
> Also what does this say?
>
> scontrol show config | fgrep SelectTypeParameters
SelectTypeParameters = CR_CPU_MEMORY
From the docs, it seems that only CR_Memory implies OverSubscribe=YES:
All CR_s assume OverSubscribe=No or OverSubscribe=Force EXCEPT for CR_MEMORY which assumes OverSubscribe=Yes
When I run "scontrol show jobs", all jobs have OverSubscribe=OK (which is not YES). Again, from the docs this seems fine: "OK" otherwise (typically allocated dedicated CPUs)
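For anyone who wants to gather these checks in one place, here is a minimal Python sketch that parses saved output from "scontrol show part" and "scontrol show config" and reports the settings discussed in this thread. It is only an illustration: the partition names "batch" and "short" are hypothetical (the thread does not show them), the PreemptMode and PreemptType lines are assumed to be printed in the same "Key = value" layout as the SelectTypeParameters line above, and the helper names are made up for this example.

```python
# Sample text modelled on the output quoted in this thread.
# ASSUMPTION: the partition names "batch" and "short" are hypothetical.
PARTITION_OUTPUT = """\
PartitionName=batch
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PartitionName=short
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
"""

# ASSUMPTION: the Preempt* lines use the same "Key = value" layout as the
# SelectTypeParameters line shown in the thread.
CONFIG_OUTPUT = """\
PreemptMode = SUSPEND,GANG
PreemptType = preempt/partition_prio
SelectTypeParameters = CR_CPU_MEMORY
"""


def partition_fields(text, field):
    """Map each partition name to the value of one KEY=VALUE field."""
    result = {}
    current = None
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue
        if key == "PartitionName":
            current = value
        elif key == field and current is not None:
            result[current] = value
    return result


def config_value(text, key):
    """Return the value of a 'Key = value' line, or None if the key is absent."""
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def summarize(partition_text, config_text):
    """Gather the settings discussed in this thread into one dict."""
    select = config_value(config_text, "SelectTypeParameters") or ""
    preempt = config_value(config_text, "PreemptMode") or ""
    select_flags = [p.strip().upper() for p in select.split(",")]
    preempt_flags = [m.strip().upper() for m in preempt.split(",")]
    return {
        "oversubscribe": partition_fields(partition_text, "OverSubscribe"),
        "priority_tier": partition_fields(partition_text, "PriorityTier"),
        # Exact match on purpose: CR_CPU_MEMORY is not the same as CR_Memory.
        "cr_memory_only": "CR_MEMORY" in select_flags,
        "gang_enabled": "GANG" in preempt_flags,
    }


if __name__ == "__main__":
    for name, value in summarize(PARTITION_OUTPUT, CONFIG_OUTPUT).items():
        print(name, value)
```

To use it on a real cluster, the two sample strings would be replaced by the captured text of the two scontrol commands. The sketch only reports the settings; it does not explain why the cores end up shared.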
Thanks again,
Stéphane
-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Chris Samuel
Sent: April 17, 2018 04:29
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] Node OverSubscribe even if set to no
On Tuesday, 17 April 2018 5:26:26 AM AEST Stéphane Larose wrote:
> So some jobs are now sharing the same cores but I don’t understand why
> since OverSubscribe is set to no.
You might want to double check the config is acting as expected with:
scontrol show part | fgrep OverSubscribe
Also what does this say?
scontrol show config | fgrep SelectTypeParameters
I note that if you've got CR_Memory then:
CR_Memory
Memory is a consumable resource. NOTE: This
implies OverSubscribe=YES or OverSubscribe=FORCE
for all partitions. Setting a value for
DefMemPerCPU is strongly recommended.
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC