[slurm-users] Node OverSubscribe even if set to no

Stéphane Larose Stephane.Larose at ibis.ulaval.ca
Tue Apr 17 13:40:58 MDT 2018


Hi all,

I found a way to avoid oversubscription: I had to comment out this configuration:

PreemptMode=Suspend,Gang
PreemptType=preempt/partition_prio

In my current configuration, all the partitions have the same priority. At times, I increase the priority of one partition, and jobs in the other partitions are suspended. That works fine. But I still do not understand why oversubscription occurs when preemption is enabled. I would like to keep preemption by suspension without getting oversubscription. If anyone has an idea of how to do this, please let me know.

Thank you!

Stéphane

-----Original Message-----
From: Stéphane Larose
Sent: April 17, 2018 10:02
To: 'Slurm User Community List' <slurm-users at lists.schedmd.com>
Subject: RE: [slurm-users] Node OverSubscribe even if set to no

Hi Chris,

> You might want to double check the config is acting as expected with:
>
> scontrol show part | fgrep OverSubscribe

   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
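
Since fgrep prints only the matching lines, the output above does not show which partition each line belongs to. Here is a minimal sketch that pairs each partition name with its OverSubscribe value. It assumes the usual "scontrol show part" layout, where every record starts with an unindented PartitionName= line; the function name partition_oversubscribe is hypothetical, not part of Slurm.

```shell
# Read `scontrol show part` output on stdin and print "<partition> <value>"
# for every OverSubscribe= field found.
partition_oversubscribe() {
  awk '
    # Remember the name of the partition record we are currently inside.
    /^PartitionName=/ { name = $1; sub(/^PartitionName=/, "", name) }
    {
      # Scan every whitespace-separated field for OverSubscribe=<value>.
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^OverSubscribe=/) {
          value = $i
          sub(/^OverSubscribe=/, "", value)
          print name, value
        }
      }
    }
  '
}

# Usage on a cluster (not run here):
#   scontrol show part | partition_oversubscribe
```

Any partition that reports something other than NO (for example YES or FORCE:2) would then stand out by name.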

> Also what does this say?
>
> scontrol show config | fgrep SelectTypeParameters

SelectTypeParameters    = CR_CPU_MEMORY

From the docs, it seems that only CR_Memory implies OverSubscribe=YES:
All CR_s assume OverSubscribe=No or OverSubscribe=Force EXCEPT for CR_MEMORY which assumes OverSubscribe=Yes

When I run "scontrol show jobs", all jobs have OverSubscribe=OK (which is not YES). Again, according to the docs, that seems fine: "OK" otherwise (typically allocated dedicated CPUs)

Thanks again,

Stéphane

-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Chris Samuel
Sent: April 17, 2018 04:29
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] Node OverSubscribe even if set to no

On Tuesday, 17 April 2018 5:26:26 AM AEST Stéphane Larose wrote:

> So some jobs are now sharing the same cores but I don’t understand why 
> since OverSubscribe is set to no.

You might want to double check the config is acting as expected with:

scontrol show part | fgrep OverSubscribe

Also what does this say?

scontrol show config | fgrep SelectTypeParameters

I note that if you've got CR_Memory then:

                     CR_Memory
                            Memory  is  a  consumable  resource.   NOTE:  This
                            implies OverSubscribe=YES  or  OverSubscribe=FORCE
                            for  all  partitions.  Setting a value for
                            DefMemPerCPU is strongly recommended.

cheers,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



