[slurm-users] trying to configure preemption partitions and also non-preemption with OverSubscribe=FORCE

Kevin Broch kbroch at rivosinc.com
Thu Jun 15 01:22:44 UTC 2023


The general idea is to have prioritized batch partitions where higher-priority
jobs can preempt (suspend) lower-priority ones.
There is also an interactive partition, where users run GUI tools that
must not be preempted.

This works fine until I set OverSubscribe=FORCE:2 on the interactive
partition. I expected twice as many single-CPU jobs to run there.
Instead, once every CPU has one job allocated, the next job pends.
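To make the expectation concrete, here is a minimal sketch of the arithmetic. It assumes that FORCE:2 allows up to two running jobs per CPU, and it uses the partition's TotalCPUs=2 from the scontrol output in this message. The variable names are illustrative only.

```shell
#!/bin/sh
# Values taken from this message; variable names are hypothetical.
total_cpus=2      # TotalCPUs=2 on the interactive partition
oversubscribe=2   # the count in OverSubscribe=FORCE:2
cpus_per_job=1    # each "srun -p interactive sleep 600" asks for one CPU

# Expected number of concurrently running jobs if every CPU can be shared
# by up to $oversubscribe jobs (an assumption about how FORCE:2 behaves).
expected_jobs=$(( total_cpus * oversubscribe / cpus_per_job ))

# What was actually observed: two jobs running, the third pending.
observed_jobs=2

echo "expected=$expected_jobs observed=$observed_jobs"
```

Under that assumption four single-CPU jobs should run at once, while only two do.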

Is it possible to keep preemption enabled cluster-wide and still have
OverSubscribe behave, on a partition with PreemptMode=OFF, the way it does
without preemption?
If so, I must be missing something in my configuration (see below). If not,
why not?

Below are the details of my setup:

kbroch at slm-dev.ba.rivosinc.com:~ via 
✦2 ❯ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
low*           up 14-00:00:0      2   idle cs44,cs1-dev
medium         up 14-00:00:0      2   idle cs44,cs1-dev
high           up 14-00:00:0      2   idle cs44,cs1-dev
interactive    up 14-00:00:0      1   idle cs2-dev

kbroch at slm-dev.ba.rivosinc.com:~ via 
✦2 ❯ scontrol show partition interactive
PartitionName=interactive
   AllowGroups=ALL AllowAccounts=rvs,gd1-dv AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=14-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=cs2-dev
   PriorityJobFactor=1 PriorityTier=100 RootOnly=NO ReqResv=NO OverSubscribe=FORCE:2
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=2 TotalNodes=1 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=400 MaxMemPerNode=UNLIMITED


kbroch at slm-dev.ba.rivosinc.com:~ via 
✦2 ❯ scontrol show config | grep Preempt
PreemptMode             = GANG,SUSPEND
PreemptType             = preempt/partition_prio
PreemptExemptTime       = 00:00:00
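
For readers who want to reproduce the setup, the following is a hypothetical slurm.conf fragment that would be consistent with the output shown in this message. It is a sketch, not the actual configuration file. The three Preempt* lines and the interactive partition's settings mirror the scontrol output. The PriorityTier and PreemptMode values on low, medium and high are assumptions, since those partitions' settings are not shown here, and other required settings (such as SelectType and node definitions) are omitted.

```
# Cluster-wide preemption settings (as reported by "scontrol show config")
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
PreemptExemptTime=00:00:00

# Batch partitions: PriorityTier and PreemptMode values below are ASSUMED
PartitionName=low    Nodes=cs44,cs1-dev Default=YES MaxTime=14-00:00:00 PriorityTier=10 PreemptMode=SUSPEND
PartitionName=medium Nodes=cs44,cs1-dev MaxTime=14-00:00:00 PriorityTier=20 PreemptMode=SUSPEND
PartitionName=high   Nodes=cs44,cs1-dev MaxTime=14-00:00:00 PriorityTier=30 PreemptMode=SUSPEND

# Interactive partition (values as reported by "scontrol show partition")
PartitionName=interactive Nodes=cs2-dev MaxTime=14-00:00:00 PriorityTier=100 OverSubscribe=FORCE:2 PreemptMode=OFF DefMemPerCPU=400
```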

kbroch at slm-dev.ba.rivosinc.com:~ via 
✦2 ❯ srun -p interactive sleep 600 &
[5] 60490

kbroch at slm-dev.ba.rivosinc.com:~ via 
✦3 ❯ srun -p interactive sleep 600 &
[6] 60613

kbroch at slm-dev.ba.rivosinc.com:~ via 
✦4 ❯ srun -p interactive sleep 600 &
[7] 60696
srun: job 18919 queued and waiting for resources

kbroch at slm-dev.ba.rivosinc.com:~ via 
✦5 ❯ sq
             JOBID PARTITIO                 NAME             USER ST         TIME  NODES CPU MIN_MEMO NODELIST(REASON)
             18919 interact                sleep           kbroch PD         0:00      1   1     400M (Resources)
             18917 interact                sleep           kbroch  R         0:04      1   1     400M cs2-dev
             18918 interact                sleep           kbroch  R         0:04      1   1     400M cs2-dev

Best, /<evin