[slurm-users] Suspend QOS help

Walls, Mitchell miwalls at siue.edu
Fri Feb 18 15:20:16 UTC 2022


Hello,

I'm hoping someone can shed some light on why jobs are running on the same nodes simultaneously, rather than the lower-priority job actually being suspended. I can provide more information if that would help.

# Relevant config.
PreemptType=preempt/qos
PreemptMode=SUSPEND,GANG

PartitionName=general Default=YES Nodes=general OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=general AllowQos=general
PartitionName=suspend Default=NO  Nodes=general OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=suspend AllowQos=suspend

# QOSes
      Name   Priority    Preempt PreemptMode
---------- ---------- ---------- -----------
   general       1000    suspend     cluster
   suspend        100                cluster

# squeue (note: htop shows both processes actually running at the same time, not being timesliced)
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             45085   general stress.s    user2  R       7:33      2 node[04-05]
             45084   suspend stress-s    user1  R       7:40      2 node[04-05]
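
In case it helps anyone reproduce the check, here is a minimal sketch of how the overlap above could be detected from squeue text. It is only an illustration: expand_nodelist and running_overlaps are hypothetical helper names (not part of Slurm), the parser assumes the default squeue column order and job names without spaces, and it only expands a single bracket group (scontrol show hostnames handles the general case).

```python
import re
from itertools import combinations

# Sample rows copied from the squeue output above.
SQUEUE_OUTPUT = """\
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
45085   general stress.s    user2  R       7:33      2 node[04-05]
45084   suspend stress-s    user1  R       7:40      2 node[04-05]
"""


def expand_nodelist(nodelist):
    """Expand a simple hostlist such as 'node[04-05,07]' into a set of names.

    Assumption: at most one bracket group; anything else is split on commas.
    """
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", nodelist)
    if not match:
        return set(nodelist.split(","))
    prefix, body = match.groups()
    nodes = set()
    for part in body.split(","):
        if "-" in part:
            low, high = part.split("-")
            width = len(low)  # keep zero padding, e.g. 04 -> "04"
            for i in range(int(low), int(high) + 1):
                nodes.add(f"{prefix}{i:0{width}d}")
        else:
            nodes.add(prefix + part)
    return nodes


def running_overlaps(squeue_text):
    """Return (jobid_a, jobid_b, shared_nodes) for each pair of jobs in
    state R whose node sets intersect."""
    jobs = []
    for line in squeue_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        jobid, state, nodelist = fields[0], fields[4], fields[7]
        if state == "R":
            jobs.append((jobid, expand_nodelist(nodelist)))
    overlaps = []
    for (id_a, nodes_a), (id_b, nodes_b) in combinations(jobs, 2):
        shared = nodes_a & nodes_b
        if shared:
            overlaps.append((id_a, id_b, sorted(shared)))
    return overlaps


for id_a, id_b, shared in running_overlaps(SQUEUE_OUTPUT):
    print(f"{id_a} and {id_b} are both in state R on {', '.join(shared)}")
```

With suspension working as intended, I would expect the lower-priority job to show state S rather than R, so a check like this would report no overlapping pairs.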

Thanks!

