[slurm-users] Suspend QOS help

Brian Andrus toomuchit at gmail.com
Fri Feb 18 15:36:55 UTC 2022


At first look, I would guess that there are enough resources to satisfy
the requests of both jobs, so there is no need to suspend anything.
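
For what it's worth, a job that preempt/qos has actually suspended would
normally show up in state "S" (suspended) rather than "R" in squeue, e.g.
with something like:

$ squeue -t R,S -o "%.10i %.10P %.4t %R"

so both jobs sitting in "R" on the same nodes fits that guess.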

Comparing the node info and the job info would be the next step.
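
Something along these lines (job IDs and node names taken from the squeue
output quoted below; adjust as needed) should show what each job was
allocated versus what the nodes actually have, plus the QOS definitions:

$ scontrol show node node[04-05]
$ scontrol show job 45085
$ scontrol show job 45084
$ sacctmgr show qos format=name,priority,preempt,preemptmode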

Brian Andrus


On 2/18/2022 7:20 AM, Walls, Mitchell wrote:
> Hello,
>
> Hoping someone can shed some light on what is causing jobs to run on the same nodes simultaneously rather than the lower-priority job actually being suspended? I can provide more info if someone can think of anything that would help!
>
> # Relevant config.
> PreemptType=preempt/qos
> PreemptMode=SUSPEND,GANG
>
> PartitionName=general Default=YES Nodes=general OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=general AllowQos=general
> PartitionName=suspend Default=NO  Nodes=general OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=suspend AllowQos=suspend
>
> # Qoses
>       Name   Priority    Preempt PreemptMode
> ---------- ---------- ---------- -----------
>    general       1000    suspend     cluster
>    suspend        100                cluster
>
> # squeue (another note: in htop I can see both processes actually running at the same time, not being time-sliced)
> $ squeue
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>              45085   general stress.s    user2  R       7:33      2 node[04-05]
>              45084   suspend stress-s    user1  R       7:40      2 node[04-05]
>
> Thanks!


