[slurm-users] Suspend without gang scheduling

Reed Dier reed.dier at focusvq.com
Mon Aug 8 16:27:43 UTC 2022

I have essentially three “tiers” of jobs:

tier1 jobs are stateless and can be requeued
tier2 jobs are stateful and can be suspended
tier3 jobs are “high priority” and can preempt tier1 and tier2 using the requisite preemption modes.

> $ sacctmgr show qos format=name%10,priority%10,preempt%12,preemptmode%10
>       Name   Priority      Preempt PreemptMod
> ---------- ---------- ------------ ----------
>     normal          0                 cluster
>      tier1         10                 requeue
>      tier2         10                 suspend
>      tier3        100  tier1,tier2    cluster

I also have a separate partition on the same hardware nodes so that tier3 jobs can cross partitions to suspend tier2 jobs (if it's possible to make this all work in a single partition, please let me know).
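A minimal slurm.conf sketch of the layout described above (two partitions overlaying the same nodes) might look like the following. The node list, partition names, and AllowQos lists are hypothetical placeholders, not the actual configuration from this cluster.

```conf
# Sketch only: "node[01-04]", "standard" and "priority" are hypothetical names.
# Both partitions span the same hardware; tier3 jobs run in their own partition
# so they can cross partitions to preempt tier1/tier2 jobs in "standard".
PartitionName=standard Nodes=node[01-04] Default=YES AllowQos=normal,tier1,tier2 State=UP
PartitionName=priority Nodes=node[01-04] AllowQos=tier3 State=UP
```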

tier3 preempts tier1 and tier2 perfectly. The problem is that when the tier3 queue is long, tier3 jobs get gang scheduled among themselves, and I never want gang scheduling anywhere, least of all in tier3.

> PreemptType=preempt/qos
> PreemptMode=SUSPEND,GANG

That is what's in my slurm.conf, because if I set PreemptMode=SUSPEND on its own, slurmctld won't start:
> slurmctld: error: PreemptMode=SUSPEND requires GANG too

I have also tried setting PreemptMode=OFF on the tier3 partition, but as far as I can tell this has had no effect on gang scheduling.
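For concreteness, a partition-level override of this kind would be a single slurm.conf line along these lines; the partition name, node list, and AllowQos value are hypothetical placeholders, and only the PreemptMode=OFF setting reflects what was tried.

```conf
# Hypothetical names; PreemptMode=OFF is the per-partition override that was
# tried on the tier3 partition (global setting remains PreemptMode=SUSPEND,GANG).
PartitionName=priority Nodes=node[01-04] AllowQos=tier3 PreemptMode=OFF State=UP
```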

Right now, my hit-it-with-a-hammer solution is raising SchedulerTimeSlice to 65535 (seconds), which should effectively prevent jobs from being gang scheduled.
While this gets me to my goal, it's inelegant, and if I end up with jobs that run longer than ~18 hours, it won't work the way I want/hope/expect.
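The ~18-hour ceiling follows directly from the chosen value, since SchedulerTimeSlice is specified in seconds:

$$
\frac{65535\ \text{s}}{3600\ \text{s/h}} \approx 18.2\ \text{h}
$$

So a job that runs longer than one slice could still be suspended at the slice boundary if other jobs are waiting to share its resources.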

So I'm hoping there is a better solution that addresses the root issue: keeping the tier3 QOS/partition from preempting itself.

Hopefully I've described this well enough that someone can offer pointers on how to have suspendable jobs in tier2 without incidental gang suspension in tier3.

Production is running 21.08.8-2, and my test cluster, running 22.05.2, behaves the same way.
