rug262 at psu.edu
Tue Mar 7 14:46:57 UTC 2023
I found a year-old thread on this topic that, at the time, seemed to offer no hope. I'm wondering whether the situation has changed; my testing so far isn't encouraging.
The thread (here: https://groups.google.com/g/slurm-users/c/yhnSVBoohik) discusses giving lower priority jobs some amount of guaranteed run time. That's what we're trying to do.
Our global settings are PreemptMode=SUSPEND,GANG and PreemptType=preempt/partition_prio. We have a high priority partition that nothing should ever preempt, and an open partition that is always preemptable. In between is a burst partition, which can be preempted if the high priority partition needs the resources. That's the partition on which we'd like to guarantee a 1 hour run time. The sacctmgr man page gives this description of the QOS PreemptExemptTime option:
Specifies a minimum run time for jobs of this QOS before they are considered for preemption. This QOS option takes precedence over the global PreemptExemptTime. This is only honored for PreemptMode=REQUEUE and PreemptMode=CANCEL.
This sounds like exactly what we want. So I went into the burst QOS we have available on the burst partition, set a PreemptExemptTime of 30 seconds and a PreemptMode of cancel, and tested. Whenever a higher priority job came along, my job was cancelled immediately; the exempt time was not honored.
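For reference, a rough sketch of the setup described above as a slurm.conf fragment. Only the two global Preempt* lines and the QOS name "burst" come from this post; the partition names, node list, PriorityTier values, and the AllowQos line are assumptions for illustration and may not match the real configuration.

```
# Global preemption settings, as stated in the post
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG

# Global exempt time named in the man page excerpt; shown commented out
# because the post does not say whether it is set.
#PreemptExemptTime=01:00:00

# ASSUMED partition layout: names, nodes, and PriorityTier values are placeholders.
# With preempt/partition_prio, jobs in a higher PriorityTier partition
# can preempt jobs in a lower one.
PartitionName=high  Nodes=node[01-10] PriorityTier=3
PartitionName=burst Nodes=node[01-10] PriorityTier=2 AllowQos=burst
PartitionName=open  Nodes=node[01-10] PriorityTier=1
```

The QOS change tested above would then be something along the lines of `sacctmgr modify qos burst set PreemptExemptTime=00:00:30 PreemptMode=cancel`.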
Am I not understanding how this is supposed to work, or am I asking for an impossible Slurm configuration?