[slurm-users] preemptable queue

Davide DelVento davide.quantum at gmail.com
Fri Jan 12 15:33:54 UTC 2024


Thanks Paul,

I don't understand what you mean by having a typo somewhere. I mean, that
configuration works just fine right now, whereas if I add the commented-out
line, any slurm command aborts with the error "PreemptType and PreemptMode
values incompatible". So, assuming there is a typo, it should be in the
commented line, right? Or are you saying that having that line makes slurm
sensitive to a typo somewhere else that would otherwise be ignored?
Obviously I can't exclude that possibility, but it seems unlikely to me,
also because the error explicitly says these two things are incompatible.

It would obviously be much better if the error said EXACTLY what is
incompatible with what. In the documentation at
https://slurm.schedmd.com/preempt.html I see several clues as to what that
could be, and hence I am asking people here who may have already deployed
preemption on their systems. Some excerpts from that URL:


*PreemptType*: Specifies the plugin used to identify which jobs can be
preempted in order to start a pending job.

   - *preempt/none*: Job preemption is disabled (default).
   - *preempt/partition_prio*: Job preemption is based upon partition
   *PriorityTier*. Jobs in higher PriorityTier partitions may preempt jobs
   from lower PriorityTier partitions. This is not compatible with
   *PreemptMode=OFF*.


which somewhat makes it sound like all partitions should have preemption
set, and not only some? I obviously have some "off" partitions. However,
elsewhere in that document it says

*PreemptMode*: Mechanism used to preempt jobs or enable gang scheduling.
When the *PreemptType* parameter is set to enable preemption, the
*PreemptMode* in the main section of slurm.conf selects the default
mechanism used to preempt the preemptable jobs for the cluster.
*PreemptMode* may be specified on a per partition basis to override this
default value if *PreemptType=preempt/partition_prio*.

which kind of sounds like it should be okay (unless it means **everything**
must be different from OFF). Yet still elsewhere on that same page it says

On the other hand, if you want to use *PreemptType=preempt/partition_prio* to
allow jobs from higher PriorityTier partitions to Suspend jobs from lower
PriorityTier partitions, then you will need overlapping partitions, and
*PreemptMode=SUSPEND,GANG* to use Gang scheduler to resume the suspended
job(s). In either case, time-slicing won't happen between jobs on different
partitions.

which somewhat sounds like only suspend and gang can be used as preemption
modes with partition priority, and not cancel (my preference) or requeue
(perhaps acceptable, if I jump through some hoops).
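
For concreteness, here is the kind of configuration I *wish* were valid (a
sketch based on my current partitions, with DefMemPerCPU omitted; the first
two lines are exactly the part I cannot confirm is legal):

PreemptType=preempt/partition_prio
# cluster-wide default; the per-partition PreemptMode lines below override it
PreemptMode=CANCEL
PartitionName=regular Nodes=node[01-12] State=UP PreemptMode=off PriorityTier=200
PartitionName=All Nodes=node[01-36] State=UP PreemptMode=off PriorityTier=500
PartitionName=lowpriority Nodes=node[01-36] State=UP PreemptMode=cancel PriorityTier=100

If the incompatibility the error complains about is between
PreemptType=preempt/partition_prio and the cluster-wide PreemptMode (which
defaults to OFF when the line is absent, as in my current conf), then maybe
an explicit cluster-wide line like the above would fix it; but the
suspend/gang paragraph quoted above makes me doubt that CANCEL is accepted
there.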

So to me the documentation is highly confusing about what can and cannot be
used together. The examples at the bottom of the page are nice, but they do
not specify the full settings. In particular,
https://slurm.schedmd.com/preempt.html#example2 is close enough to my
setup, but it does not say which PreemptType has been chosen (nor whether
"cancel" would be allowed in that setup).
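
My guess at the cluster-wide lines that example leaves unstated (purely my
assumption, not something the page says) is:

PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

and whether REQUEUE could be replaced by CANCEL there is precisely what I
cannot tell from the documentation. If anyone runs a partition-priority
setup with CANCEL (or knows for certain that it cannot work), please share
your slurm.conf lines.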

Thanks again!

On Fri, Jan 12, 2024 at 7:22 AM Paul Edmon <pedmon at cfa.harvard.edu> wrote:

> At least in the example you are showing you have PreemptType commented
> out, which means it will return the default. PreemptMode Cancel should
> work; I don't see anything in the documentation that indicates it
> wouldn't, so I suspect you have a typo somewhere in your conf.
>
> -Paul Edmon-
> On 1/11/2024 6:01 PM, Davide DelVento wrote:
>
> I would like to add a preemptable queue to our cluster. Actually I already
> have. We simply want jobs submitted to that queue to be preempted if there are
> no resources available for jobs in other (high priority) queues.
> Conceptually very simple, no conditionals, no choices, just what I wrote.
> However it does not work as desired.
>
> This is the relevant part:
>
> grep -i Preemp /opt/slurm/slurm.conf
> #PreemptType = preempt/partition_prio
> PartitionName=regular DefMemPerCPU=4580 Default=True Nodes=node[01-12]
> State=UP PreemptMode=off PriorityTier=200
> PartitionName=All DefMemPerCPU=4580 Nodes=node[01-36] State=UP
> PreemptMode=off PriorityTier=500
> PartitionName=lowpriority DefMemPerCPU=4580 Nodes=node[01-36] State=UP
> PreemptMode=cancel PriorityTier=100
>
>
> That PreemptType setting (now commented out) fully breaks slurm; everything
> refuses to run with errors like
>
> $ squeue
> squeue: error: PreemptType and PreemptMode values incompatible
> squeue: fatal: Unable to process configuration file
>
> If I understand the documentation at
> https://slurm.schedmd.com/preempt.html correctly, that is because
> preemption cannot cancel jobs based on partition priority, which (if true)
> is really unfortunate. I understand that allowing cross-partition
> time-slicing could be tricky, and so I see why that isn't allowed, but
> cancelling? Anyway, I have a few questions:
>
> 1) is that correct and so should I avoid using either partition priority
> or cancelling?
> 2) is there an easy way to trick slurm into requeueing and then have those
> jobs cancelled instead?
> 3) I guess the cleanest option would be to implement QoS, but I've never
> done it and we don't really need it for anything other than this. The
> documentation looks complicated, but is it? The great Ole's website is
> unavailable at the moment...
>
> Thanks!!
>
>