[slurm-users] preemptable queue

Davide DelVento davide.quantum at gmail.com
Fri Jan 12 16:20:25 UTC 2024


Thanks Paul for taking the time to look further into this. In fact you are
correct: adding a default mode (which is then overridden by each
partition's setting) keeps slurm happy with that configuration. Moreover
(after restarting the daemons, etc., per the documentation) everything seems
to be working as I intended. I obviously need to do a few more tests,
especially for edge cases, but adding that default seems to have completely
fixed the problem.
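
For the record, the preemption-related lines now look roughly like the
following. This is only a sketch: it assumes the cluster-wide default was set
to CANCEL as Paul suggested, with the partition lines unchanged from my
earlier message.

PreemptType=preempt/partition_prio
PreemptMode=CANCEL
PartitionName=regular DefMemPerCPU=4580 Default=True Nodes=node[01-12] State=UP PreemptMode=off PriorityTier=200
PartitionName=All DefMemPerCPU=4580 Nodes=node[01-36] State=UP PreemptMode=off PriorityTier=500
PartitionName=lowpriority DefMemPerCPU=4580 Nodes=node[01-36] State=UP PreemptMode=cancel PriorityTier=100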

Thanks again and have a great weekend!


On Fri, Jan 12, 2024 at 8:49 AM Paul Edmon <pedmon at cfa.harvard.edu> wrote:

> My concern was your config inadvertently having that line commented out and
> then seeing problems. If it wasn't, then no worries at this point.
>
> We run using preempt/partition_prio on our cluster and have a mix of
> partitions using PreemptMode=OFF and PreemptMode=REQUEUE. So I know that
> combination works. I would be surprised if PreemptMode=CANCEL did not work
> as that's a valid option.
>
> Something we do have set, though, is the default mode. We have set:
>
> ### Governs the default preemption behavior
> PreemptType=preempt/partition_prio
> PreemptMode=REQUEUE
>
> So you might try setting a default of PreemptMode=CANCEL and then setting
> specific PreemptModes for all your partitions. That's what we do and it
> works for us.
>
> -Paul Edmon-
> On 1/12/2024 10:33 AM, Davide DelVento wrote:
>
> Thanks Paul,
>
> I don't understand what you mean by having a typo somewhere. I mean, that
> configuration works just fine right now, whereas if I add the commented-out
> line any slurm command will just abort with the error "PreemptType and
> PreemptMode values incompatible". So, assuming there is a typo, it should
> be in that commented line, right? Or are you saying that having that line
> makes slurm sensitive to a typo somewhere else that would otherwise be
> ignored? Obviously I can't rule that out, but it seems unlikely to me,
> also because the error does say these two things are incompatible.
>
> It would obviously be much better if the error said what EXACTLY is
> incompatible with what, but the documentation at
> https://slurm.schedmd.com/preempt.html only gives me conflicting clues as
> to what that could be, and hence I am asking people here who may have
> already deployed preemption on their systems. Some excerpts from that URL:
>
>
> *PreemptType*: Specifies the plugin used to identify which jobs can be
> preempted in order to start a pending job.
>
>    - *preempt/none*: Job preemption is disabled (default).
>    - *preempt/partition_prio*: Job preemption is based upon partition
>    *PriorityTier*. Jobs in higher PriorityTier partitions may preempt
>    jobs from lower PriorityTier partitions. This is not compatible with
>    *PreemptMode=OFF*.
>
>
> which somewhat makes it sound like all partitions should have preemption
> set, and not only some? I obviously have some "off" partitions. However,
> elsewhere in that document it says
>
> *PreemptMode*: Mechanism used to preempt jobs or enable gang scheduling.
> When the *PreemptType* parameter is set to enable preemption, the
> *PreemptMode* in the main section of slurm.conf selects the default
> mechanism used to preempt the preemptable jobs for the cluster.
> *PreemptMode* may be specified on a per partition basis to override this
> default value if *PreemptType=preempt/partition_prio*.
>
> which kind of sounds like it should be okay (unless it means
> **everything** must be different from OFF). Yet elsewhere on that
> same page it says
>
> On the other hand, if you want to use *PreemptType=preempt/partition_prio* to
> allow jobs from higher PriorityTier partitions to Suspend jobs from lower
> PriorityTier partitions, then you will need overlapping partitions, and
> *PreemptMode=SUSPEND,GANG* to use Gang scheduler to resume the suspended
> job(s). In either case, time-slicing won't happen between jobs on different
> partitions.
>
> Which somewhat sounds like only suspend and gang can be used as preemption
> modes, and not cancel (my preference) or requeue (perhaps acceptable, if I
> jump through some hoops).
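>
> To be concrete, I read that passage as requiring something like this at the
> cluster level (just my interpretation, not something I have tested):
>
> PreemptType=preempt/partition_prio
> PreemptMode=SUSPEND,GANG
>
> rather than the PreemptMode=cancel I would prefer as the default.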
>
> So to me the documentation is highly confusing about what can or cannot be
> used together with what else, and the examples at the bottom of the page
> are nice, but they do not specify the full settings. In particular,
> https://slurm.schedmd.com/preempt.html#example2 is close enough to mine,
> but it does not say which PreemptType has been chosen (nor whether "cancel"
> would be allowed in that setup).
>
> Thanks again!
>
> On Fri, Jan 12, 2024 at 7:22 AM Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>
>> At least in the example you are showing, you have PreemptType commented
>> out, which means it will fall back to the default. PreemptMode CANCEL
>> should work; I don't see anything in the documentation that indicates it
>> wouldn't. So I suspect you have a typo somewhere in your conf.
>>
>> -Paul Edmon-
>> On 1/11/2024 6:01 PM, Davide DelVento wrote:
>>
>> I would like to add a preemptable queue to our cluster. Actually, I
>> already have. We simply want jobs submitted to that queue to be preempted
>> if there are no resources available for jobs in the other (higher-priority)
>> queues. Conceptually very simple: no conditionals, no choices, just what I
>> wrote. However, it does not work as desired.
>>
>> This is the relevant part:
>>
>> grep -i Preemp /opt/slurm/slurm.conf
>> #PreemptType = preempt/partition_prio
>> PartitionName=regular DefMemPerCPU=4580 Default=True Nodes=node[01-12]
>> State=UP PreemptMode=off PriorityTier=200
>> PartitionName=All DefMemPerCPU=4580 Nodes=node[01-36] State=UP
>> PreemptMode=off PriorityTier=500
>> PartitionName=lowpriority DefMemPerCPU=4580 Nodes=node[01-36] State=UP
>> PreemptMode=cancel PriorityTier=100
>>
>>
>> That PreemptType setting (now commented out) fully breaks slurm; everything
>> refuses to run, with errors like
>>
>> $ squeue
>> squeue: error: PreemptType and PreemptMode values incompatible
>> squeue: fatal: Unable to process configuration file
>>
>> If I understand the documentation at
>> https://slurm.schedmd.com/preempt.html correctly, that is because
>> preemption cannot cancel jobs based on partition priority, which (if true)
>> is really unfortunate. I understand that allowing cross-partition
>> time-slicing could be tricky, so I see why that isn't allowed, but
>> cancelling? Anyway, I have three questions:
>>
>> 1) is that correct, and should I therefore avoid using either partition
>> priority or cancelling?
>> 2) is there an easy way to trick slurm into requeueing those jobs and then
>> having them cancelled instead?
>> 3) I guess the cleanest option would be to implement QoS, but I've never
>> done it and we don't really need it for anything other than this. The
>> documentation looks complicated, but is it? The great Ole's website is
>> unavailable at the moment...
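>>
>> In case it helps frame question 3, I imagine the QoS route would look
>> roughly like the following (completely untested on my part, and the QOS
>> names are made up):
>>
>> # in slurm.conf
>> PreemptType=preempt/qos
>> PreemptMode=CANCEL
>>
>> # in the accounting database
>> sacctmgr add qos lowpri
>> sacctmgr modify qos lowpri set PreemptMode=cancel
>> sacctmgr modify qos normal set Preempt=lowpri
>>
>> with jobs in the preemptable queue submitted with --qos=lowpri. But I would
>> appreciate confirmation from anyone who has actually set this up.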
>>
>> Thanks!!
>>
>>

