[slurm-users] preemptable queue
Paul Edmon
pedmon at cfa.harvard.edu
Fri Jan 12 14:20:10 UTC 2024
At least in the example you are showing, PreemptType is commented
out, which means it falls back to the default (preempt/none).
PreemptMode=Cancel should work; I don't see anything in the
documentation that indicates it wouldn't. So I suspect you have a typo
somewhere in your conf.
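
One possible cause, offered as an untested sketch rather than a confirmed
diagnosis: the grep output shows no cluster-wide PreemptMode line, so the
global mode would be at its default of OFF, and the slurm.conf man page
describes a global OFF as compatible only with PreemptType=preempt/none.
Assuming that is what triggers the "values incompatible" error, a
configuration along these lines may be worth trying. The global
PreemptMode=CANCEL line is the assumption here; the partition lines are
copied from the original post.

```
# Global settings. PreemptMode=CANCEL is an assumed addition, not taken
# from the original slurm.conf.
PreemptType=preempt/partition_prio
PreemptMode=CANCEL

# Partition definitions as posted. The per-partition PreemptMode
# overrides the global value for jobs in that partition.
PartitionName=regular DefMemPerCPU=4580 Default=True Nodes=node[01-12] State=UP PreemptMode=off PriorityTier=200
PartitionName=All DefMemPerCPU=4580 Nodes=node[01-36] State=UP PreemptMode=off PriorityTier=500
PartitionName=lowpriority DefMemPerCPU=4580 Nodes=node[01-36] State=UP PreemptMode=cancel PriorityTier=100
```

With this layout, only jobs in lowpriority (the lowest PriorityTier) would
be candidates for cancellation when jobs in the higher-tier partitions
need the nodes.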
-Paul Edmon-
On 1/11/2024 6:01 PM, Davide DelVento wrote:
> I would like to add a preemptable queue to our cluster. Actually I
> already have. We simply want jobs submitted to that queue to be preempted
> if there are no resources available for jobs in other (high priority)
> queues. Conceptually very simple, no conditionals, no choices, just
> what I wrote.
> However it does not work as desired.
>
> This is the relevant part:
>
> grep -i Preemp /opt/slurm/slurm.conf
> #PreemptType = preempt/partition_prio
> PartitionName=regular DefMemPerCPU=4580 Default=True Nodes=node[01-12]
> State=UP PreemptMode=off PriorityTier=200
> PartitionName=All DefMemPerCPU=4580 Nodes=node[01-36] State=UP
> PreemptMode=off PriorityTier=500
> PartitionName=lowpriority DefMemPerCPU=4580 Nodes=node[01-36] State=UP
> PreemptMode=cancel PriorityTier=100
>
>
> That PreemptType setting (now commented out) fully breaks Slurm:
> everything refuses to run, with errors like
>
> $ squeue
> squeue: error: PreemptType and PreemptMode values incompatible
> squeue: fatal: Unable to process configuration file
>
> If I understand correctly the documentation at
> https://slurm.schedmd.com/preempt.html that is because preemption
> cannot cancel jobs based on partition priority, which (if true) is
> really unfortunate. I understand that allowing
> cross-partition time-slicing could be tricky and so I understand why
> that isn't allowed, but cancelling? Anyway, I have three questions:
>
> 1) is that correct and so should I avoid using either partition
> priority or cancelling?
> 2) is there an easy way to trick slurm into requeuing and then have
> those jobs cancelled instead?
> 3) I guess the cleanest option would be to implement QoS, but I've
> never done it and we don't really need it for anything else other than
> this. The documentation looks complicated, but is it? The great Ole's
> website is unavailable at the moment...
>
> Thanks!!