[slurm-users] Features request
Relu Patrascu
relu at cs.toronto.edu
Fri Sep 25 11:57:57 UTC 2020
Thank you for your ideas Diego.
On 2020-09-25 02:20, Diego Zuccato wrote:
> On 25/09/20 00:04, Relu Patrascu wrote:
>
>> 1. Allow preemption in the same QOS, all else being equal, based on job
>> priority.
> You'd risk having jobs continuously preempted by jobs that have been in
> queue for a bit: once a job starts, it stops accumulating priority ->
> another job preempts the first, sending it back in queue -> the first
> job accumulates some more priority and preempts the second -> loop !
We assume we can have a preemption exempt time of at least 1h, using
PreemptExemptTime=01:00:00
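
Slurm time strings are easy to misread: a bare number means minutes, and
hours only appear once there are three colon-separated fields. Below is a
minimal sketch of a checker, assuming the time formats listed in the
slurm.conf man page (minutes, minutes:seconds, hours:minutes:seconds,
days-hours, days-hours:minutes, days-hours:minutes:seconds). The helper
name slurm_time_to_seconds is hypothetical and not part of Slurm, and
special values such as -1 are not handled.

```python
def slurm_time_to_seconds(text: str) -> int:
    """Convert a Slurm-style time string to seconds (illustrative helper)."""
    days = 0
    if "-" in text:
        # "days-hours[:minutes[:seconds]]"
        day_part, rest = text.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"too many fields in {text!r}")
        # Missing trailing fields (minutes, seconds) default to zero.
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(f) for f in text.split(":")]
        if len(fields) == 1:
            # A bare number is interpreted as minutes.
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:
            # "minutes:seconds"
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:
            # "hours:minutes:seconds"
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"too many fields in {text!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

With this helper, a one-hour exempt time is the string that converts to
3600 seconds, which is 01:00:00 (or simply 60).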
>
>> 2. Job size calculation to take into account the number of GPUs
>> allocated to the job. In a GPU cluster the most valuable currency is
>> the GPU, not the CPU. Perhaps even parameterize the job size so the user
>> could choose what to emphasize in calculation: cpu, gpu, memory.
> IIUC, you can already do that. See TRESBillingWeights option: just set
> the CPU and RAM to a low value relative to TRES/gpu.
>
I was referring to this
https://slurm.schedmd.com/priority_multifactor.html#jobsize
which does not take into account TRESBillingWeights (which we do have
configured accordingly to affect the user fairshare score). If jobsize
could be affected by the number of GPUs used, Slurm could make
scheduling decisions more appropriate for our GPU cluster,
especially if the backfill scheduler is used.
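
To make the request concrete, here is a rough sketch of what a
GPU-aware job size factor could look like. This is not how Slurm
computes the job size factor today, and it is not an existing Slurm
option. The function name, the resource keys, the weights, and the
cluster totals are all assumptions chosen for illustration.

```python
def weighted_job_size(requested, cluster_total, weights):
    """Hypothetical job-size factor in [0, 1].

    Each resource contributes its requested share of the cluster,
    scaled by a site-chosen weight; the result is normalized by the
    sum of the weights.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    size = 0.0
    for resource, weight in weights.items():
        # Fraction of the cluster's capacity this job asks for, capped at 1.
        share = min(requested.get(resource, 0) / cluster_total[resource], 1.0)
        size += weight * share
    return size / total_weight


# Assumed cluster capacity and weights that emphasize GPUs.
cluster_total = {"cpu": 1024, "gpu": 64, "mem_gb": 4096}
weights = {"cpu": 0.1, "gpu": 1.0, "mem_gb": 0.1}

gpu_job = {"cpu": 16, "gpu": 8, "mem_gb": 64}
cpu_job = {"cpu": 256, "gpu": 0, "mem_gb": 64}

gpu_size = weighted_job_size(gpu_job, cluster_total, weights)
cpu_size = weighted_job_size(cpu_job, cluster_total, weights)
```

With these weights the 8-GPU job counts as larger than the 256-CPU job,
which is the ordering we would want the scheduler to see on a GPU
cluster.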
Regards,
Relu