[slurm-users] Features request

Fri Sep 25 11:57:57 UTC 2020

Thank you for your ideas Diego.

On 2020-09-25 02:20, Diego Zuccato wrote:
> On 25/09/20 00:04, Relu Patrascu wrote:
>
>> 1. Allow preemption in the same QOS, all else being equal, based on job
>> priority.
> You'd risk having jobs continuously preempted by jobs that have been in
> queue for a bit: once a job starts, it stops accumulating priority ->
> another job preempts the first, sending it back in queue -> the first
> job accumulates some more priority and preempts the second -> loop !

We assume we can have a preemption-exempt time of at least 1h, using

PreemptExemptTime       = 01:00:00
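
As a rough sketch of what that exemption window means (this is not Slurm
code; the function names are made up for illustration, only the
hours:minutes:seconds form is parsed, and treating "exactly at the limit"
as preemptable is my assumption): a job becomes a preemption candidate
only after it has run for the exempt time, which bounds how often the
loop described above can repeat.

```python
from datetime import timedelta


def parse_hms(value: str) -> timedelta:
    """Parse an hours:minutes:seconds string such as '01:00:00'.

    Hypothetical helper: Slurm accepts other time formats too,
    which are deliberately not handled here.
    """
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def is_preemptable(run_time: timedelta, exempt_time: timedelta) -> bool:
    """Return True once the job has run for at least the exempt time.

    Assumption: a job exactly at the limit counts as preemptable.
    """
    return run_time >= exempt_time


exempt = parse_hms("01:00:00")  # one hour
print(exempt.total_seconds())
print(is_preemptable(timedelta(minutes=30), exempt))
print(is_preemptable(timedelta(hours=2), exempt))
```

With a one-hour window, two jobs of near-equal priority could swap at
most about once per hour instead of continuously.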

>
>> 2. Job size calculation to take into account the number of GPUs
>> allocated to the job. In a GPU cluster the most valuable currency being
>> the GPU, not the CPU. Perhaps even parameterize the job size so the user
>> could choose what to emphasize in calculation: cpu, gpu, memory.
> IIUC, you can already do that. See TRESBillingWeights option: just set
> the CPU and RAM to a low value relative to TRES/gpu.
>

I was referring to this

   https://slurm.schedmd.com/priority_multifactor.html#jobsize

which does not take into account TRESBillingWeights (which we do have
configured accordingly to affect the user fairshare score). If jobsize
could be affected by the number of GPUs used, Slurm could make
scheduling decisions more appropriate for our GPU cluster,
especially if the backfill scheduler is used.
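
To make the request concrete, here is a minimal sketch of the kind of
parameterized job size I have in mind. It is purely hypothetical: it is
not how Slurm computes the job size factor today, and the function name,
resource names, and weights are all invented for illustration.

```python
def job_size_factor(requested, cluster_total, weights):
    """Hypothetical weighted job size in the range [0, 1].

    requested     -- resources asked for by the job, e.g. {"gpu": 8}
    cluster_total -- total resources in the cluster, same keys
    weights       -- how much each resource counts toward job size

    Each resource contributes its requested fraction of the cluster,
    scaled by its weight; the result is normalized by the weight sum.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    weighted = sum(
        weight * requested.get(resource, 0) / cluster_total[resource]
        for resource, weight in weights.items()
    )
    return weighted / total_weight


# Made-up cluster and job, for illustration only.
cluster = {"cpu": 1000, "gpu": 100, "mem": 4000}
job = {"cpu": 8, "gpu": 8, "mem": 64}

print(job_size_factor(job, cluster, {"cpu": 1}))  # CPU-only view of the job
print(job_size_factor(job, cluster, {"gpu": 1}))  # GPU-only view of the job
print(job_size_factor(job, cluster, {"cpu": 1, "gpu": 8, "mem": 1}))
```

Under the CPU-only view this job looks tiny (8 of 1000 cores), while the
GPU-only view shows it holding 8 of 100 GPUs, which is the figure that
matters on our cluster.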

Regards,

Relu