[slurm-users] Priority access for a group of users

Mark Hahn hahn at mcmaster.ca
Fri Mar 1 20:46:26 UTC 2019

> I'm a fan of the suspend mode myself but that is dependent on users not
> asking for all the ram by default. If you can educate the users then this
> works really well as the low priority job stays in ram in suspended mode
> while the high priority job completes and then the low priority job
> continues from where it stopped. No checkpoints and no killing.

Me too - in fact, I'm not afraid of swap space, so I don't mind if a suspended
job gets swapped out (it isn't running, so it won't thrash).

Suspend-based preemption is incredibly useful for supporting "debug" jobs,
which are expected to run only briefly, but for which someone is waiting.

In most of the previous schedulers we've used (which shall not be named),
we often ran into problems with this: once the victim was suspended, the 
scheduler would start other jobs, not just the preemptor, on the resources
made available - sometimes making it impossible to resume the victim,
depending on the mixture of new jobs/sizes/priorities.

In principle, the fix for this would be either to permit only the single
preemptor onto the victim's resources, or at least to backfill only into
the bubble caused by the preemptor (which lasts no more than an hour).
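
To make the two options concrete, here is a rough sketch of the admission
rule as a toy model. It is not Slurm configuration or any Slurm API; the
Job class, the function name and the strict flag are all made up for
illustration, and times are simply hours from now.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    start: float     # proposed start time, hours from now
    walltime: float  # requested time limit, hours

    @property
    def end(self) -> float:
        # Latest time the job can still hold its resources.
        return self.start + self.walltime


def may_use_victim_resources(candidate: Job, preemptor: Job,
                             strict: bool = False) -> bool:
    """Hypothetical rule for starting a job on a suspended victim's resources.

    strict=True  : option 1, only the preemptor itself is admitted.
    strict=False : option 2, other jobs are admitted only if they fit
                   entirely inside the bubble opened by the preemptor.
    """
    if candidate is preemptor:
        return True
    if strict:
        return False
    # Backfill only into the bubble: the candidate must not start before
    # the preemptor and must be finished by the time the preemptor ends,
    # so the victim can resume as soon as the preemptor is done.
    return candidate.start >= preemptor.start and candidate.end <= preemptor.end


if __name__ == "__main__":
    debug = Job("debug", start=0.0, walltime=1.0)      # the preemptor
    short = Job("short", start=0.25, walltime=0.5)     # fits in the bubble
    long_job = Job("long", start=0.25, walltime=12.0)  # would block the victim

    for job in (debug, short, long_job):
        print(job.name, may_use_victim_resources(job, debug))
```

Under the second option the victim can always resume when the preemptor
ends, because nothing admitted onto its resources outlives the preemptor.
A job like "long" above is exactly the case that left victims suspended.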

I would be interested to know whether other Slurm sites do this successfully,
particularly in avoiding the victim-stays-suspended priority inversion.

Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http://www.sharcnet.ca
           | McMaster RHPCS    | hahn at mcmaster.ca | 905 525 9140 x24687
           | Compute/Calcul Canada                | http://www.computecanada.ca
