[slurm-users] Features request
Relu Patrascu
relu at cs.toronto.edu
Thu Sep 24 22:04:28 UTC 2020
Hello all,
We're mostly a GPU compute shop, and we've been happy with Slurm for the
last three years, but we think Slurm would benefit from the following
two features:
1. Allow preemption in the same QOS, all else being equal, based on job
priority.
2. Have the job size calculation take into account the number of GPUs
allocated to the job. In a GPU cluster the most valuable currency is
the GPU, not the CPU. Perhaps even parameterize the job size so the user
could choose what to emphasize in the calculation: CPU, GPU, memory.
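To make #2 a bit more concrete, here is a rough sketch of the kind of parameterized job size we have in mind. It is only an illustration, not Slurm code: the Cluster and Job classes, the job_size function and the weight names (w_cpu, w_gpu, w_mem) are all hypothetical, and the default weights simply reflect our GPU-centric view.

```python
# Hypothetical sketch only: none of these names exist in Slurm.
from dataclasses import dataclass


@dataclass
class Cluster:
    cpus: int
    gpus: int
    mem_gb: float


@dataclass
class Job:
    cpus: int
    gpus: int
    mem_gb: float


def job_size(job, cluster, w_cpu=0.0, w_gpu=1.0, w_mem=0.0):
    """Weighted fraction of the cluster that the job occupies.

    Each resource contributes its share of the cluster total, scaled by
    its weight; the result is normalized by the sum of the weights.
    """
    if min(w_cpu, w_gpu, w_mem) < 0:
        raise ValueError("weights must be non-negative")
    total_weight = w_cpu + w_gpu + w_mem
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    score = (
        w_cpu * job.cpus / cluster.cpus
        + w_gpu * job.gpus / cluster.gpus
        + w_mem * job.mem_gb / cluster.mem_gb
    )
    return score / total_weight
```

With the default weights only GPUs count, so a job holding half of the cluster's GPUs has size 0.5 regardless of how few CPUs it uses; setting w_cpu=1 and w_gpu=0 would give a purely CPU-based size.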
If this is not the right place to ask for this, I would appreciate a
pointer in the right direction.
Justification:
It's pretty obvious why we'd like #2.
We want #1 because we believe it would allow for a more natural
maximization of cluster usage. A user X could grab the whole cluster
if it's free, while another user Y, arriving later, could get jobs in by
preempting some of X's jobs. We're assuming that the fairshare score of
user X will decrease as resources are consumed, so Y's jobs will have a
higher priority. We also assume that requeue, checkpoint and restart are
employed. Finally, we think this would make the system fairer in the
long term, essentially time-slicing usage through preemption based on
priority.
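As a rough sketch of the behaviour we are asking for in #1 (again purely illustrative, not how Slurm is implemented): when a pending job does not fit, look only at running jobs in the same QOS with a strictly lower priority, and requeue the lowest-priority ones until enough GPUs are free. The RunningJob class and the pick_victims function are hypothetical names, and counting only GPUs is a simplifying assumption.

```python
# Hypothetical sketch only: none of these names exist in Slurm.
from dataclasses import dataclass


@dataclass
class RunningJob:
    job_id: int
    qos: str
    priority: int
    gpus: int


def pick_victims(pending_qos, pending_priority, gpus_needed, free_gpus, running):
    """Choose running jobs to requeue so that the pending job fits.

    Returns a list of job ids (empty if nothing needs to be preempted),
    or None if preempting every eligible job would still not free
    enough GPUs.
    """
    shortfall = gpus_needed - free_gpus
    if shortfall <= 0:
        return []
    # Eligible victims: same QOS, strictly lower priority than the
    # pending job. The lowest-priority jobs are preempted first.
    candidates = sorted(
        (j for j in running
         if j.qos == pending_qos and j.priority < pending_priority),
        key=lambda j: j.priority,
    )
    victims = []
    for job in candidates:
        if shortfall <= 0:
            break
        victims.append(job.job_id)
        shortfall -= job.gpus
    return victims if shortfall <= 0 else None
```

In the X and Y scenario above, X's running jobs would be the candidates once X's fairshare-driven priority has dropped below that of Y's pending job, and the victims would be requeued rather than cancelled.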
Relu