[slurm-users] Enforcing -c and -t for fairshare scheduling and other setting

r nbs.public at gmail.com
Fri May 13 14:54:15 UTC 2022


We've deployed a Slurm cluster and it works well. However, I would like to
encourage users to conserve resources and to distribute jobs more fairly.

Below are some ideas I'd like to implement. Please let me know whether they
are feasible and, if so, point me in the right direction. Alternatively, let
me know if there are better ways of achieving the above goal.

I would like to:
- Require users to specify the -c and -t options, that is, reject any job
that does not specify them. Optionally also require --mem, but that is a low
priority for us.
- Forbid the use of --cpu-bind=no, or treat it as -c 64.
- Set up a fairshare scheduler and assign weights to the values specified
via -c and -t.
- Enforce the resource limits specified via -c, -t and --mem (-t and -c
already work, at least without --cpu-bind=no).
- Either limit the overall number of CPU slots per partition or test for
availability of licenses before jobs are released from the queue. This is
to prevent jobs from waiting for licenses at run time and potentially being
killed when the -t time limit is exceeded.
- Ideally, force jobs to queue for a certain period of time (a small
fraction of -c * -t) even if the partition has available resources left.
This is to prevent large jobs from being submitted and dispatched ahead of
smaller jobs, and to further reward conserving resources.
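
To make the first two items and the last one concrete, here is a minimal
sketch of the policy in plain Python. It is only a model of the rules, not
Slurm plugin code: JobRequest, validate, effective_cpus and min_hold_minutes
are hypothetical names, and HOLD_FRACTION is an assumed placeholder, since I
have not settled on what "a small fraction" should be.

```python
from dataclasses import dataclass
from typing import List, Optional

FULL_NODE_CPUS = 64    # from the list above: --cpu-bind=no counts as -c 64
HOLD_FRACTION = 0.001  # assumed value; "a small fraction" is not yet decided


@dataclass
class JobRequest:
    """Hypothetical stand-in for the options given at submission."""
    cpus_per_task: Optional[int] = None   # -c
    time_limit_min: Optional[int] = None  # -t, in minutes
    cpu_bind: Optional[str] = None        # --cpu-bind


def validate(job: JobRequest) -> List[str]:
    """Return the reasons to reject the job; an empty list means accept."""
    errors = []
    if job.cpus_per_task is None:
        errors.append("-c is required")
    if job.time_limit_min is None:
        errors.append("-t is required")
    return errors


def effective_cpus(job: JobRequest) -> int:
    """CPUs charged to a validated job; --cpu-bind=no is charged as 64."""
    if job.cpu_bind == "no":
        return FULL_NODE_CPUS
    return job.cpus_per_task


def min_hold_minutes(job: JobRequest) -> float:
    """Minimum time a validated job queues, as a fraction of -c * -t."""
    return HOLD_FRACTION * effective_cpus(job) * job.time_limit_min
```

With these assumed values, a job submitted with -c 4 -t 120 would pass
validation and be held for a little under half a minute, while the same job
with --cpu-bind=no would be charged for 64 CPUs and held proportionally
longer.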

Many thanks,