[slurm-users] How to allocate resource for jobs without causing GPU fragmentation?
Jaekyeom Kim
btapiz at gmail.com
Fri Jun 26 15:46:29 UTC 2020
Hi,
I'm running a GPU cluster, and I would like to know if there is a way to
allocate resources to jobs without causing GPU fragmentation.
Currently, I'm using
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
and oversubscription of CPU cores is enabled.
Say there are two nodes, A and B, each with 4 GPUs and 40 CPU cores.
The problem is that if jobs 1 and 2 each request 1 GPU and 30 CPU cores,
they are placed on different nodes, one on A and one on B. That leaves only
3 free GPUs on each node, so a future job requiring 4 GPUs cannot run on
either of them.
If I'm not mistaken, a simple workaround might be to stop managing CPU cores
via Slurm (e.g. CR_Memory), but that comes with downsides.
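To make the scenario concrete, here is a toy first-fit model in Python. It is
only a sketch of the arithmetic, not Slurm's actual selection algorithm: the
Node class, the place() and run() helpers, and the 1-core request of the 4-GPU
job are all made up for illustration. It compares the case where CPU cores are
counted as a consumable resource with the case where they are not (roughly
the workaround mentioned above).

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node model matching the example: 4 GPUs, 40 CPU cores.
    name: str
    free_gpus: int = 4
    free_cores: int = 40


def place(nodes, gpus, cores, count_cores=True):
    """First-fit placement: return the chosen node's name, or None if no node fits."""
    for node in nodes:
        fits_gpus = node.free_gpus >= gpus
        fits_cores = (not count_cores) or node.free_cores >= cores
        if fits_gpus and fits_cores:
            node.free_gpus -= gpus
            if count_cores:
                node.free_cores -= cores
            return node.name
    return None


def run(count_cores):
    nodes = [Node("A"), Node("B")]
    # Jobs 1 and 2: 1 GPU and 30 CPU cores each.
    small = [place(nodes, gpus=1, cores=30, count_cores=count_cores) for _ in range(2)]
    # Future job: 4 GPUs (its core request of 1 is an assumption).
    big = place(nodes, gpus=4, cores=1, count_cores=count_cores)
    return small, big


print(run(count_cores=True))   # (['A', 'B'], None)
print(run(count_cores=False))  # (['A', 'A'], 'B')
```

With cores counted, the two small jobs cannot share a 40-core node, so they
spread over A and B and the 4-GPU job has nowhere to go. With cores ignored,
both fit on A and B stays free, at the cost of no longer tracking CPU usage.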
Could someone suggest any select plugins/parameters that can prevent such
GPU fragmentation, please?
Best,
Jaekyeom