[slurm-users] Is it possible to define multiple partitions for the same node, but each one having a different subset of GPUs?
Brian Andrus
toomuchit at gmail.com
Wed Mar 31 17:46:42 UTC 2021
The node definition is separate from the partition definition, so you
would need to define all of the GPUs as part of the node. Partitions have
no physical characteristics of their own, but they do have QOS capabilities
that you may be able to use. You could also use a job_submit lua script
to reject jobs that request resources you do not want used in a
particular partition.
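For example (just a rough sketch, not a tested script), a job_submit.lua
along those lines might look like the following. The partition names and
the MIG GRES type name ("1g.5gb") are assumptions; adjust them to whatever
your slurm.conf/gres.conf actually define, and note that the GRES request
shows up as job_desc.tres_per_node or job_desc.gres depending on the Slurm
version:

  function slurm_job_submit(job_desc, part_list, submit_uid)
     -- The GRES request field name differs across Slurm versions.
     local gres = job_desc.tres_per_node or job_desc.gres or ""
     -- Example policy: the "small" partition only accepts the small MIG
     -- slices (hypothetical GRES type "1g.5gb").
     if job_desc.partition == "small"
        and not string.find(gres, "1g.5gb", 1, true) then
        slurm.log_user("Partition 'small' only accepts gpu:1g.5gb requests")
        return slurm.ERROR
     end
     return slurm.SUCCESS
  end

  function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
     return slurm.SUCCESS
  end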
Both would take some research to find the best approach, but I think
those are the two options available that may do what you are looking for.
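Either way, all the GPUs stay defined on the single node. Roughly like
this (the GRES type names and QOS names are only placeholders, and the
usual CPU/memory/state parameters are omitted):

  # slurm.conf sketch -- all GPUs belong to the one node
  NodeName=gpuComputer Gres=gpu:large:4,gpu:medium:4,gpu:small:16

  # One partition per GPU class, all pointing at the same node; the
  # partition itself cannot pin a physical GPU subset, so pair each
  # one with a QOS and/or a job_submit check like the one above.
  PartitionName=large  Nodes=gpuComputer QOS=large_qos
  PartitionName=medium Nodes=gpuComputer QOS=medium_qos
  PartitionName=small  Nodes=gpuComputer QOS=small_qos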
Brian Andrus
On 3/31/2021 8:21 AM, Cristóbal Navarro wrote:
> Hi Community,
> I was checking the documentation but could not find clear information on
> what I am trying to do.
> Here at the university we have a large compute node with 3 classes of
> GPUs. Let's say the node's hostname is "gpuComputer"; it is composed of:
>
> * 4x large GPUs
> * 4x medium GPUs (MIG devices)
> * 16x small GPUs (MIG devices)
>
> Our plan is that we want to have one partition for each class of GPUs.
> So if a user chooses the "small" partition, they will only see up to the
> 16x small GPUs and will not interfere with other jobs running on the
> "medium" or "large" partitions.
>
> Can I create three partitions and specify the corresponding subset of
> GPUs for each one?
>
> If not, would NodeName and NodeHostname serve as an alternative?
> i.e., specify the node three times with different NodeNames, but all
> using the same NodeHostname=gpuComputer, and specify the corresponding
> subset of "Gres" resources for each one. Then, on each partition,
> choose the corresponding NodeName.
>
> Any feedback or advice on the best way to accomplish this would be
> much appreciated.
> Best regards
>
>
>
> --
> Cristóbal A. Navarro