[slurm-users] Is it possible to define multiple partitions for the same node, but each one having a different subset of GPUs?

Thu Apr 1 01:35:54 UTC 2021

Many thanks, Brian and Jeffrey, for your ideas.
Yes, at the moment I have all resources listed in the node's definition
line, and just one partition (see below).
This config would indeed work, as long as users cooperate and do not
request all of the GPUs for their jobs.
One thing is still not 100% clear to me: will it allow multiple jobs to
run at the same time if they request different GPUs?

## Nodes List
NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=1024000 State=UNKNOWN Gres=gpu:a100:4,gpu:a100_20g:2,gpu:a100_10g:2,gpu:a100_5g:16 Feature=ht,gpu

## Partitions list
PartitionName=gpu MaxTime=INFINITE State=UP Nodes=nodeGPU01  Default=YES
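
To make the question concrete, here is a small sketch of the two
submissions I have in mind against the definitions above. It only prints
the sbatch command lines instead of submitting anything, so it can be tried
on a machine without Slurm. The helper name gpu_job_cmd and the script
names job_large.sh and job_small.sh are placeholders of mine, not anything
that exists on the cluster.

```shell
#!/bin/bash
# Dry-run sketch: build the sbatch command for a job on the single "gpu"
# partition that requests <count> GPUs of one GRES type.
# gpu_job_cmd, job_large.sh and job_small.sh are hypothetical names.
gpu_job_cmd() {
    local gpu_type="$1" count="$2" script="$3"
    echo "sbatch --partition=gpu --gres=gpu:${gpu_type}:${count} ${script}"
}

# One job asking for a full A100, another asking for four of the small
# MIG slices, both targeting nodeGPU01 through the same partition.
gpu_job_cmd a100 1 job_large.sh
gpu_job_cmd a100_5g 4 job_small.sh
```

As far as I understand, whether both jobs would actually run side by side
depends on the node being scheduled per resource rather than as a whole
(for example SelectType=select/cons_tres), which is an assumption on my
part and not something shown in the config above.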

On Wed, Mar 31, 2021 at 3:16 PM Sarlo, Jeffrey S <JSarlo at central.uh.edu>
wrote:

> I think when you define the node in your slurm.conf, you could specify the
> different types you have and the number in the node.  Then when the user
> submits the job, they could specify the number and type they want and that
> would all work in one partition.  I have never done it because our nodes
> have the same type in them.
>
>
>
> For example, we have V100 and P100 gpus and decided on the type names of
> volta and tesla
>
>
>
> GresTypes=gpu
>
> NodeName=compute-0-[36-43] Gres=gpu:tesla:2 Feature=gen9
>
> NodeName=compute-4-[0-3]   Gres=gpu:volta:8 Feature=gen9
>
>
>
> The user then just uses the SBATCH directive  --gpus=tesla:1  to request
> one P100 gpu.
>
>
>
> This is an example from  https://slurm.schedmd.com/slurm.conf.html
>
>
>
> (e.g. "Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_consume:4G")
>
>
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On
> Behalf Of *Cristóbal Navarro
> *Sent:* Wednesday, March 31, 2021 10:21 AM
> *To:* slurm-users at lists.schedmd.com
> *Subject:* [slurm-users] Is it possible to define multiple partitions for
> the same node, but each one having a different subset of GPUs?
>
>
>
> Hi Community,
>
> I was checking the documentation but could not find clear information on
> what I am trying to do.
>
> Here at the university we have a large compute node with 3 classes of
> GPUs. Let's say the node's hostname is "gpuComputer"; it is composed of:
>
>    - 4x large GPUs
>    - 4x medium GPUs (MIG devices)
>    - 16x small GPUs (MIG devices)
>
> Our plan is that we want to have one partition for each class of GPUs.
>
> So if a user chooses the "small" partition, it will only see up to 16x
> small GPUs, and would not interfere with other jobs running on the "medium"
> or "large" partitions.
>
>
>
> Can I create three partitions and specify the corresponding subset of GPUs
> for each one?
>
>
>
> If not, would NodeName and NodeHostname serve as an alternative way? i.e.,
> to specify the node three times with different NodeName, but all using the
> same Hostname=gpuComputer, and specifying the corresponding subset of
> "Gres" resources for each one. Then on each partition, to choose the
> corresponding NodeName.
>
>
>
> Any feedback or advice on the best way to accomplish this would be much
> appreciated.
>
> best regards
>
>
>
>
>
>
> --
>
> Cristóbal A. Navarro
>

-- 
Cristóbal A. Navarro