[slurm-users] Heterogeneous GPU Node MPS

Jeffrey T Frey frey at udel.edu
Fri Nov 13 14:10:47 UTC 2020


From the NVIDIA docs re: MPS:


On systems with a mix of Volta / pre-Volta GPUs, if the MPS server is set to enumerate any Volta GPU, it will discard all pre-Volta GPUs. In other words, the MPS server will either operate only on the Volta GPUs and expose Volta capabilities, or operate only on pre-Volta GPUs.


I'd be curious what happens if you change the ordering (RTX then V100) in gres.conf -- would the RTX cards then work with MPS while the V100 would not?
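
If you do try swapping the order, gres.conf might look like this (a sketch only -- I'm assuming the RTX cards stay on /dev/nvidia1 and /dev/nvidia2; check with nvidia-smi which minor numbers map to which card before editing):

    Name=gpu Type=rtx   File=/dev/nvidia1
    Name=gpu Type=rtx   File=/dev/nvidia2
    Name=gpu Type=v100  File=/dev/nvidia0
    Name=mps Count=200  File=/dev/nvidia1
    Name=mps Count=200  File=/dev/nvidia2
    Name=mps Count=200  File=/dev/nvidia0

You'd need to restart slurmd on the node (and possibly slurmctld) for the change to take effect.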


> On Nov 13, 2020, at 07:23 , Holger Badorreck <h.badorreck at lzh.de> wrote:
> 
> Hello,
>  
> I have a heterogeneous GPU node with one V100 and two RTX cards. When I request resources with --gres=mps:100, the V100 is always chosen, and jobs wait when the V100 is fully allocated even while the RTX cards are free. If I use --gres=gpu:1, the RTX cards are used as well. Is something wrong with the configuration, or is it another problem?
>  
> The node configuration in slurm.conf:
> NodeName=node1 CPUs=48 RealMemory=128530 Sockets=1 CoresPerSocket=24 ThreadsPerCore=2 Gres=gpu:v100:1,gpu:rtx:2,mps:600 State=UNKNOWN
>  
> gres.conf:
> Name=gpu Type=v100  File=/dev/nvidia0
> Name=gpu Type=rtx   File=/dev/nvidia1
> Name=gpu Type=rtx   File=/dev/nvidia2
> Name=mps Count=200  File=/dev/nvidia0
> Name=mps Count=200  File=/dev/nvidia1
> Name=mps Count=200  File=/dev/nvidia2
>  
> Best regards,
> Holger

