[slurm-users] Heterogeneous GPU Node?

Kamil Wilczek kmwil at mimuw.edu.pl
Thu Jun 23 20:40:39 UTC 2022


Hello,

we run both homogeneous and heterogeneous GPU servers, and all of them
work without problems. We have mixed GTX 1080 Ti, Titan V and Titan X
cards, but not the more powerful ones (we have only a few of those, and
they all sit in the same machine).

If the server has adequate cooling, enough PCI-E lanes (I have no
experience with NVLink) and a power supply with enough power
connectors, you should not see any hardware-related issues. The cards
should not be a "gaming" build with large fans on the flat side of the
GPU; they should have front-to-back airflow, consistent with the airflow
in the server.

From the software point of view, the NVIDIA driver should support
both cards. In the Slurm configuration, I marked them as
different GRES types:

# gres.conf

Name=gpu Type=1080ti File=/dev/nvidia0
Name=gpu Type=1080ti File=/dev/nvidia1
Name=gpu Type=titanv File=/dev/nvidia2
Name=gpu Type=titanv File=/dev/nvidia3
Name=gpu Type=titanv File=/dev/nvidia4
Name=gpu Type=titanv File=/dev/nvidia5
Name=gpu Type=titanv File=/dev/nvidia6
Name=gpu Type=titanv File=/dev/nvidia7

# slurm.conf

NodeName=... NodeAddr=... CPUs=40 Gres=gpu:1080ti:2,gpu:titanv:6 ...
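
The per-type counts in slurm.conf have to match the number of device
lines in gres.conf. A rough way to cross-check this is sketched below in
Python. This is only an illustration, not part of Slurm: the helper names
count_gres_types and gres_string are made up here, and the parsing
assumes one File= entry per line, as in the gres.conf above (it does not
expand bracketed device ranges).

```python
from collections import Counter

# Device lines copied from the gres.conf above.
GRES_CONF = """\
Name=gpu Type=1080ti File=/dev/nvidia0
Name=gpu Type=1080ti File=/dev/nvidia1
Name=gpu Type=titanv File=/dev/nvidia2
Name=gpu Type=titanv File=/dev/nvidia3
Name=gpu Type=titanv File=/dev/nvidia4
Name=gpu Type=titanv File=/dev/nvidia5
Name=gpu Type=titanv File=/dev/nvidia6
Name=gpu Type=titanv File=/dev/nvidia7
"""


def count_gres_types(gres_conf_text):
    """Count device lines per (Name, Type) pair (hypothetical helper).

    Assumes exactly one device per line; File= ranges are not expanded.
    """
    counts = Counter()
    for line in gres_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "Name" in fields and "Type" in fields:
            counts[(fields["Name"], fields["Type"])] += 1
    return counts


def gres_string(counts):
    """Format the counts the way the Gres= field in slurm.conf lists them."""
    return ",".join(f"{name}:{gtype}:{n}" for (name, gtype), n in counts.items())


if __name__ == "__main__":
    print("Gres=" + gres_string(count_gres_types(GRES_CONF)))
```

For the lines above this prints Gres=gpu:1080ti:2,gpu:titanv:6, which
is the value used in the NodeName line. With the types defined this way,
a job can ask for a specific card, e.g. --gres=gpu:titanv:1.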

I'm not aware of any side effects of that setup. If there are any,
I would also like to know about them :)

Kind Regards
-- 


On 23.06.2022 at 21:50, Jason Simms wrote:
> Hello all,
> 
> Slightly OT, but I'm hoping the hive mind here can share some advice.
> 
> We have a GPU node with three RTX8000 GPUs installed. The node has a 
> capacity of 8 cards in total. I have a researcher who possibly wants to 
> add an A100. I recall asking our vendor a while back whether it's 
> possible (or advisable) to add that card to the existing node, which 
> would result in a heterogeneous mix of GPUs in a single system. They 
> indicated that it's not recommended to do so, but I'm wondering whether 
> anyone has direct experience with this.
> 
> And, apropos of this list, if it's fine to move forward with this, are 
> there any Slurm configuration issues I should be aware of?
> 
> Warmest regards,
> Jason
> 
> -- 
> *Jason L. Simms, Ph.D., M.P.H.*
> Manager of Research and High-Performance Computing
> XSEDE Campus Champion
> Lafayette College
> Information Technology Services
> 710 Sullivan Rd | Easton, PA 18042
> Office: 112 Skillman Library
> p: (610) 330-5632

-- 
Kamil Wilczek  [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED061316B69B]