[slurm-users] Incorrect Number of GPUs?

Jason Simms simmsj at lafayette.edu
Mon Jul 26 16:04:51 UTC 2021


Hello all,

I have a GPU node with 3 identical GPUs (we started with two and recently
added the third). Running nvidia-smi correctly shows that all three are
recognized. My gres.conf file has only this line:

NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3

And the relevant lines in slurm.conf are:

NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3
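
As a sanity check on the two config lines above, here is a minimal Python sketch that verifies the three numbers agree: the device files covered by File=, the Count= value in gres.conf, and the count in the slurm.conf Gres= field. The helper names (expand_device_range, parse_key_values) are hypothetical, and the range expansion only handles a single simple [lo-hi] bracket, not the full Slurm hostlist syntax.

```python
import re


def expand_device_range(spec):
    """Expand a simple bracketed range like '/dev/nvidia[0-2]' into paths."""
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", spec)
    if m is None:
        return [spec]  # no bracket range: treat as a single device file
    prefix, lo, hi, suffix = m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)
    return [f"{prefix}{i}{suffix}" for i in range(lo, hi + 1)]


def parse_key_values(line):
    """Split a 'Key=Value Key=Value ...' config line into a dict."""
    return dict(token.split("=", 1) for token in line.split())


# The two lines quoted in this message (slurm.conf line joined into one).
gres_line = "NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3"
node_line = (
    "NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 "
    "RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3"
)

gres = parse_key_values(gres_line)
node = parse_key_values(node_line)

devices = expand_device_range(gres["File"])
gres_count = int(gres["Count"])
node_count = int(node["Gres"].rsplit(":", 1)[1])  # last field of gpu:type:count

print(devices)
print(len(devices) == gres_count == node_count)
```

With the lines as quoted, all three values come out to 3, so the files themselves look consistent with each other; the sketch says nothing about what the running daemons have actually loaded.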

As far as I can tell, all of this is fine (and we had no issues when we
only had the initial two GPUs in the system). However, now when I run
sinfo -o %G (which, as I understand it, reports the total number of GRES
resources available), this is the output:

GRES
(null)
gpu:quadro_8000:2
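
To make the mismatch explicit, the output above can be parsed and compared with the configured count. This is only a sketch: parse_gres_lines is a hypothetical helper, it assumes every entry has the plain name:type:count form shown here, and it treats the "(null)" line as nodes that report no GRES.

```python
def parse_gres_lines(output):
    """Return (name, type, count) tuples from 'sinfo -o %G' style output."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line in ("GRES", "(null)"):
            continue  # skip the header and nodes reporting no GRES
        for item in line.split(","):
            name, gres_type, count = item.split(":")
            entries.append((name, gres_type, int(count)))
    return entries


# The sinfo output quoted in this message.
sinfo_output = """GRES
(null)
gpu:quadro_8000:2
"""

configured = 3  # from Gres=gpu:quadro_8000:3 in slurm.conf
reported = parse_gres_lines(sinfo_output)

# Collect every GRES entry whose reported count differs from the config.
mismatches = [entry for entry in reported if entry[2] != configured]
for name, gres_type, count in mismatches:
    print(f"{name}:{gres_type} reported {count}, configured {configured}")
```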

Is this saying that it doesn't recognize the third card? Any suggestions?
As always, thank you for your help!

Warmest regards,
Jason

-- 
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632