[slurm-users] Incorrect Number of GPUs?
Jason Simms
simmsj at lafayette.edu
Mon Jul 26 16:04:51 UTC 2021
Hello all,
I have a GPU node with 3 identical GPUs (we started with two and recently
added the third). Running nvidia-smi correctly shows that all three are
recognized. My gres.conf file has only this line:
NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3
And the relevant lines in slurm.conf are:
NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3
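
A minimal Python sketch of the consistency check behind the two entries above: the device range in File=, the Count= value, and the count at the end of Gres= should all agree. The helper names (expand_device_range, parse_key_values) are hypothetical and not part of Slurm, and the range expansion assumes only the simple [start-end] form used here, not the full range syntax Slurm accepts.

```python
import re


def expand_device_range(file_spec):
    """Expand a spec such as /dev/nvidia[0-2] into individual device paths.

    Assumption: only a single trailing [start-end] range is handled.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\]", file_spec)
    if match is None:
        return [file_spec]  # no range present, treat as one device
    prefix = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    return [f"{prefix}{i}" for i in range(start, end + 1)]


def parse_key_values(line):
    """Split a whitespace-separated Key=Value config line into a dict."""
    return dict(token.split("=", 1) for token in line.split() if "=" in token)


# The two configuration entries quoted in this message, each as one line.
gres_line = "NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3"
node_line = (
    "NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 "
    "RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3"
)

gres = parse_key_values(gres_line)
node = parse_key_values(node_line)

devices = expand_device_range(gres["File"])
slurm_conf_count = int(node["Gres"].rsplit(":", 1)[1])

# All three numbers should match for the node definition to be consistent.
consistent = len(devices) == int(gres["Count"]) == slurm_conf_count
print(devices, consistent)
```

For the entries above, the range expands to three device files and both counts are 3, so the files themselves agree with each other.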
As far as I can tell, all of this is fine (and we had no issues when we
only had the initial two GPUs in the system). However, when I now run
sinfo -o %G (which, as I understand it, reports the total number of GRES
available), this is the output:
GRES
(null)
gpu:quadro_8000:2
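
A minimal Python sketch of one way to read this output, under two assumptions: each line after the GRES header is a distinct GRES string that sinfo found across the nodes, and (null) stands for nodes with no GRES defined. The function name parse_gres_output is hypothetical, and only the simple name:type:count and name:count forms are handled.

```python
def parse_gres_output(sinfo_output):
    """Return (name, type, count) tuples from `sinfo -o %G` style text.

    Assumption: entries look like name:type:count or name:count; the
    header line and "(null)" lines carry no GRES and are skipped.
    """
    entries = []
    for line in sinfo_output.strip().splitlines():
        line = line.strip()
        if not line or line in ("GRES", "(null)"):
            continue
        for item in line.split(","):
            parts = item.split(":")
            if len(parts) == 3:
                entries.append((parts[0], parts[1], int(parts[2])))
            elif len(parts) == 2 and parts[1].isdigit():
                entries.append((parts[0], None, int(parts[1])))
    return entries


# The output quoted in this message.
reported = parse_gres_output("GRES\n(null)\ngpu:quadro_8000:2")

configured_count = 3  # from Gres=gpu:quadro_8000:3 in slurm.conf
reported_count = sum(count for _, _, count in reported)

print(reported)
print("mismatch" if reported_count != configured_count else "match")
```

Read this way, the scheduler is reporting two quadro_8000 GPUs while slurm.conf defines three.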
Is this saying that it doesn't recognize the third card? Any suggestions?
As always, thank you for your help!
Warmest regards,
Jason
--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632