[slurm-users] Incorrect Number of GPUs?

Fulcomer, Samuel samuel_fulcomer at brown.edu
Mon Jul 26 16:49:20 UTC 2021


If you have a dual-root PCIe system, you may need to specify the CPU/core
affinity for each GPU in gres.conf.
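
As a rough sketch of what that could look like for the node described below,
with one gres.conf line per PCIe root and a Cores= binding on each. The core
ranges and the GPU-to-socket mapping here are assumptions for illustration
only (derived from Sockets=2 CoresPerSocket=16); check the real topology with
something like "nvidia-smi topo -m" and lscpu before using them:

```
# gres.conf (sketch; core ranges and GPU placement are assumed, not measured)
# GPU 0 assumed to hang off the first socket's PCIe root
NodeName=gpu01 Name=gpu Type=quadro_8000 File=/dev/nvidia0 Cores=0-15
# GPUs 1 and 2 assumed to hang off the second socket's PCIe root
NodeName=gpu01 Name=gpu Type=quadro_8000 File=/dev/nvidia[1-2] Cores=16-31
```

Count= is left out because it can be derived from the File= list. After
editing gres.conf you will most likely need to restart slurmd on the node and
slurmctld before the new GRES count shows up; "scontrol show node gpu01"
should then list the Gres value the controller actually sees.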

On Mon, Jul 26, 2021 at 12:07 PM Jason Simms <simmsj at lafayette.edu> wrote:

> Hello all,
>
> I have a GPU node with 3 identical GPUs (we started with two and recently
> added the third). Running nvidia-smi correctly shows that all three are
> recognized. My gres.conf file has only this line:
>
> NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3
>
> And the relevant lines in slurm.conf are:
>
> NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
> RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3
>
> As far as I can tell, all of this is fine (and we had no issues when we
> only had the initial two GPUs in the system). However, now when I run sinfo
> -o %G (which as I understand will report the total number of gres
> resources available), this is the output:
>
> GRES
> (null)
> gpu:quadro_8000:2
>
> Is this saying that it doesn't recognize the third card? Any suggestions?
> As always, thank you for your help!
>
> Warmest regards,
> Jason
>
> --
> *Jason L. Simms, Ph.D., M.P.H.*
> Manager of Research and High-Performance Computing
> XSEDE Campus Champion
> Lafayette College
> Information Technology Services
> 710 Sullivan Rd | Easton, PA 18042
> Office: 112 Skillman Library
> p: (610) 330-5632
>

