[slurm-users] Incorrect Number of GPUs?

Fulcomer, Samuel samuel_fulcomer at brown.edu
Mon Jul 26 16:53:40 UTC 2021


...and... you need to restart slurmctld when you change a NodeName line.
"scontrol reconfigure" doesn't do the trick.
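
A rough sketch of the restart-and-verify step, in case it helps. It assumes the daemons are managed by systemd under the unit name slurmctld and that you run it with sufficient privileges on the controller; the function name check_gres is made up for illustration. Depending on your setup, slurmd on the node itself may also need a restart after a gres.conf change.

```shell
# Sketch only: assumes systemd-managed Slurm and root (or sudo) on the controller.
# check_gres is a hypothetical helper name, not part of Slurm.
check_gres() {
    node="$1"   # node to inspect, e.g. gpu01

    # Restart the controller so it rereads the NodeName line in slurm.conf.
    systemctl restart slurmctld || return 1

    # Show the GRES the controller now records for the node.
    scontrol show node "$node" | grep -i gres

    # Same information as reported by sinfo, limited to this node.
    sinfo -N -n "$node" -o "%N %G"
}
```

Usage would be "check_gres gpu01"; if the restart took effect, the GRES column should show the count from the NodeName line rather than the old value.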

On Mon, Jul 26, 2021 at 12:49 PM Fulcomer, Samuel <samuel_fulcomer at brown.edu>
wrote:

> If you have a dual-root PCIe system you may need to specify the CPU/core
> affinity in gres.conf.
>
> On Mon, Jul 26, 2021 at 12:07 PM Jason Simms <simmsj at lafayette.edu> wrote:
>
>> Hello all,
>>
>> I have a GPU node with 3 identical GPUs (we started with two and recently
>> added the third). Running nvidia-smi correctly shows that all three are
>> recognized. My gres.conf file has only this line:
>>
>> NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3
>>
>> And the relevant lines in slurm.conf are:
>>
>> NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
>> RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3
>>
>> As far as I can tell, all of this is fine (and we had no issues when we
>> only had the initial two GPUs in the system). However, now when I run sinfo
>> -o %G (which as I understand will report the total number of gres
>> resources available), this is the output:
>>
>> GRES
>> (null)
>> gpu:quadro_8000:2
>>
>> Is this saying that it doesn't recognize the third card? Any suggestions?
>> As always, thank you for your help!
>>
>> Warmest regards,
>> Jason
>>
>> --
>> *Jason L. Simms, Ph.D., M.P.H.*
>> Manager of Research and High-Performance Computing
>> XSEDE Campus Champion
>> Lafayette College
>> Information Technology Services
>> 710 Sullivan Rd | Easton, PA 18042
>> Office: 112 Skillman Library
>> p: (610) 330-5632
>>
>