[slurm-users] Incorrect Number of GPUs?

Jason Simms simmsj at lafayette.edu
Mon Jul 26 17:29:45 UTC 2021


Dear Samuel,

Restarting slurmctld did the trick. Thanks! I should have thought to do
that, but typically scontrol reconfigure picks up most changes.
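
In case it helps anyone reading the archive later, here is a rough sketch
of how the reported GPU count could be checked after the restart. The
helper name gres_count is made up for illustration, and the commented
usage assumes slurmctld is managed by systemd on the controller, which
may not match your site.

```shell
#!/bin/sh
# Hypothetical helper: pull the count out of a GRES string such as
# "gpu:quadro_8000:2". Some Slurm versions append a socket hint like
# "(S:0-1)", so anything from the first "(" onward is stripped first.
gres_count() {
    printf '%s\n' "$1" | sed 's/(.*$//' | awk -F: '{ print $NF }'
}

# Possible use on a live cluster (left commented out; assumes systemd
# and the node name gpu01 from the original message):
#   sudo systemctl restart slurmctld
#   reported=$(sinfo -h -n gpu01 -o %G)
#   [ "$(gres_count "$reported")" -eq 3 ] || echo "still reporting: $reported"
```

The point is only to compare what the controller reports against the
Gres= value on the NodeName line in slurm.conf. If the two differ after a
reconfigure, a full restart of slurmctld is the next thing to try.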

Warmest regards,
Jason

On Mon, Jul 26, 2021 at 12:55 PM Fulcomer, Samuel <samuel_fulcomer at brown.edu>
wrote:

> ...and... you need to restart slurmctld when you change a NodeName line.
> "scontrol reconfigure" doesn't do the trick.
>
> On Mon, Jul 26, 2021 at 12:49 PM Fulcomer, Samuel <
> samuel_fulcomer at brown.edu> wrote:
>
>> If you have a dual-root PCIe system you may need to specify the CPU/core
>> affinity in gres.conf.
>>
>> On Mon, Jul 26, 2021 at 12:07 PM Jason Simms <simmsj at lafayette.edu>
>> wrote:
>>
>>> Hello all,
>>>
>>> I have a GPU node with 3 identical GPUs (we started with two and
>>> recently added the third). Running nvidia-smi correctly shows that all
>>> three are recognized. My gres.conf file has only this line:
>>>
>>> NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3
>>>
>>> And the relevant lines in slurm.conf are:
>>>
>>> NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
>>> RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3
>>>
>>> As far as I can tell, all of this is fine (and we had no issues when we
>>> only had the initial two GPUs in the system). However, now when I run sinfo
>>> -o %G (which as I understand will report the total number of gres
>>> resources available), this is the output:
>>>
>>> GRES
>>> (null)
>>> gpu:quadro_8000:2
>>>
>>> Is this saying that it doesn't recognize the third card? Any
>>> suggestions? As always, thank you for your help!
>>>
>>> Warmest regards,
>>> Jason
>>>
>>> --
>>> *Jason L. Simms, Ph.D., M.P.H.*
>>> Manager of Research and High-Performance Computing
>>> XSEDE Campus Champion
>>> Lafayette College
>>> Information Technology Services
>>> 710 Sullivan Rd | Easton, PA 18042
>>> Office: 112 Skillman Library
>>> p: (610) 330-5632
>>>
>>
