[slurm-users] Incorrect Number of GPUs?

Fulcomer, Samuel samuel_fulcomer at brown.edu
Mon Jul 26 17:39:42 UTC 2021


Yeah, you'd think after all this time it would, but it remains a bit of
arcane knowledge that's mostly passed on in oral history....

There are some things that the slurmd processes need to be restarted for,
as well. I have a vague memory that changing the debug level is one...
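
For anyone finding this thread later, the fix discussed below (restart
slurmctld after editing a NodeName line, then re-check the reported GRES)
could be sketched roughly as follows. This is only a sketch under
assumptions: it presumes a systemd-managed install whose controller unit is
named slurmctld, and the helper names gres_count and apply_nodename_change
are made up for illustration. Some Slurm versions may append a parenthesized
suffix to the GRES string, so the helper strips anything from the first "("
onward.

```shell
#!/bin/sh
# Sketch only. Assumes systemd and a unit named "slurmctld"; adjust for
# your site. Both function names are hypothetical helpers.

# Print the trailing count of a typed GRES string such as
# "gpu:quadro_8000:3" (the format `sinfo -o %G` printed in this thread).
# Anything from the first "(" onward is dropped before splitting on ":".
gres_count() {
    printf '%s\n' "$1" | sed 's/(.*$//' | awk -F: '{ print $NF }'
}

# Restart the controller so an edited NodeName line is re-read, then show
# what the controller now reports for the given node. Run as root on the
# controller host.
apply_nodename_change() {
    systemctl restart slurmctld || return 1
    sinfo -N -n "$1" -o '%N %G'
}
```

With the node from this thread, that would be "apply_nodename_change gpu01",
and passing the reported GRES string to gres_count should then give the
count configured in slurm.conf.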

On Mon, Jul 26, 2021 at 1:32 PM Jason Simms <simmsj at lafayette.edu> wrote:

> Dear Samuel,
>
> Restarting slurmctld did the trick. Thanks! I should have thought to do
> that, but typically scontrol reconfigure picks up most changes.
>
> Warmest regards,
> Jason
>
> On Mon, Jul 26, 2021 at 12:55 PM Fulcomer, Samuel <
> samuel_fulcomer at brown.edu> wrote:
>
>> ...and... you need to restart slurmctld when you change a NodeName line.
>> "scontrol reconfigure" doesn't do the trick.
>>
>> On Mon, Jul 26, 2021 at 12:49 PM Fulcomer, Samuel <
>> samuel_fulcomer at brown.edu> wrote:
>>
>>> If you have a dual-root PCIe system you may need to specify the CPU/core
>>> affinity in gres.conf.
>>>
>>> On Mon, Jul 26, 2021 at 12:07 PM Jason Simms <simmsj at lafayette.edu>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> I have a GPU node with 3 identical GPUs (we started with two and
>>>> recently added the third). Running nvidia-smi correctly shows that all
>>>> three are recognized. My gres.conf file has only this line:
>>>>
>>>> NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3
>>>>
>>>> And the relevant lines in slurm.conf are:
>>>>
>>>> NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
>>>> RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3
>>>>
>>>> As far as I can tell, all of this is fine (and we had no issues when we
>>>> only had the initial two GPUs in the system). However, now when I run sinfo
>>>> -o %G (which as I understand will report the total number of gres
>>>> resources available), this is the output:
>>>>
>>>> GRES
>>>> (null)
>>>> gpu:quadro_8000:2
>>>>
>>>> Is this saying that it doesn't recognize the third card? Any
>>>> suggestions? As always, thank you for your help!
>>>>
>>>> Warmest regards,
>>>> Jason
>>>>
>>>> --
>>>> *Jason L. Simms, Ph.D., M.P.H.*
>>>> Manager of Research and High-Performance Computing
>>>> XSEDE Campus Champion
>>>> Lafayette College
>>>> Information Technology Services
>>>> 710 Sullivan Rd | Easton, PA 18042
>>>> Office: 112 Skillman Library
>>>> p: (610) 330-5632
>>>>
>>>
>