[slurm-users] Incorrect Number of GPUs?
simmsj at lafayette.edu
Mon Jul 26 17:29:45 UTC 2021
Restarting slurmctld did the trick. Thanks! I should have thought to do
that, but typically scontrol reconfigure picks up most changes.
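
For anyone checking a similar setup, the three counts in the quoted configuration (the File= device range and Count= in gres.conf, and the Gres= count in slurm.conf) can be cross-checked with a small script. This is only a rough sketch: the helper names (expand_file_range, gres_count, parse_kv) are hypothetical, the parsing is deliberately simplistic and is not how Slurm itself reads these files, and it assumes a single bracketed numeric range and a plain integer count. Matching counts do not mean the controller has picked up the change; as noted in this thread, slurmctld still needs a restart after a NodeName line changes.

```python
import re


def expand_file_range(spec):
    """Expand a File= value such as /dev/nvidia[0-2] into device paths.

    Assumption: at most one bracketed numeric range of the form [lo-hi].
    """
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", spec)
    if m is None:
        return [spec]
    prefix, lo, hi, suffix = m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)
    return [f"{prefix}{i}{suffix}" for i in range(lo, hi + 1)]


def gres_count(gres):
    """Return the trailing integer count from a string like gpu:quadro_8000:3.

    Anything from the first "(" onward is ignored. Raises ValueError if the
    last colon-separated field is not a plain integer.
    """
    last = gres.split("(")[0].split(":")[-1]
    if not last.isdigit():
        raise ValueError(f"no explicit integer count in {gres!r}")
    return int(last)


def parse_kv(line):
    """Split a whitespace-separated Key=Value config line into a dict."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


# The two configuration lines quoted in this thread.
gres_conf = "NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3"
slurm_conf = (
    "NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 "
    "RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3"
)

g = parse_kv(gres_conf)
s = parse_kv(slurm_conf)
devices = expand_file_range(g["File"])

# All three numbers should agree for the node definition to be self-consistent.
consistent = len(devices) == int(g["Count"]) == gres_count(s["Gres"])
print(devices, consistent)
```
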
On Mon, Jul 26, 2021 at 12:55 PM Fulcomer, Samuel <samuel_fulcomer at brown.edu> wrote:
> ...and... you need to restart slurmctld when you change a NodeName line.
> "scontrol reconfigure" doesn't do the trick.
> On Mon, Jul 26, 2021 at 12:49 PM Fulcomer, Samuel <
> samuel_fulcomer at brown.edu> wrote:
>> If you have a dual-root PCIe system you may need to specify the CPU/core
>> affinity in gres.conf.
>> On Mon, Jul 26, 2021 at 12:07 PM Jason Simms <simmsj at lafayette.edu> wrote:
>>> Hello all,
>>> I have a GPU node with 3 identical GPUs (we started with two and
>>> recently added the third). Running nvidia-smi correctly shows that all
>>> three are recognized. My gres.conf file has only this line:
>>> NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3
>>> And the relevant lines in slurm.conf are:
>>> NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
>>> RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3
>>> As far as I can tell, all of this is fine (and we had no issues when we
>>> only had the initial two GPUs in the system). However, now when I run sinfo
>>> -o %G (which as I understand will report the total number of gres
>>> resources available), this is the output:
>>> Is this saying that it doesn't recognize the third card? Any
>>> suggestions? As always, thank you for your help!
>>> Warmest regards,
>>> *Jason L. Simms, Ph.D., M.P.H.*
>>> Manager of Research and High-Performance Computing
>>> XSEDE Campus Champion
>>> Lafayette College
>>> Information Technology Services
>>> 710 Sullivan Rd | Easton, PA 18042
>>> Office: 112 Skillman Library
>>> p: (610) 330-5632