[slurm-users] gres/gpu count reported lower than configured

Geleßus, Achim A.Gelessus at jacobs-university.de
Fri Oct 21 16:17:00 UTC 2022


Yes, you are right. Setting AutoDetect=off in the gres.conf file solved the
problem. Thank you very much!


Best wishes

Achim

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Groner, Rob <rug262 at psu.edu>
Sent: Friday, October 21, 2022 16:26
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] gres/gpu count reported lower than configured

I've encountered that many times, and for me, it was always related to AutoDetect and the nvidia-ml library.  Does your slurmd log contain a line like "debug:  skipping GRES for NodeName=t-gc-1202  AutoDetect=nvml"?  I see that you didn't specifically set AutoDetect to nvml in gres.conf, but maybe you should set AutoDetect=off just to be sure.
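
For anyone who wants to check a slurmd log for that message without reading it by hand, here is a minimal sketch in Python. The regular expression is modelled only on the log line quoted above, so the exact wording is an assumption and may differ between Slurm versions. The function name find_skipped_gres is hypothetical and not part of Slurm.

```python
import re

# Pattern modelled on the log line quoted in this thread. The exact wording
# is an assumption and may vary between Slurm versions.
SKIP_GRES = re.compile(r"skipping GRES for NodeName=(\S+)\s+AutoDetect=(\S+)")


def find_skipped_gres(log_text):
    """Return (node_name, autodetect_value) pairs for each matching log line."""
    matches = (SKIP_GRES.search(line) for line in log_text.splitlines())
    return [m.groups() for m in matches if m]


# Sample input copied from the message quoted above.
sample = "debug:  skipping GRES for NodeName=t-gc-1202  AutoDetect=nvml\n"
print(find_skipped_gres(sample))
```

To run it against a real log, read the file that SlurmdLogFile points to in slurm.conf and pass its contents to the function. The line is logged at debug level, so it will probably only appear if SlurmdDebug is set to debug or higher.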

If "sinfo" shows a node as "inval", then setting it to Resume (not Idle) won't work until you figure out why Slurm thinks the node configuration is invalid.
