[slurm-users] gres/gpu count reported lower than configured
rug262 at psu.edu
Fri Oct 21 14:26:52 UTC 2022
I've encountered that many times, and for me, it was always related to AutoDetect and the nvidia-ml library. Does your slurmd log contain a line like "debug: skipping GRES for NodeName=t-gc-1202 AutoDetect=nvml"? I see that you didn't specifically set AutoDetect to nvml in gres.conf, but maybe you should set AutoDetect=off just to be sure.
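For illustration, a gres.conf with autodetection turned off might look roughly like the sketch below. The node name comes from the log line quoted above; the GPU count and device paths are assumptions and need to match your actual hardware and the Gres= value on the node's line in slurm.conf.

```
# gres.conf (sketch only; GPU count and device paths are hypothetical)
AutoDetect=off
NodeName=t-gc-1202 Name=gpu File=/dev/nvidia[0-3]
```

slurmd reads gres.conf at startup, so restart it on the node after changing the file.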
If "sinfo" shows a node in the "inval" state, setting it to Resume (not Idle) won't work until you figure out why Slurm thinks the node configuration is invalid.
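One way to look for the reason is sketched below. The function name diagnose_node is just a hypothetical wrapper for this example; the commands inside it are standard Slurm client tools, and the node name is the one from the log line quoted earlier.

```shell
# Sketch only: run on a host with the Slurm client tools installed.
diagnose_node() {
    node="$1"
    # List the reason Slurm recorded for the node being down, drained, or invalid.
    sinfo -R --nodes="$node"
    # Full node record, including the Reason= and Gres= fields.
    scontrol show node "$node"
}

# On the compute node itself, print the hardware configuration slurmd detects
# and compare it against the node's line in slurm.conf:
#   slurmd -C

# Only after the mismatch is fixed and slurmd has been restarted:
#   scontrol update NodeName=t-gc-1202 State=RESUME
```

Call it as `diagnose_node t-gc-1202`. If the Reason field points at a GRES count mismatch, fix the configuration first and resume the node afterwards.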