On 11/13/25 2:16 pm, Lee via slurm-users wrote:
- When I look at our 8 non-MIG DGXs, via `scontrol show node=dgxXY | grep Gres`, 7/8 DGXs report "Gres=gpu:*H100*:8(S:0-1)" while dgx09 reports "Gres=gpu:*h100*:8(S:0-1)"
Two thoughts:
1) Looking at the 24.11 code, everything gets lowercased when it uses NVML to get the names. So I wonder whether these new ones are being correctly discovered by NVML while the older ones are not, and are therefore using the uppercase values from your config?
gpu_common_underscorify_tolower(device_name);
I would suggest making sure the GPU names are lower-cased everywhere for consistency.
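To illustrate why mixed case matters, here is a minimal sketch of the kind of normalisation that call's name suggests (lowercase everything, turn spaces into underscores). This is my own illustration, not the Slurm source: normalize_gpu_name and same_after_normalizing are made-up names, and the 128-byte buffers are an arbitrary choice.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the normalisation step: lowercase every
 * character and replace spaces with underscores, in place. */
void normalize_gpu_name(char *name)
{
	assert(name != NULL);
	for (size_t i = 0; name[i] != '\0'; i++) {
		if (name[i] == ' ')
			name[i] = '_';
		else
			name[i] = (char)tolower((unsigned char)name[i]);
	}
}

/* Hypothetical helper: copy both names into scratch buffers (arbitrary
 * 128-byte limit, longer input is truncated), normalise the copies and
 * compare them. Returns true when they differ only by case or by
 * space-versus-underscore. */
bool same_after_normalizing(const char *a, const char *b)
{
	char buf_a[128];
	char buf_b[128];

	snprintf(buf_a, sizeof(buf_a), "%s", a);
	snprintf(buf_b, sizeof(buf_b), "%s", b);
	normalize_gpu_name(buf_a);
	normalize_gpu_name(buf_b);
	return strcmp(buf_a, buf_b) == 0;
}
```

With this, "H100" and "h100" compare equal only after normalising, which is why a hand-written uppercase type in the config and an autodetected lowercase one can show up as two different strings.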
2) From memory (I'm away from work at the moment), slurmd caches hwloc library information in an XML file. You might want to find that file on an older and a newer node and compare them to see whether the same difference shows up there. It could also be interesting to stop slurmd on an older node, move that XML file out of the way, start slurmd again, and see whether that changes how it reports the node.
Also, I saw you posted "slurmd -G" from the new one; could you post that from an older one too, please?
Best of luck, Chris