Yes, I agree about the reservation; that was the next thing I was about to focus on.

Please do show your reservation config.

On Wed, Nov 26, 2025, 3:26 PM Christopher Samuel via slurm-users <slurm-users@lists.schedmd.com> wrote:
On 11/13/25 2:16 pm, Lee via slurm-users wrote:

> 1. When I look at our 8 non-MIG DGXs, via `scontrol show node=dgxXY |
> grep Gres`, 7/8 DGXs report "Gres=gpu:*H100*:8(S:0-1)" while dgx09
> reports "Gres=gpu:*h100*:8(S:0-1)"

Two thoughts:

1) Looking at the 24.11 code, when it uses NVML to get the names
everything gets lowercased. So I wonder if the newer ones are being
correctly discovered by NVML while the older ones are not, and are
therefore using the uppercase values from your config?

        gpu_common_underscorify_tolower(device_name);

I would suggest making sure the GPU names are lower-cased everywhere for
consistency.
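
As a rough illustration of why the case matters, here is a small Python
sketch of what that call appears to do, going by its name: lower-case the
device name and turn spaces into underscores. This is not copied from the
Slurm source; the helper names `normalize_gpu_name` and `gres_types_match`
and the sample strings are made up for the example.

```python
def normalize_gpu_name(device_name: str) -> str:
    """Lower-case a GPU name and replace spaces with underscores.

    Illustrative only: this mirrors what the name
    gpu_common_underscorify_tolower suggests, not the actual Slurm code.
    """
    return device_name.lower().replace(" ", "_")


def gres_types_match(configured: str, detected: str) -> bool:
    """Compare a configured GRES type with a detected one after normalising both."""
    return normalize_gpu_name(configured) == normalize_gpu_name(detected)


if __name__ == "__main__":
    # A hand-written "H100" and an autodetected "h100" only agree once
    # both sides have been normalised.
    print(normalize_gpu_name("H100"))
    print(gres_types_match("H100", "h100"))
```

If the config already uses lower-case names everywhere, the configured
and detected values agree without relying on any such normalisation.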

2) From memory (I'm away from work at the moment), slurmd caches hwloc
library information in an XML file. You might want to find that file
on an older and a newer node and compare the two to see if the same
difference shows up there. It could also be interesting to stop slurmd
on an older node, move that XML file out of the way, start slurmd
again, and see whether that changes how it reports the node.
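
For the comparison, something along these lines would do once you have
copied the XML file from each node to one place. It is only a sketch:
the function name `diff_topology_files` is made up, and the two paths
are whatever you saved the files as, since I don't have the real file
name or location to hand.

```python
import difflib
from pathlib import Path


def diff_topology_files(old_path: str, new_path: str) -> list[str]:
    """Return a unified diff of two text files as a list of lines.

    An empty list means the files are identical line for line.
    """
    old_lines = Path(old_path).read_text().splitlines()
    new_lines = Path(new_path).read_text().splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=old_path,
            tofile=new_path,
            lineterm="",
        )
    )


if __name__ == "__main__":
    import sys

    # Usage: python diff_topo.py <file-from-older-node> <file-from-newer-node>
    for line in diff_topology_files(sys.argv[1], sys.argv[2]):
        print(line)
```

A plain `diff -u` on the two files gives the same information; the
point is just to look at the two cached files side by side.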

Also, I saw you posted the "slurmd -G" output from the new one; could
you post it from an older one too, please?

Best of luck,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Philadelphia, PA, USA
