[slurm-users] Re: _node_config_validate: gres/gpu: Count changed on node (0 != 2)

23 Mar 2026

Hey,

I am not 100% sure yet, as this needs further testing (it could still 
be a race condition), but I think I fixed my issue by switching to the 
NodeName format in gres.conf and distributing the same file to every 
node, instead of placing a gres.conf with node-specific information 
only on the nodes that have GRES.
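
A minimal sketch of such a shared gres.conf, for illustration only: the 
node names gpu-node[01-02] and the device paths are placeholders, and 
the count of two GPUs per node is only assumed from the "0 != 2" in the 
error message. The same file would be copied to every node, and each 
line applies only to the nodes its NodeName matches:

```
# gres.conf, identical on every node (GPU and non-GPU alike)
# NodeName restricts the line to the listed nodes; names and paths are hypothetical
NodeName=gpu-node[01-02] Name=gpu File=/dev/nvidia[0-1]
```

Nodes whose name matches no line get no GPU entries from this file.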

Best,
Xaver

On 3/23/26 15:13, Hermann Schwärzler via slurm-users wrote:
Hi everyone,
On 3/23/26 14:11, Xaver Stiensmeier via slurm-users wrote:
[...]
so I am wondering whether that is the issue. I also noticed that 
after powering up the node without requesting a gpu (works), 
scheduling to the node by requesting a GPU is not an issue.
[...]
We noticed this as well: after powering up a node, the GPU device files 
(/dev/nvidia*) are not created immediately.
What we did:
we changed the slurmd.service file and added
ExecStartPre=-/path/to/nvidia-smi -L
to the [Service] section.
This creates the device files, and a failure (e.g. on non-GPU nodes) is 
ignored by systemd (due to the "-" before the command).
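
A minimal sketch of the same change written as a systemd drop-in rather 
than an edit to the packaged unit file. The drop-in location and file 
name are assumptions (they are not part of the setup described above), 
and /path/to/nvidia-smi stays a placeholder for the real location on 
the node:

```
# /etc/systemd/system/slurmd.service.d/nvidia-devices.conf (hypothetical file name)
[Service]
# The leading "-" makes systemd ignore a non-zero exit, e.g. on non-GPU nodes
ExecStartPre=-/path/to/nvidia-smi -L
```

After adding it, "systemctl daemon-reload" followed by a restart of 
slurmd should pick up the change.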
Maybe this helps?
Kind regards,
Hermann