Hey,

I am not 100% sure yet, as this needs further testing (it could still be a race condition), but I think I fixed my issue by switching to the NodeName format in gres.conf and deploying the same file to every node, instead of placing a gres.conf with node-specific information only on the nodes that have GRES.

Best,
Xaver

On 3/23/26 15:13, Hermann Schwärzler via slurm-users wrote:
Hi everyone,
On 3/23/26 14:11, Xaver Stiensmeier via slurm-users wrote: [...]
so I am wondering whether that is the issue. I also noticed that after powering up the node without requesting a GPU (which works), scheduling to that node with a GPU request is not an issue. [...]
We noticed this as well: after powering up a node, the GPU device files (/dev/nvidia*) are not created immediately.
What we did: we changed the slurmd.service file and added
ExecStartPre=-/path/to/nvidia-smi -L
to the [Service] section. Running this command creates the device files, and a failure (e.g. on non-GPU nodes) is ignored by systemd because of the "-" prefix before the command.
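As a rough sketch, the same change could also live in a systemd drop-in rather than in an edited copy of slurmd.service, so that package updates do not overwrite it. The drop-in file name below is an assumption chosen for illustration, and /path/to/nvidia-smi is kept as the placeholder from above and has to be replaced with the real location on your nodes.

```
# /etc/systemd/system/slurmd.service.d/nvidia-devices.conf
# (hypothetical file name; any *.conf file in this directory is read)
[Service]
# The leading "-" tells systemd to ignore a non-zero exit status,
# so slurmd still starts on nodes without a GPU.
ExecStartPre=-/path/to/nvidia-smi -L
```

After adding the file, run "systemctl daemon-reload" and restart slurmd so that the new directive is picked up.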
Maybe this helps?
Kind regards, Hermann
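
For reference, the NodeName form of gres.conf mentioned in the reply at the top of this thread might look roughly like the sketch below. The node names, GPU counts and device paths are made up for illustration and are not taken from the cluster discussed here. The point is that each line names the nodes it applies to, so one identical file can be distributed to all nodes, including those without GPUs.

```
# gres.conf, same copy on every node
# (hypothetical node names and device files)
NodeName=gpu-node[1-2] Name=gpu File=/dev/nvidia0
NodeName=gpu-node3     Name=gpu File=/dev/nvidia[0-1]
```

The matching node definitions in slurm.conf still have to declare the GRES for these nodes.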