Hi Matthias,
Just another user here, but we noticed similar behaviour on our cluster's NVIDIA GPU nodes. For this cluster, we built Slurm 24.05.1 .deb packages from source ourselves on Ubuntu 22.04 with the `libnvidia-ml-dev` package installed directly from the Ubuntu package archive, using the mk-build-deps / debuild method described here:
https://slurm.schedmd.com/quickstart_admin.html#debuild)
On our cluster, the dynamic dependencies of the gpu_nvml.so shared object look the same as yours: there is no dependency on /lib/x86_64-linux-gnu/libnvidia-ml.so.1, even though that library is installed:
```
ubuntu@gpu0:~$ ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so
linux-vdso.so.1 (0x00007ffe8c3b4000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f76301a9000)
/lib64/ld-linux-x86-64.so.2 (0x00007f76303ef000)
```
However, NVML autodetection is working:
```
ubuntu@gpu0:~$ sudo grep nvml /var/log/slurm/slurmd.log | tail -n 1
[2024-11-05T16:09:06.359] gpu/nvml: _get_system_gpu_list_nvml: 8 GPU system device(s) detected
```
I can also confirm that NVML library functions are being referenced from gpu_nvml.so (but are undefined therein):
```
ubuntu@gpu0:~$ objdump -T /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so | grep nvmlInit_v2
0000000000000000 D *UND* 0000000000000000 Base nvmlInit_v2
```
It looks like at some point Slurm moved to a model where the NVML library (libnvidia-ml.so) is autodetected and dlopen'ed before any plugin needs it. The plugins can therefore assume it will already be preloaded if available, and no longer need their own shared-library dependency on it:
https://github.com/SchedMD/slurm/blob/slurm-24-05-1-1/src/interfaces/gpu.c#L80-L101

Cheers,
Josh.