Hi,
I'm trying to compile Slurm with NVIDIA NVML support, but the result is not what I expected. The build produces /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" shows no reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1, which I would expect to see:
~$ ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so
        linux-vdso.so.1 (0x00007ffd9a3f4000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0bc2c06000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0bc2e47000)
/lib/x86_64-linux-gnu/libnvidia-ml.so.1 is present during compilation. I can also see in config.status that the NVML headers were found (as I understand it, gpu_nvml.so would not be built at all otherwise).
Our old cluster was deployed with NVIDIA DeepOps (which compiles Slurm on every node) and also has NVML support. There, ldd shows the expected result:
~$ ldd /usr/local/lib/slurm/gpu_nvml.so
        ...
        libnvidia-ml.so.1 => /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007f3b10120000)
        ...
I can't test the actual functionality of my new binaries yet because I don't have a node with GPUs.
Am I missing something?
Thank you,
Matthias