[slurm-users] Building Slurm RPMs with NVIDIA GPU support?

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Jan 26 20:10:23 UTC 2021


Thanks Paul!

On 26-01-2021 20:50, Paul Edmon wrote:
> In the RPM spec we use to build Slurm, we do the following additional 
> things for GPUs:
> 
> BuildRequires: cuda-nvml-devel-11-1
> 
> then in the %build section we do:
> 
> export CFLAGS="$CFLAGS 
> -L/usr/local/cuda-11.1/targets/x86_64-linux/lib/stubs/ 
> -I/usr/local/cuda-11.1/targets/x86_64-linux/include/"
> 
> That ensures the cuda libs are installed and it directs slurm to where 
> they are.  After that configure should detect the nvml libs and link 
> against them.
> 
> I've attached our full spec that we use to build.

What I don't understand is whether it is actually *required* to make the 
NVIDIA libraries available to Slurm at build time.  I didn't do that, and I'm not aware of 
any problems with our GPU nodes so far.  Of course, our GPU nodes have 
the libraries installed and the /dev/nvidia? devices are present.

Are some of Slurm's GPU features missing or broken without the 
libraries?  SchedMD's slurm.spec file doesn't mention any "--with 
nvidia" (or similar) build options, so I'm really puzzled.
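
One way to see what a given build actually contains is to look for the 
NVML GPU plugin among the installed Slurm plugins.  The sketch below is 
only an illustration, not something taken from SchedMD's documentation: 
the helper name check_nvml_plugin is made up, and it assumes that the 
plugin file is called gpu_nvml.so and that plugins live in 
/usr/lib64/slurm, which is a common location for RPM installs but may 
differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: report whether a Slurm plugin directory contains
# the NVML GPU plugin.
# Assumptions (verify on your own system):
#   - the plugin file is named gpu_nvml.so
#   - the default plugin directory is /usr/lib64/slurm
check_nvml_plugin() {
    plugin_dir="${1:-/usr/lib64/slurm}"
    if [ -f "$plugin_dir/gpu_nvml.so" ]; then
        echo "present"
    else
        echo "absent"
    fi
}

# Check the directory given as the first argument, or the assumed default.
check_nvml_plugin "${1:-/usr/lib64/slurm}"
```

If the plugin is reported as present, running ldd on that file and 
grepping for nvidia-ml should show whether it was linked against 
libnvidia-ml.  As far as I understand, an absent plugin would mainly 
affect NVML-based autodetection (AutoDetect=nvml in gres.conf), while 
GPUs listed by hand in gres.conf can still be scheduled.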

Most of our nodes don't have GPUs, so I would rather not install the 
NVIDIA libraries on those nodes needlessly.

Thanks,
Ole

> On 1/26/2021 2:29 PM, Ole Holm Nielsen wrote:
>> In another thread, On 26-01-2021 17:44, Prentice Bisbal wrote:
>>> Personally, I think it's good that Slurm RPMs are now available 
>>> through EPEL, although I won't be able to use them, and I'm sure many 
>>> people on the list won't be able to either, since licensing issues 
>>> prevent them from providing support for NVIDIA drivers, so those of 
>>> us with GPUs on our clusters will still have to compile Slurm from 
>>> source to include NVIDIA GPU support.
>>
>> We're running Slurm 20.02.6 and recently added some NVIDIA GPU nodes.
>> The Slurm GPU documentation seems to be
>> https://slurm.schedmd.com/gres.html
>> We don't seem to have any problems scheduling jobs on GPUs, even 
>> though our Slurm RPM build host doesn't have any NVIDIA software 
>> installed, as shown by the command:
>> $ ldconfig -p | grep libnvidia-ml
>>
>> I'm curious about Prentice's statement about needing NVIDIA libraries 
>> to be installed when building Slurm RPMs, and I read the discussion in 
>> bug 9525,
>> https://bugs.schedmd.com/show_bug.cgi?id=9525
>> from which it seems that the problem was fixed in 20.02.6 and 20.11.
>>
>> Question: Is there anything special that needs to be done when 
>> building Slurm RPMs with NVIDIA GPU support?
>>
>> Thanks,
>> Ole


