[slurm-users] Building Slurm RPMs with NVIDIA GPU support?

Robert Kudyba rkudyba at fordham.edu
Tue Jan 26 20:24:15 UTC 2021


You all might be interested in a patch to the SPEC file that keeps the
Slurm RPMs from depending on libnvidia-ml.so, even when NVML support has
been enabled at configure time. See https://bugs.schedmd.com/show_bug.cgi?id=7919#c3
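
For anyone who would rather not carry a patched SPEC, a similar effect can
be had by filtering the auto-generated dependency at build time. This is
only an illustrative sketch using RPM's standard __requires_exclude macro,
not the patch from that bug report:

   # Build binary RPMs from the release tarball, but drop the
   # auto-detected requirement on libnvidia-ml.so.* so the packages
   # also install cleanly on nodes without the NVIDIA driver.
   rpmbuild -tb slurm-20.11.3.tar.bz2 \
       --define '__requires_exclude ^libnvidia-ml\.so.*$'

The gpu_nvml.so plugin is still packaged; only the RPM-level dependency is
suppressed.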

On Tue, Jan 26, 2021 at 3:17 PM Paul Raines <raines at nmr.mgh.harvard.edu>
wrote:

>
> You should check your jobs that allocate GPUs and make sure
> CUDA_VISIBLE_DEVICES is being set in the environment.  If it is not,
> that is a sign your GPU support is not really there and SLURM is just
> doing "generic" resource assignment.
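>
> A quick way to check is something like the following (just a sketch; use
> whatever gres request and partition your site needs):
>
>    srun --gres=gpu:1 env | grep CUDA_VISIBLE_DEVICES
>
> With real GPU support it should print the index(es) of the allocated
> device(s).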
>
> I have both GPU and non-GPU nodes, so I build the SLURM RPMs twice: once
> on a non-GPU node, and those RPMs go on the non-GPU nodes; then again on
> the GPU node, where CUDA is installed via the NVIDIA CUDA YUM repo rpms
> so the NVML lib is at /lib64/libnvidia-ml.so.1 (from rpm
> nvidia-driver-NVML-455.45.01-1.el8.x86_64).  No special mods to the
> default RPM SPEC are needed.  I just run
>
>    rpmbuild -tb slurm-20.11.3.tar.bz2
>
> You can run 'rpm -qlp slurm-20.11.3-1.el8.x86_64.rpm | grep nvml' and see
> that /usr/lib64/slurm/gpu_nvml.so only exists on the one built on the
> GPU node.
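>
> On the GPU build you can also sanity-check that the plugin really linked
> against NVML (a quick sketch, assuming the default install paths):
>
>    ldd /usr/lib64/slurm/gpu_nvml.so | grep libnvidia-ml
>
> which should resolve to the libnvidia-ml.so.1 shipped with the driver.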
>
> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>
>
>
> On Tue, 26 Jan 2021 2:29pm, Ole Holm Nielsen wrote:
>
> > In another thread, On 26-01-2021 17:44, Prentice Bisbal wrote:
> >>  Personally, I think it's good that Slurm RPMs are now available through
> >>  EPEL, although I won't be able to use them, and I'm sure many people on
> >>  the list won't be able to either, since licensing issues prevent them
> >>  from providing support for NVIDIA drivers, so those of us with GPUs on
> >>  our clusters will still have to compile Slurm from source to include
> >>  NVIDIA GPU support.
> >
> > We're running Slurm 20.02.6 and recently added some NVIDIA GPU nodes.
> > The Slurm GPU documentation seems to be
> > https://slurm.schedmd.com/gres.html
> > We don't seem to have any problems scheduling jobs on GPUs, even though
> > our Slurm RPM build host doesn't have any NVIDIA software installed, as
> > shown by the command:
> > $ ldconfig -p | grep libnvidia-ml
> >
> > I'm curious about Prentice's statement about needing NVIDIA libraries
> > to be installed when building Slurm RPMs, and I read the discussion in
> > bug 9525,
> > https://bugs.schedmd.com/show_bug.cgi?id=9525
> > from which it seems that the problem was fixed in 20.02.6 and 20.11.
> >
> > Question: Is there anything special that needs to be done when building
> > Slurm RPMs with NVIDIA GPU support?
> >
> > Thanks,
> > Ole
> >
> >
> >
>
>

