[slurm-users] Compiling Slurm with nvml support

Kilian Cavalotti kilian.cavalotti.work at gmail.com
Fri Sep 25 00:30:48 UTC 2020


Hi Jason,

We're taking the approach proposed in
https://bugs.schedmd.com/show_bug.cgi?id=7919: same RPM everywhere,
but without the dependencies that you don't want installed globally
(like NVML, PMIx...). Of course you need to satisfy those dependencies
some other way on the nodes that require them, but at least you only
have one set of RPMs to build and just one SPEC file to manage.
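
To check that a freshly built package really is free of those
dependencies, something like the rough sketch below could help. It
inspects the output of `rpm -qp --requires` and flags the libraries you
meant to leave out. The names OPTIONAL_LIBS, optional_requires and
rpm_requires, and the library patterns, are only illustrative
assumptions, not anything from the bug report or from our setup.

```python
import re
import subprocess

# Assumed patterns for libraries that should NOT be hard RPM dependencies.
# Adjust to whatever you want to keep out of the common package.
OPTIONAL_LIBS = (r"^libnvidia-ml\.so", r"^libpmix\.so")


def optional_requires(requires, patterns=OPTIONAL_LIBS):
    """Return the requirement lines that match any optional-library pattern."""
    compiled = [re.compile(p) for p in patterns]
    return [
        line.strip()
        for line in requires
        if any(c.search(line.strip()) for c in compiled)
    ]


def rpm_requires(rpm_path):
    """List the requirements recorded in an RPM file via `rpm -qp --requires`."""
    result = subprocess.run(
        ["rpm", "-qp", "--requires", rpm_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Calling optional_requires(rpm_requires(path_to_rpm)) on a built package
should give an empty list if the RPM can be installed on nodes that lack
the NVIDIA driver or PMIx.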

Cheers,
-- 
Kilian

On Thu, Sep 24, 2020 at 12:40 PM Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>
> That's what we do here.  We build three different RPMs:
>
> server: because we run the latest MariaDB on our master
>
> general compute
>
> gpu compute: because we build against nvml
>
> We name these all the same but have them in different repos and distribute the repos to each node appropriately.
>
> We also have a git repo in which we manage our slurm.spec file with a branch for each version and type so we can keep organized.
>
> -Paul Edmon-
>
> On 9/24/2020 3:31 PM, Dana, Jason T. wrote:
>
> Hello,
>
>
>
> I hopefully have a quick question.
>
>
>
> I have compiled Slurm RPMs on a CentOS system with the NVIDIA drivers installed so that I can use the AutoDetect=nvml setting in our GPU nodes’ gres.conf. All seems to be going well on the GPU nodes since I did that. However, I was unable to install the slurm RPM on the control/master node, because the RPM required libnvidia-ml.so to be installed. The control/master and other compute nodes don’t have any NVIDIA cards attached to them, so I figured that installing the drivers just to satisfy this requirement might not be the best idea. To get around this, I rebuilt the RPM without the drivers present, and everything has been working great as far as I can tell.
>
>
>
> I am now working on adding the PMIx support that I didn’t properly add initially, and I am encountering this situation again. I figured I would send up a flag and see if maybe I am going about this the wrong way. Is it typical to have to compile the Slurm RPMs separately for different types of nodes?
>
>
>
> Thanks in advance!
>
>
>
> Jason





