[slurm-users] Compiling Slurm with nvml support

Paul Edmon pedmon at cfa.harvard.edu
Thu Sep 24 19:38:25 UTC 2020


That's what we do here. We build three different sets of RPMs:

server: because we run the latest MariaDB on our master

general compute

GPU compute: because we build against NVML

We name them all the same but keep them in separate repos and point 
each node at the appropriate repo.
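
Roughly, the three builds differ only in the build environment and the 
spec options. A sketch of what that looks like; the release number is a 
placeholder and the --with flags are spec-file %bcond options, so check 
the slurm.spec inside your tarball before relying on them:

    # Server flavor: build where the target MariaDB is installed, so
    # slurm-slurmdbd links against the MariaDB version actually in use.
    rpmbuild -ta slurm-20.02.5.tar.bz2 --with mysql

    # GPU flavor: build on a node with the NVIDIA drivers present;
    # configure detects libnvidia-ml and enables NVML support, which is
    # also what adds the libnvidia-ml.so dependency to the RPMs.
    rpmbuild -ta slurm-20.02.5.tar.bz2

    # General compute flavor: the same command on a machine without the
    # drivers yields RPMs with no libnvidia-ml.so requirement. PMIx is
    # analogous: configure picks it up if the PMIx headers are present
    # at build time.
    rpmbuild -ta slurm-20.02.5.tar.bz2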

We also have a git repo in which we manage our slurm.spec file, with a 
branch for each version and build type, to keep things organized.
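
The branch layout is nothing fancy; something along these lines (the 
names are hypothetical, use whatever convention suits you):

    20.02/server
    20.02/general
    20.02/gpu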

-Paul Edmon-

On 9/24/2020 3:31 PM, Dana, Jason T. wrote:
>
> Hello,
>
> Hopefully this is a quick question.
>
> I have compiled Slurm RPMs on a CentOS system with the NVIDIA drivers 
> installed so that I can use the AutoDetect=nvml configuration in our 
> GPU nodes’ gres.conf. Everything has been going well on the GPU nodes 
> since I did that. However, I was unable to install the Slurm RPM on 
> the control/master node, as the RPM required libnvidia-ml.so to be 
> installed. The control/master and other compute nodes don’t have any 
> NVIDIA cards attached, so installing the drivers just to satisfy this 
> dependency seemed like a bad idea. To get around it, I rebuilt the 
> RPM on a system without the drivers present, and everything has been 
> working great as far as I can tell.
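
For reference, with an NVML-enabled slurmd, the gres.conf on the GPU 
nodes can be as minimal as a single line (assuming the usual 
GresTypes=gpu and per-node Gres= settings in slurm.conf):

    # gres.conf on a GPU node: slurmd queries NVML for the devices itself
    AutoDetect=nvml
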
>
> I am now working on adding the PMIx support that I didn’t properly 
> enable initially, and I am running into the same situation again. I 
> figured I would send up a flag before going further: is it typical to 
> have to compile separate Slurm RPMs for different types of nodes, or 
> am I going about this completely the wrong way?
>
> Thanks in advance!
>
> Jason
>

