<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
I just ran into this issue. Specifically, SLURM's configure script looks for the NVML header file (nvml.h), which comes with CUDA or DCGM, in addition to the NVML library (libnvidia-ml) that comes with the drivers. The check is at <a href="https://github.com/SchedMD/slurm/blob/a763a008b7700321b51aad2e619deab00638a379/auxdir/x_ac_nvml.m4#L32" class="">https://github.com/SchedMD/slurm/blob/a763a008b7700321b51aad2e619deab00638a379/auxdir/x_ac_nvml.m4#L32</a>.
The header is only needed at build time: once you’ve built SLURM, the nodes where it will be installed only need the GPU drivers.<br class="">
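<br class="">
As a rough pre-flight check on the build host before running configure, a Python sketch along these lines can report whether both pieces are visible. It only approximates what the configure test needs and is not SLURM's own logic: the function name <code>find_nvml</code>, the search prefixes and the subdirectory lists are assumptions for illustration, and real locations vary with the distro and the CUDA or DCGM version.<br class="">
<pre class="">
```python
import os

# Assumed locations, for illustration only; adjust for your distro and CUDA/DCGM install.
DEFAULT_PREFIXES = ["/usr", "/usr/local/cuda"]
HEADER_SUBDIRS = ["include", "targets/x86_64-linux/include"]
LIBRARY_SUBDIRS = ["lib", "lib64", "lib/x86_64-linux-gnu", "lib64/stubs"]


def find_nvml(prefixes=DEFAULT_PREFIXES):
    """Return (header_path, library_path); each is None when nothing was found."""
    header = None
    library = None
    for prefix in prefixes:
        # The header (nvml.h) ships with CUDA or DCGM.
        for sub in HEADER_SUBDIRS:
            candidate = os.path.join(prefix, sub, "nvml.h")
            if header is None and os.path.isfile(candidate):
                header = candidate
        # The library (libnvidia-ml.so*) ships with the GPU driver.
        for sub in LIBRARY_SUBDIRS:
            directory = os.path.join(prefix, sub)
            if library is not None or not os.path.isdir(directory):
                continue
            for name in sorted(os.listdir(directory)):
                if name.startswith("libnvidia-ml.so"):
                    library = os.path.join(directory, name)
                    break
    return header, library


if __name__ == "__main__":
    header, library = find_nvml()
    print("nvml.h:       ", header or "not found")
    print("libnvidia-ml: ", library or "not found")
```
</pre>
If the header comes back as not found on the build host, that matches the situation above: the driver alone is not enough to build with NVML support, even though it is enough to run the result.<br class="">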
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Apr 8, 2020, at 9:32 AM, <a href="mailto:dean.w.schulze@gmail.com" class="">
dean.w.schulze@gmail.com</a> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="">I believe in order to compile for nvml you'll have to compile on a system with an Nvidia gpu installed otherwise the Nvidia driver and libraries won't install on that system.<br class="">
<br class="">
-----Original Message-----<br class="">
From: slurm-users &lt;<a href="mailto:slurm-users-bounces@lists.schedmd.com" class="">slurm-users-bounces@lists.schedmd.com</a>&gt; On Behalf Of Christopher Samuel<br class="">
Sent: Tuesday, April 7, 2020 10:08 PM<br class="">
To: <a href="mailto:slurm-users@lists.schedmd.com" class="">slurm-users@lists.schedmd.com</a><br class="">
Subject: Re: [slurm-users] Header lengths are longer than data received after changing SelectType &amp; GresTypes to use MPS<br class="">
<br class="">
On 4/7/20 2:48 PM, Robert Kudyba wrote:<br class="">
<br class="">
<blockquote type="cite" class="">How can I get this to work by loading the correct Bright module?<br class="">
</blockquote>
<br class="">
You can't - you will need to recompile Slurm.<br class="">
<br class="">
The error says:<br class="">
<br class="">
Apr 07 16:52:33 node001 slurmd[299181]: fatal: We were configured to autodetect nvml functionality, but we weren't able to find that lib when Slurm was configured.<br class="">
<br class="">
So when Slurm was built the libraries you are telling it to use now were not detected and so the configure script disabled that functionality as it would not otherwise have been able to compile.<br class="">
<br class="">
All the best,<br class="">
Chris<br class="">
-- <br class="">
Chris Samuel : <a href="http://www.csamuel.org/" class="">http://www.csamuel.org/</a> : Berkeley, CA, USA<br class="">
<br class="">
<br class="">
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</body>
</html>