[slurm-users] Autodetect of nvml is not working in gres.conf

Groner, Rob rug262 at psu.edu
Thu Nov 30 14:16:53 UTC 2023


Did you pass --with-nvml to configure when you built Slurm?  Go back to your config.log and verify that configure actually reported finding nvml.h.

If not, then you'll need to make sure you have the right NVIDIA/CUDA packages installed on the host you're building Slurm on, and you might have to specify --with-nvml=<path to nvml install> if it's not in a standard location.
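
A rough sketch of the config.log check, in case it helps. It only filters the log for lines that mention nvml, so you can read what configure reported. The function name nvml_lines is made up for illustration, and the exact wording of the configure messages is not assumed.

```python
from pathlib import Path


def nvml_lines(config_log_text):
    """Return every line of a config.log that mentions nvml (case-insensitive).

    Hypothetical helper: it does not interpret the lines, it only
    collects them so a person can see whether nvml.h was found.
    """
    return [
        line.rstrip()
        for line in config_log_text.splitlines()
        if "nvml" in line.lower()
    ]


def nvml_lines_from_file(path="config.log"):
    """Read a config.log from disk and return its nvml-related lines."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    text = Path(path).read_text(errors="replace")
    return nvml_lines(text)
```

Run nvml_lines_from_file() from the directory you built Slurm in and print the result. An empty list would suggest configure never looked for nvml at all.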

Rob

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Ravi Konila <ravibhatk at gmail.com>
Sent: Thursday, November 30, 2023 9:06 AM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: [slurm-users] Autodetect of nvml is not working in gres.conf

Hello,

My gres.conf has AutoDetect=nvml.
When I restart the slurmd service, I get:

fatal: We were configured to autodetect nvml functionality, but we weren't able to find that lib when Slurm was configured.

I looked through a few links and the slurm-users email archives for a solution, but could not understand much.

Can someone help me with this? I am using a DGX A100 server, which has four A100 80GB GPUs.

With Warm Regards
Ravi Konila