[slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

Tue Apr 7 21:48:10 UTC 2020

> Apr 07 16:52:33 node001 slurmd[299181]: fatal: We were configured to
> autodetect nvml functionality, but we weren't able to find that lib when
> Slurm was configured.
>
> Apparently the Slurm build you are using has not been compiled against NVML
> and as such it cannot use the autodetect functionality.
>

Since we're using Bright Cluster we just have to load the CUDA toolkit for
NVML. I can run nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   29C    P0    37W / 250W |      0MiB / 32510MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

We do have GresTypes=gpu,mic,mps and Gres=gpu:v100:1 set in slurm.conf.

At https://slurm.schedmd.com/gres.html I see:
"If AutoDetect=nvml is set in gres.conf, and the NVIDIA Management Library
(NVML) is installed on the node and was found during Slurm configuration,
configuration details will automatically be filled in for any
system-detected NVIDIA GPU. This removes the need to explicitly configure
GPUs in gres.conf, though the Gres= line in slurm.conf is still required in
order to tell slurmctld how many GRES to expect."
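
As far as I understand that passage, the setup it describes would look roughly
like the sketch below. The node name is taken from the log line above, the
AutoDetect line is inferred from the fatal error message, and all other
NodeName parameters are left out, so treat this as an illustration rather than
our exact files:

```
# gres.conf on the compute node (AutoDetect setting inferred from the slurmd error)
AutoDetect=nvml

# slurm.conf, relevant lines only (other NodeName parameters omitted)
GresTypes=gpu,mic,mps
NodeName=node001 Gres=gpu:v100:1
```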

How can I get this to work by loading the correct Bright module?