[slurm-users] Autodetect of nvml is not working in
Ravi Konila
ravibhatk at gmail.com
Thu Nov 30 17:19:28 UTC 2023
Hi Zhang,
Thanks for the quick reply.
Could you please guide me on specifying MIG partitions in gres.conf and slurm.conf?
My MIG layout is as follows:
root@rl-dgxs-r21-l2:~# sudo nvidia-smi mig -lgi
+------------------------------------------------------+
| GPU instances:                                       |
| GPU   Name          Profile  Instance   Placement    |
|                       ID        ID      Start:Size   |
|======================================================|
|   0   MIG 1g.10gb     19         9         2:1       |
+------------------------------------------------------+
|   0   MIG 1g.10gb     19        10         3:1       |
+------------------------------------------------------+
|   0   MIG 2g.20gb     14         3         0:2       |
+------------------------------------------------------+
|   0   MIG 3g.40gb      9         2         4:4       |
+------------------------------------------------------+
root@rl-dgxs-r21-l2:~# nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-a044d304-28b2-c3f1-42ea-b9440d868231)
  MIG 3g.40gb Device 0: (UUID: MIG-d4514f04-e287-50e9-b3c4-c19fddbb9aa2)
  MIG 2g.20gb Device 1: (UUID: MIG-4f393220-5308-51f7-bd7a-322306593545)
  MIG 1g.10gb Device 2: (UUID: MIG-4d988c3e-160a-52f3-a3e1-8eeccfee4585)
  MIG 1g.10gb Device 3: (UUID: MIG-4ff411c0-c0e2-5b86-a3a4-e76a6b6491cb)
GPU 1: NVIDIA A100-SXM4-80GB (UUID: GPU-6aecec97-f63e-4815-3c20-503c4e82fa57)
GPU 2: NVIDIA A100-SXM4-80GB (UUID: GPU-2ab474df-49c4-531a-6580-cd44d9982d0a)
GPU 3: NVIDIA DGX Display (UUID: GPU-aa49c52b-640c-39b2-1cee-a0120d0b5fa7)
GPU 4: NVIDIA A100-SXM4-80GB (UUID: GPU-c47adbbe-7ef8-246c-f700-010542e60ad0)
Any suggestions on using MIG partitions in my slurm jobs?
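For reference, a hand-written GRES configuration would need the number of MIG instances per profile and the number of whole GPUs on the node. The sketch below only tallies those counts from the `nvidia-smi -L` listing in this message. The function name `tally_devices` and the regular expressions are illustrative assumptions, and the script deliberately does not guess the exact type strings Slurm expects in gres.conf or slurm.conf.

```python
import re
from collections import Counter

# Listing copied from the `nvidia-smi -L` output in this message.
SAMPLE = """\
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-a044d304-28b2-c3f1-42ea-b9440d868231)
  MIG 3g.40gb Device 0: (UUID: MIG-d4514f04-e287-50e9-b3c4-c19fddbb9aa2)
  MIG 2g.20gb Device 1: (UUID: MIG-4f393220-5308-51f7-bd7a-322306593545)
  MIG 1g.10gb Device 2: (UUID: MIG-4d988c3e-160a-52f3-a3e1-8eeccfee4585)
  MIG 1g.10gb Device 3: (UUID: MIG-4ff411c0-c0e2-5b86-a3a4-e76a6b6491cb)
GPU 1: NVIDIA A100-SXM4-80GB (UUID: GPU-6aecec97-f63e-4815-3c20-503c4e82fa57)
GPU 2: NVIDIA A100-SXM4-80GB (UUID: GPU-2ab474df-49c4-531a-6580-cd44d9982d0a)
GPU 3: NVIDIA DGX Display (UUID: GPU-aa49c52b-640c-39b2-1cee-a0120d0b5fa7)
GPU 4: NVIDIA A100-SXM4-80GB (UUID: GPU-c47adbbe-7ef8-246c-f700-010542e60ad0)
"""

# Assumed line shapes, based only on the listing above.
GPU_LINE = re.compile(r"^GPU\s+(\d+):\s+(.+?)\s+\(UUID:")
MIG_LINE = re.compile(r"^\s*MIG\s+(\S+)\s+Device\s+\d+:")


def tally_devices(listing):
    """Return (MIG instances per profile, whole GPUs per model name).

    A GPU that has at least one MIG instance listed under it is not
    counted as a whole GPU.
    """
    mig_counts = Counter()
    gpus = {}  # GPU index -> [model name, has MIG instances]
    current = None
    for line in listing.splitlines():
        gpu = GPU_LINE.match(line)
        if gpu:
            current = int(gpu.group(1))
            gpus[current] = [gpu.group(2), False]
            continue
        mig = MIG_LINE.match(line)
        if mig and current is not None:
            mig_counts[mig.group(1)] += 1
            gpus[current][1] = True
    whole = Counter(name for name, has_mig in gpus.values() if not has_mig)
    return mig_counts, whole


if __name__ == "__main__":
    mig_counts, whole_gpus = tally_devices(SAMPLE)
    for profile, count in sorted(mig_counts.items()):
        print(f"MIG profile {profile}: {count}")
    for name, count in sorted(whole_gpus.items()):
        print(f"Whole GPU {name}: {count}")
```

Note that the DGX Display adapter shows up as a whole GPU in this tally; whether it should be offered to jobs is a separate decision.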
With Warm Regards
Ravi
From: Shunran Zhang
Sent: Thursday, November 30, 2023 9:50 PM
To: Ravi Konila ; Slurm User Community List
Subject: Re: [slurm-users] Autodetect of nvml is not working in
Hi Ravi
Unfortunately, if NVML support was disabled at compile time (that is, when the maintainer built the apt package you installed), that part of the code is simply not in your binaries.
Your only options are to recompile Slurm yourself following the official documentation, or to find a repository that builds Slurm with NVML.
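One rough way to check an installed package is to look for the NVML GPU plugin on disk. This is only a sketch under assumptions: that NVML support ships as a plugin file named `gpu_nvml.so`, and that the plugin directory is one of the candidate paths listed below. Both should be verified against your own installation, and `find_nvml_plugin` is a hypothetical helper, not part of Slurm.

```python
from pathlib import Path

# Assumption: the NVML GPU plugin is installed under this file name.
NVML_PLUGIN = "gpu_nvml.so"

# Assumption: possible plugin directories; adjust for your distribution.
CANDIDATE_DIRS = [
    "/usr/lib/x86_64-linux-gnu/slurm-wlm",
    "/usr/lib64/slurm",
    "/usr/local/lib/slurm",
]


def find_nvml_plugin(dirs=CANDIDATE_DIRS, plugin=NVML_PLUGIN):
    """Return the paths under `dirs` where the plugin file exists."""
    return [Path(d) / plugin for d in dirs if (Path(d) / plugin).is_file()]


if __name__ == "__main__":
    found = find_nvml_plugin()
    if found:
        for path in found:
            print(f"Found NVML plugin: {path}")
    else:
        print("No NVML plugin found in the candidate directories.")
```

An empty result suggests, but does not prove, that the package was built without NVML; the plugin may simply live in a directory not listed here.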
Good luck
S. Zhang
On December 1, 2023 at 00:51, Ravi Konila <ravibhatk at gmail.com> wrote:
Hi Josef and Rob,
Thanks for the reply.
I agree that cuda-nvml-devel was not installed when I installed slurm-llnl on Ubuntu 22.04.
I installed it later.
I did not build Slurm myself; I installed it with the apt install slurm command.
Is there any way to enable NVML support after Slurm has already been installed?
With Warm Regards
Ravi K.
Ph: +91-9901072688