[slurm-users] Autodetect of nvml is not working in

Ravi Konila ravibhatk at gmail.com
Thu Nov 30 17:19:28 UTC 2023


Hi Zhang

Thanks for the quick reply. 

Could you please guide me on specifying MIG partitions in gres.conf and slurm.conf?

My MIG configuration is as follows:

root@rl-dgxs-r21-l2:~# sudo nvidia-smi mig -lgi
+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19        9          2:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       10          3:1     |
+-------------------------------------------------------+
|   0  MIG 2g.20gb         14        3          0:2     |
+-------------------------------------------------------+
|   0  MIG 3g.40gb          9        2          4:4     |
+-------------------------------------------------------+

root@rl-dgxs-r21-l2:~# nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-a044d304-28b2-c3f1-42ea-b9440d868231)
  MIG 3g.40gb     Device  0: (UUID: MIG-d4514f04-e287-50e9-b3c4-c19fddbb9aa2)
  MIG 2g.20gb     Device  1: (UUID: MIG-4f393220-5308-51f7-bd7a-322306593545)
  MIG 1g.10gb     Device  2: (UUID: MIG-4d988c3e-160a-52f3-a3e1-8eeccfee4585)
  MIG 1g.10gb     Device  3: (UUID: MIG-4ff411c0-c0e2-5b86-a3a4-e76a6b6491cb)
GPU 1: NVIDIA A100-SXM4-80GB (UUID: GPU-6aecec97-f63e-4815-3c20-503c4e82fa57)
GPU 2: NVIDIA A100-SXM4-80GB (UUID: GPU-2ab474df-49c4-531a-6580-cd44d9982d0a)
GPU 3: NVIDIA DGX Display (UUID: GPU-aa49c52b-640c-39b2-1cee-a0120d0b5fa7)
GPU 4: NVIDIA A100-SXM4-80GB (UUID: GPU-c47adbbe-7ef8-246c-f700-010542e60ad0)

Any suggestions on using MIG partitions in my slurm jobs?

With Warm Regards
Ravi 



From: Shunran Zhang 
Sent: Thursday, November 30, 2023 9:50 PM
To: Ravi Konila ; Slurm User Community List 
Subject: Re: [slurm-users] Autodetect of nvml is not working in

Hi Ravi 

Unfortunately, if NVML support was disabled at compile time (that is, when the maintainer built the apt package you installed), that part of the code is simply not present in your binaries.

Your only options are to recompile Slurm yourself following the official documentation, or to find a repository that builds Slurm with NVML support.
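
One way to check whether an installed Slurm was built with NVML support is to look for the NVML GPU plugin in the plugin directory. The sketch below assumes the plugin file is named gpu_nvml.so and that `scontrol show config` reports a PluginDir line; the fallback path is only a guess for Ubuntu packages, and the helper name has_nvml_plugin is made up for illustration.

```shell
#!/bin/sh
# Report whether a Slurm plugin directory contains the NVML GPU plugin.
# Assumption: NVML-enabled builds ship a plugin file named gpu_nvml.so.
has_nvml_plugin() {
    # $1: plugin directory to inspect
    if [ -f "$1/gpu_nvml.so" ]; then
        echo yes
    else
        echo no
    fi
}

# Ask the running configuration for PluginDir. If scontrol is unavailable,
# fall back to a guessed Ubuntu package location (may differ on your system).
plugin_dir=$(scontrol show config 2>/dev/null | awk '/^PluginDir/ {print $3}')
plugin_dir=${plugin_dir:-/usr/lib/x86_64-linux-gnu/slurm-wlm}

echo "NVML plugin in $plugin_dir: $(has_nvml_plugin "$plugin_dir")"
```

If this reports "no", installing cuda-nvml-devel afterwards will not help, because the plugin has to be produced when Slurm itself is built.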

Good luck
S. Zhang


  Ravi Konila <ravibhatk at gmail.com> wrote on 1 December 2023 at 00:51:


   
  Hi Josef and Rob
  Thanks for the reply.
  I agree that cuda-nvml-devel was not present when I installed slurm-llnl on Ubuntu 22.04.
  I installed it later.
  I did not build Slurm myself; I installed it with the apt install slurm command.

  Is there any way to make use of it now that Slurm is already installed?

  With Warm Regards
  Ravi K.
  Ph: +91-9901072688 