[slurm-users] Gres value shows null in "scontrol show node node-1" , even though "nvidia-smi" shows GPU values

John Joseph jjk_saji at yahoo.com
Sun Nov 26 06:01:44 UTC 2023



Dear All, 

Good morning 

I was able to set up a 4-node Slurm cluster. I am using Ubuntu 22.04 and Slurm is working.

Each of the nodes has GPU cards, and I am able to see the GPU information using "nvidia-smi".

However, when I check "scontrol show node node-1", I am not able to see any GPU entry: "Gres" shows as (null), and the "CfgTRES" entry also has no gpu entry. I am pasting the output of "scontrol show node node-1", "slurmd -C" and "nvidia-smi" here for reference.

"scontrol show node node-1"

NodeName=node-1 Arch=x86_64 CoresPerSocket=1 
   CPUAlloc=0 CPUTot=72 CPULoad=0.03
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=node-1 NodeHostName=node-1 Version=21.08.5
   OS=Linux 6.2.0-37-generic #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov  2 18:01:13 UTC 2 
   RealMemory=773685 AllocMem=0 FreeMem=770972 Sockets=72 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=debug 
   BootTime=2023-11-23T09:06:28 SlurmdStartTime=2023-11-23T09:07:39
   LastBusyTime=2023-11-23T09:07:40
   CfgTRES=cpu=72,mem=773685M,billing=72
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

root at node-1:~# slurmd -C
NodeName=node-1 CPUs=72 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=773685
UpTime=0-23:48:41

root at node-1:~# nvidia-smi 
Fri Nov 24 08:55:50 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-PCIE-16GB           Off | 00000000:06:00.0 Off |                    0 |
| N/A   26C    P0              23W / 250W |      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE-16GB           Off | 00000000:86:00.0 Off |                    0 |
| N/A   25C    P0              24W / 250W |      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2010      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A      2010      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

I request guidance on what configuration parameters I have missed, so that the GPUs show up in "scontrol show node node-1".
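For reference, my understanding from the Slurm documentation is that GPUs must be declared explicitly in both slurm.conf and gres.conf; a sketch of what I believe the relevant configuration should look like for this node with two V100s (the Type name and /dev/nvidia* paths are my assumptions, not confirmed on my system):

```ini
# slurm.conf (identical on all nodes): enable the gpu GRES plugin
GresTypes=gpu
# Node definition must declare the GRES in addition to the CPU/memory
# values reported by "slurmd -C"
NodeName=node-1 CPUs=72 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=773685 Gres=gpu:2

# gres.conf on node-1: map each GPU to its device file
# (Type=v100 and the device paths below are illustrative assumptions)
Name=gpu Type=v100 File=/dev/nvidia0
Name=gpu Type=v100 File=/dev/nvidia1
```

As I understand it, slurmctld and each slurmd then need a restart for the node definitions to take effect, after which running "slurmd -G" on the node should list the GRES it detects.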


Thanks

Joseph John