[slurm-users] Gres value shows (null) in "scontrol show node node-1", even though "nvidia-smi" shows the GPUs
John Joseph
jjk_saji at yahoo.com
Sun Nov 26 06:01:44 UTC 2023
Dear All,
Good morning
I have set up a 4-node SLURM system on Ubuntu 22.04, and SLURM is working.
Each of the nodes has GPU cards, and I can see the GPU information using “nvidia-smi”.
However, when I run “scontrol show node node-1”, I do not see any GPU entry: “Gres” shows as (null), and “CfgTRES” has no GPU entry either. I am pasting the output of “scontrol show node node-1”, “slurmd -C” and “nvidia-smi” below for reference.
"scontrol show node node-1"
NodeName=node-1 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUTot=72 CPULoad=0.03
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=node-1 NodeHostName=node-1 Version=21.08.5
OS=Linux 6.2.0-37-generic #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 2 18:01:13 UTC 2
RealMemory=773685 AllocMem=0 FreeMem=770972 Sockets=72 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=debug
BootTime=2023-11-23T09:06:28 SlurmdStartTime=2023-11-23T09:07:39
LastBusyTime=2023-11-23T09:07:40
CfgTRES=cpu=72,mem=773685M,billing=72
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
root@node-1:~# slurmd -C
NodeName=node-1 CPUs=72 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=773685
UpTime=0-23:48:41
root@node-1:~# nvidia-smi
Fri Nov 24 08:55:50 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-PCIE-16GB Off | 00000000:06:00.0 Off | 0 |
| N/A 26C P0 23W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE-16GB Off | 00000000:86:00.0 Off | 0 |
| N/A 25C P0 24W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2010 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 2010 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+
Could you please advise which configuration parameters I have missed, so that the GPUs show up in
"scontrol show node node-1"?
Thanks
Joseph John
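A minimal sketch of the GRES configuration that is commonly involved in this situation, under stated assumptions: the two Tesla V100 cards are exposed as the standard device files /dev/nvidia0 and /dev/nvidia1 (an assumption, check with "ls /dev/nvidia*"), the node topology is copied from the "slurmd -C" line above (the current output reports Sockets=72 CoresPerSocket=1, which suggests the existing NodeName line only sets the CPU count), and the other three nodes would need similar lines, not shown. The AccountingStorageTRES line is an assumption about what is needed for gres/gpu to be tracked as a TRES; verify it against the slurm.conf man page for version 21.08.

```
# --- slurm.conf (same copy on the controller and on every node) ---
# Declare "gpu" as a generic resource type for the whole cluster
GresTypes=gpu
# Assumption: track GPUs as a TRES so gres/gpu can appear in CfgTRES/accounting
AccountingStorageTRES=gres/gpu
# Node definition: topology taken from "slurmd -C", plus two GPUs
NodeName=node-1 Gres=gpu:2 CPUs=72 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=773685 State=UNKNOWN

# --- gres.conf (on node-1, in the same directory as slurm.conf) ---
# Map the two gpu GRES entries to their device files (assumed names)
NodeName=node-1 Name=gpu File=/dev/nvidia[0-1]
```

After changing GresTypes or node definitions, slurmctld and the slurmd on each node typically need to be restarted (not just "scontrol reconfigure") before "scontrol show node node-1" reports Gres=gpu:2.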