[slurm-users] GPU Gres Type inconsistencies

Ben Roberts ben.roberts at gsacapital.com
Tue Jun 20 08:44:28 UTC 2023


For the benefit of anyone else who comes across this, I've managed to resolve the issue.

  1.  Remove the affected node entries from the slurm.conf on slurmctld host
  2.  Restart slurmctld
  3.  Re-add the nodes back to slurm.conf on slurmctld host
  4.  Restart slurmctld again

Following this, the Gres= lines in `scontrol show node ...` show the new type. I assume slurmctld was persisting some state about the previous GRES type somewhere (I'm not sure where), and removing the node from slurm.conf and restarting caused that to be flushed.
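
For reference, the sequence was roughly as follows (assuming slurmctld is managed by systemd with the usual unit name, and the stock config path):

# on the slurmctld host (unit name and config path assumed)
vi /etc/slurm/slurm.conf              # remove the NodeName=gpu2 ... line
systemctl restart slurmctld
vi /etc/slurm/slurm.conf              # re-add the NodeName=gpu2 ... line
systemctl restart slurmctld
scontrol show node gpu2 | grep Gres   # now reports gpu:v100s-pcie-32gb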

--
Regards,
Ben Roberts

From: Ben Roberts
Sent: 19 June 2023 11:57
To: slurm-users at lists.schedmd.com
Subject: GPU Gres Type inconsistencies

Hi all,

I'm trying to set up GPU Gres Types to correctly identify the installed hardware (generation and memory size). I'm using a mix of explicit configuration (to set a friendly type name) and autodetection (to handle core and link detection). I'm seeing two related issues that I don't understand.

  1.  The output of `scontrol show node` references `Gres=gpu:tesla:2` instead of the type I'm specifying in the config file (`v100s-pcie-32gb`)
  2.  Attempts to schedule jobs using a generic `--gpus 1` request work fine, but attempts to specify the GPU type (e.g. with `--gres gpu:v100s-pcie-32gb:1`) fail with `error: Unable to allocate resources: Requested node configuration is not available` (example invocations are sketched just below this list)
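
To illustrate the shape of the requests (srun and a trivial command are used here purely as an example):

# generic GPU request: works
srun --gpus 1 hostname
# typed GPU request: fails
srun --gres gpu:v100s-pcie-32gb:1 hostname
# -> error: Unable to allocate resources: Requested node configuration is not available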

If I've understood the documentation (https://slurm.schedmd.com/gres.conf.html#OPT_Type) correctly, I should be able to use any substring of the name NVML detects for the card (`tesla_v100s-pcie-32gb`) as the Type string. With the gres debug flag set, I can see the GPUs are detected and matched up with the static entries in gres.conf correctly. I don't see any mention of Type=tesla in the logs, so I'm at a loss as to why `scontrol show node` reports `gpu:tesla` instead of the configured `gpu:v100s-pcie-32gb`. I presume this mismatch is the cause of the scheduling failure: the job spec matches the configured GPU type and should be schedulable, but the scheduler doesn't see any resources of that type as available to run on.
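
As a sanity check on what the driver reports (the nvidia-smi output noted below is what I'd expect for these cards; slurmd appears to log a lower-cased, underscore-separated form of it, as shown further down):

# what the driver reports for each card
nvidia-smi --query-gpu=name --format=csv,noheader
# expected: Tesla V100S-PCIE-32GB (x2), which slurmd logs as tesla_v100s-pcie-32gb,
# so per the docs Type=v100s-pcie-32gb should match as a substring of that name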

The "tesla" string is the first "word" of the autodetected type, but I can't see why it would be being truncated to just this rather than using the whole string. I did previously use the type "tesla" in the config, which worked fine since everything matched up, but since does not adequately describe the hardware so I need to change this to be more specific. Is there anywhere other than slurm.conf or gres.conf where the old gpu type might be persisted and need purging?

I've tried using `scontrol update node=gpu2 gres=gpu:v100s-pcie-32gb:0` to manually change the GRES type (trying to set the number of GPUs to 2 here is rejected, but 0 is accepted). `scontrol reconfig` then causes the `scontrol show node` output to update to `Gres=gpu:v100s-pcie-32gb:2` as expected, but removes the GPUs from CfgTRES. After restarting slurmctld, the Gres and CfgTRES entries briefly match up for all nodes, but very shortly afterwards the Gres entries revert to `Gres=gpu:tesla:0`, so I'm back to square one.
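
For clarity, the exact sequence I ran was:

scontrol update node=gpu2 gres=gpu:v100s-pcie-32gb:2   # rejected
scontrol update node=gpu2 gres=gpu:v100s-pcie-32gb:0   # accepted
scontrol reconfig
scontrol show node gpu2 | grep -E 'Gres|CfgTRES'       # Gres updates, but gres/gpu drops out of CfgTRES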

I've also tried using the full `tesla_v100s-pcie-32gb` string as the type, but this has no effect; the GRES type is still reported as `gpu:tesla`. This is all with Slurm 23.02.3 on Rocky Linux 8.8, using cuda-nvml-devel-12-0-12.0.140-1.x86_64. Excerpts from the configs and logs are shown below.
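
For reference, the gres.conf variant I tried in that test (with the Gres= string on the NodeName line in slurm.conf updated to match, which I assume is needed) was:

# /etc/slurm/gres.conf (variant tried)
Name=gpu Type=tesla_v100s-pcie-32gb File=/dev/nvidia0
Name=gpu Type=tesla_v100s-pcie-32gb File=/dev/nvidia1
AutoDetect=nvml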

Can anyone point me in the right direction here? Thanks,

# /etc/slurm/gres.conf
Name=gpu Type=v100s-pcie-32gb File=/dev/nvidia0
Name=gpu Type=v100s-pcie-32gb File=/dev/nvidia1
AutoDetect=nvml

# /etc/slurm/slurm.conf (identical on all nodes)
AccountingStorageTRES=gres/gpu,gres/gpu:v100s-pcie-32gb,gres/gpu:v100-pcie-32gb
EnforcePartLimits=ANY
GresTypes=gpu
NodeName=gpu2 CoresPerSocket=8 CPUs=8 Gres=gpu:v100s-pcie-32gb:2 Sockets=1 ThreadsPerCore=1

# scontrol show node gpu2
NodeName=gpu2 Arch=x86_64 CoresPerSocket=8
   CPUAlloc=0 CPUEfctv=8 CPUTot=8 CPULoad=0.02
   AvailableFeatures=...
   Gres=gpu:tesla:0(S:0)
   NodeAddr=gpu2.example.com NodeHostName=gpu2 Version=23.02.3
   OS=Linux 4.18.0-477.13.1.el8_8.x86_64 #1 SMP Tue May 30 22:15:39 UTC 2023
   RealMemory=331301 AllocMem=0 FreeMem=334102 Sockets=1 Boards=1
   MemSpecLimit=500
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=gpu
   BootTime=2023-06-14T23:03:05 SlurmdStartTime=2023-06-18T23:25:21
   LastBusyTime=2023-06-18T23:23:23 ResumeAfterTime=None
   CfgTRES=cpu=8,mem=331301M,billing=8,gres/gpu=2,gres/gpu:v100s-pcie-32gb=2
   AllocTRES=

# /var/log/slurm/slurmd.log (trimmed to only relevant lines for brevity)
[2023-06-19T11:29:25.629] GRES: Global AutoDetect=nvml(1)
[2023-06-19T11:29:25.629] debug:  gres/gpu: init: loaded
[2023-06-19T11:29:25.629] debug:  gpu/nvml: init: init: GPU NVML plugin loaded
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _nvml_init: Successfully initialized NVML
[2023-06-19T11:29:26.265] debug:  gpu/nvml: _get_system_gpu_list_nvml: Systems Graphics Driver Version: 525.105.17
[2023-06-19T11:29:26.265] debug:  gpu/nvml: _get_system_gpu_list_nvml: NVML Library Version: 12.525.105.17
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: NVML API Version: 11
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: Total CPU count: 8
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: Device count: 2
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: GPU index 0:
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     Name: tesla_v100s-pcie-32gb
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     UUID: GPU-1ef493da-bf08-60a4-8afb-4db79646f86e
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     PCI Domain/Bus/Device: 0:11:0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     PCI Bus ID: 00000000:0B:00.0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     NVLinks: -1,0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     Device File (minor number): /dev/nvidia0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     CPU Affinity Range - Machine: 0-7
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     Core Affinity Range - Abstract: 0-7
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     MIG mode: disabled
[2023-06-19T11:29:26.302] debug2: Possible GPU Memory Frequencies (1):
[2023-06-19T11:29:26.302] debug2: -------------------------------
[2023-06-19T11:29:26.302] debug2:     *1107 MHz [0]
[2023-06-19T11:29:26.302] debug2:         Possible GPU Graphics Frequencies (196):
[2023-06-19T11:29:26.302] debug2:         ---------------------------------
[2023-06-19T11:29:26.302] debug2:           *1597 MHz [0]
[2023-06-19T11:29:26.302] debug2:           *1590 MHz [1]
[2023-06-19T11:29:26.302] debug2:           ...
[2023-06-19T11:29:26.302] debug2:           *870 MHz [97]
[2023-06-19T11:29:26.302] debug2:           ...
[2023-06-19T11:29:26.302] debug2:           *142 MHz [194]
[2023-06-19T11:29:26.302] debug2:           *135 MHz [195]
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: GPU index 1:
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     Name: tesla_v100s-pcie-32gb
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     UUID: GPU-0e7d20b1-5a0f-8ef6-5120-970bd26210bb
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     PCI Domain/Bus/Device: 0:19:0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     PCI Bus ID: 00000000:13:00.0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     NVLinks: 0,-1
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     Device File (minor number): /dev/nvidia1
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     CPU Affinity Range - Machine: 0-7
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     Core Affinity Range - Abstract: 0-7
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     MIG mode: disabled
[2023-06-19T11:29:26.303] debug2: Possible GPU Memory Frequencies (1):
[2023-06-19T11:29:26.303] debug2: -------------------------------
[2023-06-19T11:29:26.303] debug2:     *1107 MHz [0]
[2023-06-19T11:29:26.303] debug2:         Possible GPU Graphics Frequencies (196):
[2023-06-19T11:29:26.303] debug2:         ---------------------------------
[2023-06-19T11:29:26.303] debug2:           *1597 MHz [0]
[2023-06-19T11:29:26.303] debug2:           *1590 MHz [1]
[2023-06-19T11:29:26.303] debug2:           ...
[2023-06-19T11:29:26.303] debug2:           *870 MHz [97]
[2023-06-19T11:29:26.303] debug2:           ...
[2023-06-19T11:29:26.303] debug2:           *142 MHz [194]
[2023-06-19T11:29:26.303] debug2:           *135 MHz [195]
[2023-06-19T11:29:26.303] gpu/nvml: _get_system_gpu_list_nvml: 2 GPU system device(s) detected
[2023-06-19T11:29:26.303] Gres GPU plugin: Merging configured GRES with system GPUs
[2023-06-19T11:29:26.303] debug2: gres/gpu: _merge_system_gres_conf: gres_list_conf:
[2023-06-19T11:29:26.303] debug2:     GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):(null)  Links:(null) Flags:HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT File:/dev/nvidia0 UniqueId:(null)
[2023-06-19T11:29:26.303] debug2:     GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):(null)  Links:(null) Flags:HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT File:/dev/nvidia1 UniqueId:(null)
[2023-06-19T11:29:26.303] debug:  gres/gpu: _merge_system_gres_conf: Including the following GPU matched between system and configuration:
[2023-06-19T11:29:26.303] debug:      GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7  Links:-1,0 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia0 UniqueId:(null)
[2023-06-19T11:29:26.303] debug:  gres/gpu: _merge_system_gres_conf: Including the following GPU matched between system and configuration:
[2023-06-19T11:29:26.303] debug:      GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7  Links:0,-1 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia1 UniqueId:(null)
[2023-06-19T11:29:26.303] debug2: gres/gpu: _merge_system_gres_conf: gres_list_gpu
[2023-06-19T11:29:26.303] debug2:     GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7  Links:-1,0 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia0 UniqueId:(null)
[2023-06-19T11:29:26.303] debug2:     GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7  Links:0,-1 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia1 UniqueId:(null)
[2023-06-19T11:29:26.303] Gres GPU plugin: Final merged GRES list:
[2023-06-19T11:29:26.303]     GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7  Links:-1,0 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia0 UniqueId:(null)
[2023-06-19T11:29:26.303]     GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7  Links:0,-1 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia1 UniqueId:(null)
[2023-06-19T11:29:26.303] GRES: _set_gres_device_desc : /dev/nvidia0 major 195, minor 0
[2023-06-19T11:29:26.303] GRES: _set_gres_device_desc : /dev/nvidia1 major 195, minor 1
[2023-06-19T11:29:26.303] GRES: gpu device number 0(/dev/nvidia0):c 195:0 rwm
[2023-06-19T11:29:26.303] GRES: gpu device number 1(/dev/nvidia1):c 195:1 rwm
[2023-06-19T11:29:26.303] Gres Name=gpu Type=v100s-pcie-32gb Count=1 Index=0 ID=7696487 File=/dev/nvidia0 Cores=0-7 CoreCnt=8 Links=-1,0 Flags=HAS_FILE,HAS_TYPE,ENV_NVML
[2023-06-19T11:29:26.303] Gres Name=gpu Type=v100s-pcie-32gb Count=1 Index=1 ID=7696487 File=/dev/nvidia1 Cores=0-7 CoreCnt=8 Links=0,-1 Flags=HAS_FILE,HAS_TYPE,ENV_NVML
[2023-06-19T11:29:26.303] CPU frequency setting not configured for this node
[2023-06-19T11:29:26.304] slurmd version 23.02.3 started
[2023-06-19T11:29:26.306] slurmd started on Mon, 19 Jun 2023 11:29:26 +0100
[2023-06-19T11:29:26.307] CPUs=8 Boards=1 Sockets=1 Cores=8 Threads=1 Memory=338063 TmpDisk=2048 Uptime=390381 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
[2023-06-19T11:29:26.310] debug:  _handle_node_reg_resp: slurmctld sent back 14 TRES.

--
Regards,
Ben Roberts
