[slurm-users] addressing NVIDIA MIG + non MIG devices in Slurm - solved

Mon Jan 31 11:05:57 UTC 2022

I looked at option
 > 2.2.3 using partial "AutoDetect=nvml"
again and saw that the reason for failure was indeed the sanity check, 
but it was my fault because I set an invalid "Links" value for the 
"hardcoded" GPUs. So this variant of gres.conf setup works and gives me 
everything I want, sorry for bothering you.
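
For anyone who hits the same sanity-check failure, here is a minimal sketch of a pre-check for the "Links" value of a hardcoded GPU line. It is not Slurm's own validation. It assumes the format described in the gres.conf man page (a comma-separated list of integers, with -1 marking the device itself and 0 meaning no connection), and it additionally assumes there is one entry per GPU on the node, which may not hold on every setup. The function name and its parameters are hypothetical.

```python
def check_links(links: str, device_index: int, device_count: int) -> list[str]:
    """Return a list of problems found in a gres.conf Links value (empty if none).

    Assumptions (not taken from Slurm source code): one entry per GPU on the
    node, and exactly one -1 located at the position of the device itself.
    """
    try:
        values = [int(v) for v in links.split(",")]
    except ValueError:
        return [f"non-integer entry in Links={links!r}"]

    problems = []
    if len(values) != device_count:
        problems.append(f"expected {device_count} entries, got {len(values)}")
    if values.count(-1) != 1:
        problems.append("expected exactly one -1 (the device itself)")
    elif values.index(-1) != device_index:
        problems.append(
            f"-1 is at position {values.index(-1)}, expected {device_index}"
        )
    if any(v < -1 for v in values):
        problems.append("entries must be -1, 0, or a positive connection count")
    return problems


if __name__ == "__main__":
    # Hypothetical values for the first GPU on a 4-GPU node.
    print(check_links("-1,4,4,4", device_index=0, device_count=4))
    print(check_links("4,4,4,4", device_index=0, device_count=4))
```

An empty list means the value passes these assumed rules; slurmd may still reject it for other reasons, so the slurmd debug log remains the authority.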

Matthias

On 27.01.22 at 16:27, Matthias Leopold wrote:
> Hi,
> 
> we have 2 DGX A100 systems which we would like to use with Slurm. We 
> want to use the MIG feature for _some_ of the GPUs. As I half expected, 
> I have not found a working setup for this in Slurm yet. Below I describe 
> the configuration variants I tried after creating the MIG instances. It 
> is a longer read, so please bear with me.
> 
> 1. using slurm-mig-discovery for gres.conf 
> (https://gitlab.com/nvidia/hpc/slurm-mig-discovery)
> - CUDA_VISIBLE_DEVICES: list of indices
> -> at first this seems to give a working setup with full flexibility, 
> but on closer inspection the selection of GPU devices is completely 
> unpredictable (judging by the output of nvidia-smi inside a Slurm job)
> 
> 2. using "AutoDetect=nvml" in gres.conf (Slurm docs)
> - CUDA_VISIBLE_DEVICES: MIG format (see 
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars)
> 
> 2.1 converting ALL GPUs to MIG
> - a full A100 is also converted to a 7g.40gb MIG instance
> - gres.conf: "AutoDetect=nvml" only
> - slurm.conf Node Def: naming all MIG types (read from slurmd debug log)
> -> working setup
> -> problem: IPC (MPI) between MIG instances not possible, this seems to 
> be a by-design limitation
> 
> 2.2 converting SOME GPUs to MIG
> - some A100 are NOT in MIG mode
> 
> 2.2.1 using "AutoDetect=nvml" only (Variant 1)
> - slurm.conf Node Def: Gres with and without type
> -> problem: fatal: _foreach_slurm_conf: Some gpu GRES in slurm.conf have 
> a type while others do not (slurm_gres->gres_cnt_config (26) > tmp_count 
> (21))
> 
> 2.2.2 using "AutoDetect=nvml" only (Variant 2)
> - slurm.conf Node Def: only Gres without type (sum of MIG + non MIG)
> -> problem: different GPU types can't be requested
> 
> 2.2.3 using partial "AutoDetect=nvml"
> - gres.conf: "AutoDetect=nvml" + hardcoding of non MIG GPUs
> - slurm.conf Node Def: MIG + non MIG Gres types
> -> produces a "perfect" config according to slurmd debug log
> -> problem: the sanity-check mode of "AutoDetect=nvml" prevents 
> operation (?)
> -> Reason=gres/gpu:1g.5gb count too low (0 < 21) 
> [slurm@2022-01-27T11:23:59]
> 
> 2.2.4 using static gres.conf with NVML generated config
> - using a gres.conf with NVML generated config where I can define the 
> type for non MIG GPU and also set the UniqueId for MIG instances would 
> be the perfect solution
> - slurm.conf Node Def: MIG + non MIG Gres types
> -> problem: it doesn't work
> -> Parsing error at unrecognized key: UniqueId
> 
> Thanks for reading this far. Am I missing something? How can MIG and non 
> MIG devices be addressed in a cluster? This setup of having MIG and non 
> MIG devices can't be exotic, since having ONLY MIG devices has severe 
> disadvantages (see 2.1). Thanks again for any advice.
> 
> Matthias
> 

-- 
Matthias Leopold
IT Systems & Communications
Medizinische Universität Wien
Spitalgasse 23 / BT 88 / Ebene 00
A-1090 Wien
Tel: +43 1 40160-21241
Fax: +43 1 40160-921200