[slurm-users] Building Slurm RPMs with NVIDIA GPU support?
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Tue Jan 26 20:40:10 UTC 2021
On 26-01-2021 21:11, Paul Raines wrote:
> You should check your jobs that allocated GPUs and make sure
> CUDA_VISIBLE_DEVICES is being set in the environment. This is a sign
> your GPU support is not really there but Slurm is just doing "generic"
> resource assignment.
Could you elaborate a bit on this remark? Are you saying that I need to
check if CUDA_VISIBLE_DEVICES is defined automatically by Slurm inside
the batch job as described in https://slurm.schedmd.com/gres.html?
What do you mean by "your GPU support is not really there" and Slurm
doing "generic" resource assignment? I'm just not understanding this...
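For reference, the check I understand you to be suggesting would be something like the sketch below, run inside a batch job that requested GPUs (e.g. sbatch --gres=gpu:1); the expected output values are of course site-dependent:

```shell
# Sketch: report whether Slurm's gres/gpu plugin exported
# CUDA_VISIBLE_DEVICES into the job environment. If the variable is
# unset, jobs are not being confined to their assigned GPUs.
check_cuda_devices() {
    if [ -n "${CUDA_VISIBLE_DEVICES+x}" ]; then
        echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
    else
        echo "CUDA_VISIBLE_DEVICES is not set"
    fi
}
check_cuda_devices
```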
With my Slurm 20.02.6 built without NVIDIA libraries, Slurm nevertheless
seems to be scheduling multiple jobs so that different jobs are assigned
to different GPUs. The GRES=gpu* values point to distinct IDX values
(GPU indexes). The nvidia-smi command shows individual processes
running on distinct GPUs.  All seems to be fine - or am I completely
mistaken?
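The IDX values I refer to come from "scontrol show job -d" output, where the per-node GRES field looks roughly like "GRES=gpu(IDX:2)" (the exact format here is assumed; it varies slightly across Slurm versions). A small sketch for pulling out the index list:

```shell
# Sketch: extract the GPU index list from an "scontrol show job -d"
# style GRES field. The echo below stands in for real scontrol output;
# on a cluster you would pipe "scontrol show job -d $JOBID" instead.
gres_idx() {
    sed -n 's/.*IDX:\([0-9,-]*\)).*/\1/p'
}
echo "   Nodes=node042 CPU_IDs=0-3 Mem=0 GRES=gpu(IDX:2)" | gres_idx
```

Comparing these indexes against the GPU column of nvidia-smi is how I concluded that different jobs land on different GPUs.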