[slurm-users] Re: Two jobs end up on one GPU?

Magnus Jonsson magnus at hpc2n.umu.se
Tue Jan 15 12:51:22 UTC 2019


CUDA_VISIBLE_DEVICES=0 on both jobs.

This is also true for any other jobs, which work just fine.

There is only one device visible in the job.
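
With cgroup device isolation each job typically sees its allocated card as
device 0, so identical CUDA_VISIBLE_DEVICES values do not by themselves show
which physical GPU is in use; the GPU UUID does. A minimal sketch in Python
for checking this from inside each job (assuming nvidia-smi is on the PATH;
the SLURM_* variable names are illustrative and may differ between Slurm
versions):

    #!/usr/bin/env python3
    # Print which physical GPU(s) this job step can actually reach.
    import os
    import subprocess

    # Environment as the job sees it (names may vary by Slurm version).
    for var in ("SLURM_JOB_ID", "CUDA_VISIBLE_DEVICES", "SLURM_JOB_GPUS"):
        print(f"{var}={os.environ.get(var, '<unset>')}")

    # nvidia-smi only lists GPUs the devices cgroup lets us reach;
    # the UUID identifies the physical card regardless of renumbering.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,uuid,name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout.strip())

Running this in both jobs and comparing the UUIDs would show directly whether
they really ended up on the same V100.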

/Magnus


On 2019-01-15 13:28, arnaud.renard at univ-reims.fr wrote:
> What is the value of the environment variable
> CUDA_VISIBLE_DEVICES?
> 
> Sent from my Huawei mobile
> 
> 
> -------- Original message --------
> Subject: [slurm-users] Two jobs end up on one GPU?
> From: Magnus Jonsson
> To: slurm-users at lists.schedmd.com
> Cc:
> 
> 
>     Hi!
> 
>     We have machines with multiple GPUs (Nvidia V100).
>     We allow multiple (two) jobs on the nodes.
> 
>     We have a user who has somehow managed to get both jobs to end up
>     on the same GPU (verified via nvidia-smi).
> 
>     We are using cgroups, and the nvidia-smi command only shows one of the
>     GPUs (if only one GPU is requested) and only the defined /dev/nvidia?
>     device is accessible.
> 
>     We are unable to reproduce this. Has anybody seen anything like this?
> 
>     /Magnus
> 
>     -- 
>     Magnus Jonsson, Developer, HPC2N, Umeå Universitet
> 
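
For the device isolation described in the quoted message, the cgroup
restriction is enforced when a /dev/nvidia? node is opened, so the most
direct check from inside a job is to try opening each node. A small sketch
(assuming the standard /dev/nvidia0, /dev/nvidia1, ... naming):

    #!/usr/bin/env python3
    # List which /dev/nvidia? nodes this job can actually open.
    import glob
    import os

    for dev in sorted(glob.glob("/dev/nvidia[0-9]*")):
        try:
            fd = os.open(dev, os.O_RDWR)  # open() is where the cgroup denies access
            os.close(fd)
            print(f"{dev}: accessible")
        except OSError as e:
            print(f"{dev}: blocked ({e.strerror})")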

-- 
Magnus Jonsson, Developer, HPC2N, Umeå Universitet
