[slurm-users] Re: Two jobs end up on one GPU?
Magnus Jonsson
magnus at hpc2n.umu.se
Tue Jan 15 12:51:22 UTC 2019
CUDA_VISIBLE_DEVICES=0 on both jobs.
This is also true for any other jobs, and those work just fine.
Only one device is visible inside each job.
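
A quick way to tell whether the two jobs really share a physical card, even
though both report device index 0 after the cgroup renumbering, is to compare
the GPU UUID / PCI bus id seen inside each job. A minimal Python sketch along
those lines (assuming nvidia-smi is on the PATH inside the job; this is not
anything Slurm provides itself):

    # Print CUDA_VISIBLE_DEVICES and the physical identity of every GPU
    # this job can see, so the output of two jobs can be compared.
    import os
    import subprocess

    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))

    # uuid and pci.bus_id identify the physical card regardless of how
    # the device index was renumbered inside the job's cgroup.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,uuid,pci.bus_id",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True)
    print(out.stdout.strip())

If both jobs print the same UUID they really are on the same card; if the
UUIDs differ, the identical CUDA_VISIBLE_DEVICES=0 is just the per-job
renumbering.
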
/Magnus
On 2019-01-15 13:28, arnaud.renard at univ-reims.fr wrote:
> What is the value of the environment variable CUDA_VISIBLE_DEVICES?
>
> Sent from my Huawei mobile
>
>
> -------- Original message --------
> Subject: [slurm-users] Two jobs end up on one GPU?
> From: Magnus Jonsson
> To: slurm-users at lists.schedmd.com
> Cc:
>
>
> Hi!
>
> We have machines with multiple GPUs (Nvidia V100).
> We allow multiple (two) jobs on the nodes.
>
> We have a user who has somehow managed to get both jobs to end up
> on the same GPU (verified via nvidia-smi).
>
> We are using cgroups, so the nvidia-smi command only shows one of the
> GPUs (if only one GPU is requested) and only the assigned /dev/nvidia?
> device is accessible.
>
> We are unable to reproduce this. Has anybody seen anything like this?
>
> /Magnus
>
> --
> Magnus Jonsson, Developer, HPC2N, Umeå Universitet
>
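
Regarding the /dev/nvidia? confinement described above: with the devices
cgroup the other GPUs' device nodes are usually still listed under /dev, but
opening them fails. A small check along these lines (a sketch only; the glob
pattern and error handling are assumptions, not something from the original
report) can be run from inside a job:

    # Try to open each /dev/nvidiaN node and report which ones the
    # devices cgroup actually lets this job use.
    import glob
    import os

    for dev in sorted(glob.glob("/dev/nvidia[0-9]*")):
        try:
            fd = os.open(dev, os.O_RDWR)
            os.close(fd)
            print(dev, "accessible")
        except OSError as err:
            # With device confinement the non-allocated GPUs typically
            # fail here with EPERM ("Operation not permitted").
            print(dev, "blocked:", err)
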
--
Magnus Jonsson, Developer, HPC2N, Umeå Universitet