[slurm-users] GPU / cgroup challenges

Chris Samuel chris at csamuel.org
Sat May 5 07:04:48 MDT 2018


On Wednesday, 2 May 2018 11:04:34 PM AEST R. Paul Wiegand wrote:

> When I set "--gres=gpu:1", the slurmd log does have encouraging lines such
> as:
> 
> [2018-05-02T08:47:04.916] [203.0] debug:  Allowing access to device
> /dev/nvidia0 for job
> [2018-05-02T08:47:04.916] [203.0] debug:  Not allowing access to
> device /dev/nvidia1 for job
> 
> However, I can still "see" both devices from nvidia-smi, and I can
> still access both if I manually unset CUDA_VISIBLE_DEVICES.
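If the device cgroup were actually being enforced, unsetting
CUDA_VISIBLE_DEVICES wouldn't matter: that variable is only a hint to the
CUDA runtime, whereas the cgroup blocks the device nodes at the kernel
level. A quick way to check is to look at the devices cgroup from inside a
job step. This is a rough sketch assuming cgroup v1 and Slurm's usual
uid/job/step hierarchy layout; the numbers below are illustrative:

    $ srun --gres=gpu:1 --pty bash
    $ grep devices /proc/self/cgroup
    6:devices:/slurm/uid_1000/job_203/step_0
    $ cat /sys/fs/cgroup/devices/slurm/uid_1000/job_203/step_0/devices.list
    c 195:0 rwm

Only the granted GPU (/dev/nvidia0, char major 195 minor 0) should be
listed. If nvidia1's minor still shows up, or you see a blanket
"a *:* rwm" entry, the constraint isn't being applied at all.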

The only things I can think of are a bug that's been fixed since 17.11.0
(I know it works for us with 17.11.5), a kernel bug, or missing device
cgroup support.
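
For completeness, here's roughly what needs to be in place for device
constraint to work. This is a sketch from memory rather than a verified
config, so treat the details as assumptions; the device paths are examples:

    # slurm.conf -- use the cgroup task plugin so enforcement happens
    TaskPlugin=task/cgroup

    # cgroup.conf -- actually enforce the device cgroup
    CgroupAutomount=yes
    ConstrainDevices=yes

    # gres.conf -- tie each GRES entry to its device node so slurmd
    # knows which /dev/nvidia* files to allow or deny per job
    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1

It's also worth confirming the kernel has the devices controller at all,
e.g. with "grep devices /proc/cgroups".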

Sorry I can't be more helpful!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



