[slurm-users] GPU / cgroup challenges
Chris Samuel
chris at csamuel.org
Sat May 5 07:04:48 MDT 2018
On Wednesday, 2 May 2018 11:04:34 PM AEST R. Paul Wiegand wrote:
> When I set "--gres=gpu:1", the slurmd log does have encouraging lines such
> as:
>
> [2018-05-02T08:47:04.916] [203.0] debug: Allowing access to device /dev/nvidia0 for job
> [2018-05-02T08:47:04.916] [203.0] debug: Not allowing access to device /dev/nvidia1 for job
>
> However, I can still "see" both devices from nvidia-smi, and I can
> still access both if I manually unset CUDA_VISIBLE_DEVICES.
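A quick way to check whether the devices cgroup is actually being
applied (as opposed to CUDA_VISIBLE_DEVICES just doing the masking) is
to read the job step's device whitelist from inside the job. This is a
sketch for cgroup v1 with task/cgroup; the exact path may differ on
your system:

  $ srun --gres=gpu:1 bash -c \
      'cat /sys/fs/cgroup/devices/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/step_*/devices.list'

If constraining is working you should see only the one NVIDIA device
(something like "c 195:0 rw" for /dev/nvidia0) alongside the default
devices; if both GPUs show up, or the path doesn't exist at all, the
devices cgroup isn't being enforced.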
The only things I can think of are a bug that's been fixed since 17.11.0
(I know this works for us with 17.11.5), a kernel bug, or missing device
cgroup support.
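If it is missing device cgroups, the pieces that need to be in place
are roughly these (a sketch only, file names and the device list will
vary per site):

  # slurm.conf
  ProctrackType=proctrack/cgroup
  TaskPlugin=task/cgroup

  # cgroup.conf
  CgroupAutomount=yes
  ConstrainDevices=yes
  AllowedDevicesFile=/etc/slurm/cgroup_allowed_devices_file.conf

  # gres.conf - one line per GPU on the node
  Name=gpu File=/dev/nvidia0
  Name=gpu File=/dev/nvidia1

plus a kernel with the devices cgroup controller enabled, and a slurmd
restart after changing any of these.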
Sorry I can't be more helpful!
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC