[slurm-users] Using cgroups to hide GPUs on a shared controller/node
Dave Evans
rdevans at ece.ubc.ca
Fri May 17 23:23:46 UTC 2019
We are using a single-system "cluster" and want some control over fair use
of the GPUs. Users are not supposed to be able to use the GPUs until they
have allocated the resources through Slurm. We have no head node, so
slurmctld, slurmdbd, and slurmd all run on the same machine.
I have a configuration working now such that the GPUs can be scheduled and
allocated.
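For reference, the GPU scheduling side boils down to something like the lines
below (a trimmed sketch, not a verbatim copy of my files; the node name
"gpunode" and the /dev/nvidia[0-7] device range are placeholders):

## slurm.conf (relevant lines, sketch) ##
GresTypes=gpu
NodeName=gpunode Gres=gpu:8

## gres.conf (sketch) ##
NodeName=gpunode Name=gpu File=/dev/nvidia[0-7]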
However, logging into the system before allocating any GPUs still gives full
access to all of them.
I would like to configure slurm cgroups to disable access to GPUs until
they have been allocated.
On first login, I get:
nvidia-smi -q | grep UUID
GPU UUID : GPU-6076ce0a-bc03-a53c-6616-0fc727801c27
GPU UUID : GPU-5620ec48-7d76-0398-9cc1-f1fa661274f3
GPU UUID : GPU-176d0514-0cf0-df71-e298-72d15f6dcd7f
GPU UUID : GPU-af03c80f-6834-cb8c-3133-2f645975f330
GPU UUID : GPU-ef10d039-a432-1ac1-84cf-3bb79561c0d3
GPU UUID : GPU-38168510-c356-33c9-7189-4e74b5a1d333
GPU UUID : GPU-3428f78d-ae91-9a74-bcd6-8e301c108156
GPU UUID : GPU-c0a831c0-78d6-44ec-30dd-9ef5874059a5
And running from the queue:
srun -N 1 --gres=gpu:2 nvidia-smi -q | grep UUID
GPU UUID : GPU-6076ce0a-bc03-a53c-6616-0fc727801c27
GPU UUID : GPU-5620ec48-7d76-0398-9cc1-f1fa661274f3
Pastes of my config files are:
## slurm.conf ##
https://pastebin.com/UxP67cA8
## cgroup.conf ##
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
#TaskAffinity=yes
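As far as I understand, ConstrainDevices=yes is only enforced when the cgroup
task and proctrack plugins are enabled in slurm.conf, i.e. roughly:

## slurm.conf (cgroup-related lines, sketch) ##
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup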
## cgroup_allowed_devices_file.conf ##
/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/dev/nvidia*
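To double-check what a job step can actually see, I have been running checks
along these lines (the exact cgroup path comes from whatever /proc/self/cgroup
reports for the devices controller):

# list only the GPUs granted to the step
srun -N 1 --gres=gpu:2 nvidia-smi -L
# Slurm normally sets CUDA_VISIBLE_DEVICES for the step as well
srun -N 1 --gres=gpu:2 env | grep CUDA_VISIBLE_DEVICES
# show which cgroups the step landed in; the devices controller entry
# points at the cgroup whose devices.list holds the whitelist
srun -N 1 --gres=gpu:2 cat /proc/self/cgroup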