[slurm-users] Using cgroups to hide GPUs on a shared controller/node

Nathan Harper nathan.harper at cfms.org.uk
Mon May 20 07:34:56 UTC 2019


This doesn't directly answer your question, but in February last year there
was a discussion on this mailing list about limiting user resources on login
nodes ("Stopping compute usage on login nodes").  Some of the suggestions
involved using cgroups to do so, and those methods could probably be extended
to limit access to GPUs, so it might be worth looking into.
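
For example, on a cgroup v1 host one rough way to experiment with this is to
drop login shells into a cgroup whose devices controller denies the NVIDIA
character devices. The paths, the major number 195 and the PID below are
assumptions/placeholders; check ls -l /dev/nvidia* on your machine first:

    # sketch only: create a device-restricted cgroup (cgroup v1 devices controller)
    mkdir /sys/fs/cgroup/devices/nogpu
    # deny read/write/mknod on all NVIDIA character devices (major 195 is an assumption)
    echo 'c 195:* rwm' > /sys/fs/cgroup/devices/nogpu/devices.deny
    # move an interactive shell into the restricted cgroup (12345 is a placeholder PID)
    echo 12345 > /sys/fs/cgroup/devices/nogpu/cgroup.procs

In practice you'd want something like a PAM hook or systemd to place login
sessions there automatically rather than doing it by hand.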

On Sat, 18 May 2019 at 00:28, Dave Evans <rdevans at ece.ubc.ca> wrote:

>
> We are using a single-system "cluster" and want some control over fair use
> of the GPUs. Users are not supposed to be able to use the GPUs until
> they have allocated the resources through Slurm. We have no head node, so
> slurmctld, slurmdbd, and slurmd all run on the same system.
>
> I have a configuration working now such that the GPUs can be scheduled and
> allocated.
> However, logging into the system before allocating GPUs gives full access
> to all of them.
>
> I would like to configure slurm cgroups to disable access to GPUs until
> they have been allocated.
>
> On first login, I get:
> nvidia-smi -q | grep UUID
>     GPU UUID                        : GPU-6076ce0a-bc03-a53c-6616-0fc727801c27
>     GPU UUID                        : GPU-5620ec48-7d76-0398-9cc1-f1fa661274f3
>     GPU UUID                        : GPU-176d0514-0cf0-df71-e298-72d15f6dcd7f
>     GPU UUID                        : GPU-af03c80f-6834-cb8c-3133-2f645975f330
>     GPU UUID                        : GPU-ef10d039-a432-1ac1-84cf-3bb79561c0d3
>     GPU UUID                        : GPU-38168510-c356-33c9-7189-4e74b5a1d333
>     GPU UUID                        : GPU-3428f78d-ae91-9a74-bcd6-8e301c108156
>     GPU UUID                        : GPU-c0a831c0-78d6-44ec-30dd-9ef5874059a5
>
>
> And running from the queue:
> srun -N 1 --gres=gpu:2 nvidia-smi -q | grep UUID
>     GPU UUID                        : GPU-6076ce0a-bc03-a53c-6616-0fc727801c27
>     GPU UUID                        : GPU-5620ec48-7d76-0398-9cc1-f1fa661274f3
>
>
> Pastes of my config files are:
> ## slurm.conf ##
> https://pastebin.com/UxP67cA8
>
>
> ## cgroup.conf ##
> CgroupAutomount=yes
> CgroupReleaseAgentDir="/etc/slurm/cgroup"
>
> ConstrainCores=yes
> ConstrainDevices=yes
> ConstrainRAMSpace=yes
> #TaskAffinity=yes
>
> ## cgroup_allowed_devices_file.conf ##
> /dev/null
> /dev/urandom
> /dev/zero
> /dev/sda*
> /dev/cpu/*/*
> /dev/pts/*
> /dev/nvidia*
>
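
For what it's worth, ConstrainDevices=yes only really bites for GPUs if slurmd
also knows which device files back each GRES, so gres.conf needs to list them,
and slurm.conf needs TaskPlugin=task/cgroup plus GresTypes=gpu (I'm assuming
those are already in your pastebinned slurm.conf). A minimal gres.conf sketch
for eight GPUs, with the node name as a placeholder, might look like:

## gres.conf (sketch; adjust NodeName and device range) ##
NodeName=yournode Name=gpu File=/dev/nvidia[0-7]

slurmd typically needs a restart after gres.conf changes for the new device
list to be picked up.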


-- 
Nathan Harper // IT Systems Lead

e: nathan.harper at cfms.org.uk   t: 0117 906 1104  m: 0787 551 0891
w: www.cfms.org.uk
CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons
Green // Bristol // BS16 7FR

CFMS Services Ltd is registered in England and Wales No 05742022 - a
subsidiary of CFMS Ltd
CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1
4QP