[slurm-users] GRES Restrictions
Christoph Brüning
christoph.bruening at uni-wuerzburg.de
Tue Aug 25 15:24:41 UTC 2020
Hello,
we're using cgroups to restrict access to the GPUs.
What I found particularly helpful are the slides by Marshall Garey from
last year's Slurm User Group Meeting:
https://slurm.schedmd.com/SLUG19/cgroups_and_pam_slurm_adopt.pdf
(NVML didn't work for us for some reason I cannot recall, but listing
the GPU device files explicitly was not a big deal)
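For reference, the relevant configuration boils down to something like the
sketch below (hostnames, device paths and GPU counts are only examples,
adjust them to your nodes):

cgroup.conf:
    ConstrainDevices=yes

slurm.conf (relevant lines only):
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup
    GresTypes=gpu
    NodeName=gpu[01-02] Gres=gpu:2 ...

gres.conf on the GPU nodes (explicit device files instead of AutoDetect=nvml):
    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1

With ConstrainDevices=yes, a job can only open the device files of the GPUs
it was actually allocated, and a job that requested no GPUs at all sees none
of them - which is what stops the "no --gres" jobs from interfering with
other people's GPUs.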
Best,
Christoph
On 25/08/2020 16.12, Willy Markuske wrote:
> Hello,
>
> I'm trying to restrict access to GPU resources on a cluster I maintain
> for a research group. There are two nodes in a partition with GRES gpu
> resources defined. Users can access these resources by submitting their
> job to the gpu partition and specifying --gres=gpu.
>
> When a user includes the flag --gres=gpu:#, Slurm properly allocates that
> number of GPUs to them. If a user requests only 1 GPU, they see only
> CUDA_VISIBLE_DEVICES=1. However, if a user does not include the
> --gres=gpu:# flag, they can still submit a job to the partition and are
> then able to see all the GPUs. This has led to some bad actors running
> jobs on all the GPUs, including ones other users have allocated, and
> causing OOM errors on those GPUs.
>
> Is it possible to require users to specify --gres=gpu:# in order to
> submit to a partition, and where would I find the documentation on doing
> so? So far, reading the GRES documentation doesn't seem to have yielded
> anything on this issue specifically.
>
> Regards,
>
> --
>
> Willy Markuske
>
> HPC Systems Engineer
>
> Research Data Services
>
> P: (858) 246-5593
>
--
Dr. Christoph Brüning
Universität Würzburg
Rechenzentrum
Am Hubland
D-97074 Würzburg
Tel.: +49 931 31-80499