[slurm-users] GRES Restrictions
Willy Markuske
wmarkuske at sdsc.edu
Tue Aug 25 16:15:43 UTC 2020
Thanks Christoph and others for the help.
It turns out the fix was simply finishing the cgroup setup. I had configured
most of it months ago and even left myself a note to uncomment
ConstrainDevices=yes in cgroup.conf once the GPU systems came online.
I kept racking my brain over why the GRES settings weren't restricting
anything, even though Slurm set the number of requested GPUs correctly.
Everything is working as expected now.
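One way to confirm that ConstrainDevices=yes has taken effect is to check, from inside a job, which GPU device files the job can actually open. Below is a minimal sketch. The helper name `accessible_gpu_devices` is hypothetical and is not a Slurm tool. It assumes the standard NVIDIA `/dev/nvidiaN` device naming. Under the devices cgroup, the device files for unallocated GPUs should still be listed under `/dev`, but opening them should be refused.

```python
import os
import re

# Assumption: standard NVIDIA naming (/dev/nvidia0, /dev/nvidia1, ...).
# Control nodes such as nvidiactl and nvidia-uvm are deliberately excluded.
GPU_NODE = re.compile(r"^nvidia\d+$")


def accessible_gpu_devices(dev_dir="/dev"):
    """Return the GPU device files under dev_dir that this process can open.

    With ConstrainDevices=yes, GPUs not allocated to the job should be
    refused by the devices cgroup, so only the job's own GPUs are returned.
    """
    try:
        names = os.listdir(dev_dir)
    except OSError:
        return []
    usable = []
    # Sort numerically so that nvidia10 comes after nvidia2.
    gpu_names = sorted((n for n in names if GPU_NODE.match(n)),
                       key=lambda n: int(n[len("nvidia"):]))
    for name in gpu_names:
        path = os.path.join(dev_dir, name)
        try:
            fd = os.open(path, os.O_RDWR)
        except OSError:
            continue  # refused by the cgroup, or otherwise unusable
        os.close(fd)
        usable.append(path)
    return usable


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
    print("openable GPU devices:", accessible_gpu_devices())
```

If this is saved under a hypothetical name such as check_gpus.py, running `srun --partition=gpu --gres=gpu:1 python3 check_gpus.py` should report a single openable device. A job submitted to the same partition without `--gres` should report none.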
Willy Markuske
HPC Systems Engineer
Research Data Services
P: (858) 246-5593
On 8/25/20 8:24 AM, Christoph Brüning wrote:
> Hello,
>
> we're using cgroups to restrict access to the GPUs.
>
> What I found particularly helpful are the slides by Marshall Garey
> from last year's Slurm User Group Meeting:
> https://slurm.schedmd.com/SLUG19/cgroups_and_pam_slurm_adopt.pdf
> (NVML didn't work for us for some reason I cannot recall, but listing
> the GPU device files explicitly was not a big deal)
>
> Best,
> Christoph
>
>
> On 25/08/2020 16.12, Willy Markuske wrote:
>> Hello,
>>
>> I'm trying to restrict access to GPU resources on a cluster I
>> maintain for a research group. Two nodes are in a partition with
>> GRES GPU resources defined. Users can access these resources by
>> submitting their jobs to the gpu partition and requesting
>> --gres=gpu.
>>
>> When a user includes the flag --gres=gpu:#, Slurm properly allocates
>> the requested number of GPUs. If a user requests only 1 GPU, they
>> only see CUDA_VISIBLE_DEVICES=1. However, if a user omits the
>> --gres=gpu:# flag, they can still submit a job to the partition and
>> are then able to see all the GPUs. This has led to some bad actors
>> running jobs on all GPUs, including ones other users have allocated,
>> causing out-of-memory (OOM) errors on the GPUs.
>>
>> Is it possible to require users to specify --gres=gpu:# in order to
>> submit to a partition, and if so, where would I find the
>> documentation on doing so? So far, reading the GRES documentation
>> hasn't turned up anything on this issue specifically.
>>
>> Regards,
>>
>> --
>> Willy Markuske
>> HPC Systems Engineer
>> Research Data Services
>> P: (858) 246-5593
>
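Christoph's alternative to NVML autodetection was to list the GPU device files explicitly. In Slurm this is done with `File=` entries in gres.conf. Below is a minimal sketch of a hypothetical helper, not part of Slurm, that emits one gres.conf line per device. The node names and GPU count are placeholders, and the sketch assumes the standard `/dev/nvidiaN` device naming.

```python
def gres_conf_lines(node_names, gpu_count, dev_prefix="/dev/nvidia"):
    """Build gres.conf lines that list each GPU device file explicitly.

    node_names: a Slurm hostlist expression such as "gpu[01-02]" (placeholder).
    gpu_count:  GPUs per node, assumed to be the same on every listed node.
    dev_prefix: device path prefix; /dev/nvidia is an assumption.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return [
        f"NodeName={node_names} Name=gpu File={dev_prefix}{i}"
        for i in range(gpu_count)
    ]


if __name__ == "__main__":
    # Hypothetical two-node partition with four GPUs per node.
    print("\n".join(gres_conf_lines("gpu[01-02]", 4)))
```

Listing the files this way also tells Slurm which device file belongs to each GPU, which the devices cgroup needs in order to fence off unallocated GPUs. The matching node definitions in slurm.conf would still need `GresTypes=gpu` and a `Gres=gpu:N` count that agrees with the number of lines generated.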