[slurm-users] [EXT] GPU Jobs with Slurm

Sean Crosby scrosby at unimelb.edu.au
Thu Jan 14 08:19:53 UTC 2021


Hi Abhiram,

You need to configure cgroup.conf to constrain the devices a job has access
to. See https://slurm.schedmd.com/cgroup.conf.html

My cgroup.conf is:

CgroupAutomount=yes
AllowedDevicesFile="/usr/local/slurm/etc/cgroup_allowed_devices_file.conf"

ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
ConstrainDevices=yes

TaskAffinity=no

CgroupMountpoint=/sys/fs/cgroup

ConstrainDevices=yes is the key setting: it stops jobs from accessing GPUs
they didn't request.
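According to the Slurm cgroup.conf and gres.conf documentation, the device constraint only takes effect when slurm.conf loads the task/cgroup plugin (TaskPlugin) and each GPU entry in gres.conf names its device files with File=, unless the devices are auto-detected. The Python sketch below is a hypothetical pre-flight check for those three settings. The function names, the node names (node01, node02) and the simplified parsing (whitespace-separated Key=value tokens, no Include files, no AutoDetect handling) are assumptions for illustration and are not part of Slurm.

```python
from __future__ import annotations


def parse_kv_line(line: str) -> dict[str, str]:
    """Split one Slurm-style config line into {key: value} pairs.

    Simplifying assumption: pairs are whitespace-separated Key=value tokens
    and values contain no spaces. Keys are lower-cased because Slurm treats
    them case-insensitively.
    """
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    pairs: dict[str, str] = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            pairs[key.lower()] = value.strip('"')
    return pairs


def merged(conf_text: str) -> dict[str, str]:
    """Merge all Key=value pairs in a file; later lines win."""
    out: dict[str, str] = {}
    for line in conf_text.splitlines():
        out.update(parse_kv_line(line))
    return out


def find_isolation_problems(cgroup_conf: str, slurm_conf: str, gres_conf: str) -> list[str]:
    """Return reasons why GPU device isolation may not be enforced."""
    problems: list[str] = []

    # 1. cgroup.conf must ask Slurm to constrain devices.
    if merged(cgroup_conf).get("constraindevices", "no").lower() != "yes":
        problems.append("cgroup.conf: ConstrainDevices=yes is not set")

    # 2. slurm.conf must load task/cgroup, the plugin that applies cgroup.conf.
    plugins = merged(slurm_conf).get("taskplugin", "").split(",")
    if "task/cgroup" not in plugins:
        problems.append("slurm.conf: TaskPlugin does not include task/cgroup")

    # 3. Each GPU entry in gres.conf needs File= so Slurm knows which device
    #    files to allow (AutoDetect is deliberately not handled here).
    for lineno, line in enumerate(gres_conf.splitlines(), start=1):
        entry = parse_kv_line(line)
        if entry.get("name") == "gpu" and "file" not in entry:
            problems.append(f"gres.conf line {lineno}: GPU entry has no File= device path")

    return problems


if __name__ == "__main__":
    # Hypothetical files: node02's GPUs are listed without File=.
    cgroup_conf = "ConstrainCores=yes\nConstrainDevices=yes\n"
    slurm_conf = "TaskPlugin=task/affinity,task/cgroup\n"
    gres_conf = (
        "NodeName=node01 Name=gpu Type=p100 File=/dev/nvidia[0-3]\n"
        "NodeName=node02 Name=gpu Type=titanrtx Count=8\n"
    )
    for problem in find_isolation_problems(cgroup_conf, slurm_conf, gres_conf):
        print(problem)
    # prints: gres.conf line 2: GPU entry has no File= device path
```

Treat an empty result as necessary but not sufficient: the check only looks at the keys named above, not at whether the daemons were restarted after the change.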

Sean

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia



On Thu, 14 Jan 2021 at 18:36, Abhiram Chintangal <achintangal at berkeley.edu> wrote:

> Hello,
>
> I recently set up a small cluster at work using Warewulf/Slurm. Currently,
> I am not able to get the scheduler to work well with GPUs (GRES).
>
> While Slurm is able to filter by GPU type, each job can see all the GPUs
> on the node, not just the ones it requested. See below:
>
> [abhiram@whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> index, name
> 0, Tesla P100-PCIE-16GB
> 1, Tesla P100-PCIE-16GB
> 2, Tesla P100-PCIE-16GB
> 3, Tesla P100-PCIE-16GB
> [abhiram@whale ~]$ srun --gres=gpu:titanrtx:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> index, name
> 0, TITAN RTX
> 1, TITAN RTX
> 2, TITAN RTX
> 3, TITAN RTX
> 4, TITAN RTX
> 5, TITAN RTX
> 6, TITAN RTX
> 7, TITAN RTX
>
> I am fairly new to Slurm and still figuring out my way around it. I would
> really appreciate any help with this.
>
> For your reference, I attached the slurm.conf and gres.conf files.
>
> Best,
>
> Abhiram
>
> --
>
> Abhiram Chintangal
> QB3 Nogales Lab
> Bioinformatics Specialist @ Howard Hughes Medical Institute
> University of California Berkeley
> 708D Stanley Hall, Berkeley, CA 94720
> Phone (510)666-3344
>
>
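Once ConstrainDevices is in effect, the same `nvidia-smi --query-gpu=index,name --format=csv` command used in the question should list only as many GPUs as the job requested. A minimal Python sketch of that check, assuming the CSV layout shown in the output above (a header row followed by index, name rows); `visible_gpus` and `gpus_match_request` are hypothetical helper names:

```python
from __future__ import annotations

import csv
import io


def visible_gpus(nvidia_smi_csv: str) -> list[tuple[int, str]]:
    """Parse `nvidia-smi --query-gpu=index,name --format=csv` output
    into (index, name) pairs, skipping the header row."""
    reader = csv.reader(io.StringIO(nvidia_smi_csv.strip()), skipinitialspace=True)
    next(reader, None)  # skip the "index, name" header
    return [(int(row[0]), row[1]) for row in reader if row]


def gpus_match_request(nvidia_smi_csv: str, requested: int) -> bool:
    """True if the job sees exactly as many GPUs as it asked for."""
    return len(visible_gpus(nvidia_smi_csv)) == requested


if __name__ == "__main__":
    # Output from `srun --gres=gpu:p100:2 ...` before device constraints.
    before = (
        "index, name\n"
        "0, Tesla P100-PCIE-16GB\n"
        "1, Tesla P100-PCIE-16GB\n"
        "2, Tesla P100-PCIE-16GB\n"
        "3, Tesla P100-PCIE-16GB\n"
    )
    print(gpus_match_request(before, requested=2))  # prints False: 4 GPUs visible
```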