[slurm-users] Defining new Gres types on nodes

Eli V eliventer at gmail.com
Mon Sep 24 13:53:08 MDT 2018


On Mon, Sep 24, 2018 at 12:27 PM Will Dennis <wdennis at nec-labs.com> wrote:
>
> Hi all,
>
> We want to add in some Gres resource types pertaining to GPUs (amount of GPU memory and CUDA cores) on some of our nodes. So we added the following params into the 'gres.conf' on the nodes that have GPUs:
>
> Name=gpu_mem Count=<#>G
> Name=gpu_cores Count=<#>

I just have a single gres.conf that's copied to all nodes, same as
slurm.conf. It has one NodeName=x Name=w Count=y line for each node
and gres.
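
A minimal sketch of what such a shared gres.conf might look like. The
node names, device paths, and counts here are hypothetical placeholders,
not values from either cluster in this thread:

```
# gres.conf, identical copy on every node (hypothetical values)
# One line per node (or node range) and gres name.
NodeName=node[01-02] Name=gpu       File=/dev/nvidia[0-3]
NodeName=node[01-02] Name=gpu_mem   Count=16G
NodeName=node[01-02] Name=gpu_cores Count=3584
```

A "G" suffix multiplies the count by 1024^3, which is why a configured
4G shows up as 4294967296 in the drain reason quoted further down.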

> And in slurm.conf:
>
> GresTypes=gpu,gpu_mem,gpu_cores
>
> And down in the NodeName lines for these servers:
>
> Gres=gpu:<#>,gpu_mem:no_consume:<#>G,gpu_cores:no_consume:<#>

I'm not using the :no_consume syntax, simply Gres=name:#,y:z,...
After any change, I copy gres.conf and slurm.conf to all nodes and run
scontrol reconfigure, which works great for me.
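
As a rough illustration of the plain Gres=name:count form, the matching
slurm.conf lines could look something like this. Node names and counts
are again hypothetical, and this is only a sketch of the layout described
above, not a tested configuration for 16.x:

```
# slurm.conf, identical copy on every node (hypothetical values)
GresTypes=gpu,gpu_mem,gpu_cores
NodeName=node[01-02] Gres=gpu:4,gpu_mem:16G,gpu_cores:3584
```

The counts in slurm.conf should not exceed what gres.conf reports for the
same node; a node that reports fewer than slurm.conf expects is what
produces a "count too low" reason.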

> (where <#> of course is the relevant numerical value)
>
> However, upon restarting the slurmctld on the controller, and the slurmd on the clients, the nodes appear to be unhappy with this, giving a message such as:
>
> Reason=gres/gpu_mem count too low (0 < 4294967296) [root at 2018-09-24T11:36:01]
>
> And of course are then going into DRAIN mode.
>
> We are running Slurm v16.04.5. Is something like the above possible on this version? If so, what could be the problem?
>
>


