[slurm-users] Defining new Gres types on nodes

Will Dennis wdennis at nec-labs.com
Mon Sep 24 10:25:05 MDT 2018


Hi all,

We want to add some Gres types describing GPU properties (amount of GPU memory and number of CUDA cores) on some of our nodes. So we added the following parameters to 'gres.conf' on the nodes that have GPUs:

Name=gpu_mem Count=<#>G 
Name=gpu_cores Count=<#>

And in slurm.conf:

GresTypes=gpu,gpu_mem,gpu_cores

And down in the NodeName lines for these servers:

Gres=gpu:<#>,gpu_mem:no_consume:<#>G,gpu_cores:no_consume:<#>

(where <#> of course is the relevant numerical value)

However, upon restarting the slurmctld on the controller, and the slurmd on the clients, the nodes appear to be unhappy with this, giving a message such as:

Reason=gres/gpu_mem count too low (0 < 4294967296) [root at 2018-09-24T11:36:01]

The nodes then, of course, go into the DRAIN state.
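
For what it's worth, the large number in that message looks like the configured count with its "G" suffix expanded, while the 0 is what the node reported. Here is a minimal sketch of that arithmetic, assuming the suffix is a 1024-based multiplier; the helper name expand_count is hypothetical and not part of Slurm:

```python
# Hypothetical helper (not a Slurm API): expand a Gres count that may carry
# a K/M/G/T suffix, assuming each suffix is a power-of-1024 multiplier.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def expand_count(text: str) -> int:
    """Return the integer count for strings such as '4G' or '8'."""
    text = text.strip()
    if text and text[-1].upper() in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1].upper()]
    return int(text)

# A configured count of 4G expands to the value shown in the Reason string.
print(expand_count("4G"))  # 4294967296
```

Under that assumption, the message reads as "the node reported 0 for gpu_mem, but slurm.conf expects 4G".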

We are running Slurm v16.04.5. Is something like the above possible on this version? If so, what could be the problem?



