Hi,
I'm using Slurm on a small 8-node cluster. I've recently added one GPU node with two Nvidia A100 GPUs, one with 40 GB of memory and one with 80 GB.
As usage of this GPU resource increases, I would like to manage it
with GRES to avoid usage conflicts. But at the moment my setup
does not work, as I can reach a GPU without reserving it:
srun -n 1 -p tenibre-gpu ./a.out
can use a GPU even though the reservation does not request this
resource (checked by running nvidia-smi on the node).
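What I would like is that a GPU is only usable when it is requested explicitly, e.g. with something like the following (just a sketch of the intended usage; the type names are the ones from my gres.conf below):
srun -n 1 -p tenibre-gpu --gres=gpu:1 ./a.out
srun -n 1 -p tenibre-gpu --gres=gpu:A100-80:1 ./a.out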
"tenibre-gpu" is a slurm partition with only this gpu node.
Following the documentation, I have created a gres.conf file; it has been propagated to all the nodes (9 compute nodes, 1 login node and the management node) and slurmd has been restarted.
gres.conf is:
## GPU setup on tenibre-gpu-0
NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0 Flags=nvidia_gpu_env
NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1 Flags=nvidia_gpu_env
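For what it's worth, this is how I would check which GRES the controller actually registers for the node (commands only, I have not pasted the output here):
scontrol show node tenibre-gpu-0 | grep -i gres
sinfo -N -p tenibre-gpu -o "%N %G"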
In slurm.conf I have checked that the following settings are present:
## Basic scheduling
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
## Generic resources
GresTypes=gpu
## Nodes list
....
Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
....
#partitions
PartitionName=tenibre-gpu MaxTime=48:00:00 DefaultTime=12:00:00 DefMemPerCPU=4096 MaxMemPerCPU=8192 Shared=YES State=UP Nodes=tenibre-gpu-0
...
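In case it is relevant, this is roughly how I check which devices a job step sees from inside an allocation (assuming the gres/gpu plugin sets CUDA_VISIBLE_DEVICES when a GPU is requested, which I believe it does):
srun -n 1 -p tenibre-gpu env | grep CUDA_VISIBLE_DEVICES
srun -n 1 -p tenibre-gpu --gres=gpu:1 env | grep CUDA_VISIBLE_DEVICES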
Maybe I've missed something? I'm running Slurm 20.11.7-1.
Thanks for your advice.
Patrick