Hi,
I'm using Slurm on a small 8-node cluster. I've recently added one GPU node with two NVIDIA A100 cards, one with 40 GB of memory and one with 80 GB.
As usage of this GPU resource increases, I would like to manage it with GRES to avoid usage conflicts. But at the moment my setup does not work, since I can reach a GPU without reserving it:
srun -n 1 -p tenibre-gpu ./a.out
can use a GPU even though the allocation does not request this resource (checked by running nvidia-smi on the node). "tenibre-gpu" is a Slurm partition containing only this GPU node.
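For comparison, what I expected is that a job would only get a GPU when it requests one explicitly, something along these lines (the type name is just how I declared it in gres.conf):

srun -n 1 -p tenibre-gpu --gres=gpu:1 ./a.out
srun -n 1 -p tenibre-gpu --gres=gpu:A100-40:1 ./a.out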
Following the documentation, I've created a gres.conf file; it is propagated to all the nodes (9 compute nodes, 1 login node and the management node) and slurmd has been restarted.
gres.conf is:

## GPU setup on tenibre-gpu-0
NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0 Flags=nvidia_gpu_env
NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1 Flags=nvidia_gpu_env
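If it is useful, this is how I can check what the controller has registered for the node (I'm not sure it's the best way to verify the GRES setup):

scontrol show node tenibre-gpu-0 | grep -i gres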
In slurm.conf I have checked these settings:
## Basic scheduling
SelectTypeParameters=CR_Core_Memory
SchedulerType=sched/backfill
SelectType=select/cons_tres

## Generic resources
GresTypes=gpu

## Nodes list
....
Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
....

#partitions
PartitionName=tenibre-gpu MaxTime=48:00:00 DefaultTime=12:00:00 DefMemPerCPU=4096 MaxMemPerCPU=8192 Shared=YES State=UP Nodes=tenibre-gpu-0
...
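To illustrate the behaviour I expected, this is the kind of quick test I have in mind (my understanding is that with Flags=nvidia_gpu_env Slurm sets CUDA_VISIBLE_DEVICES for jobs that request a GPU, so only the second job should see a card):

# no GPU requested: I would expect no access to the cards here
srun -n 1 -p tenibre-gpu bash -c 'echo CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES; nvidia-smi -L'
# one GPU requested: this should be the only way to reach a card
srun -n 1 -p tenibre-gpu --gres=gpu:1 bash -c 'echo CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES; nvidia-smi -L'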
Maybe I've missed something? I'm running Slurm 20.11.7-1.
Thanks for your advice.
Patrick