<div dir="ltr">
Hi,

I recently set up Slurm for the first time on our small cluster and got everything working well except for one issue. When submitting jobs that request both a GPU and CPUs, a job requesting 1 GPU + 1 CPU is allocated across the nodes as expected, but a job requesting 1 GPU + 2 CPUs is not. I'm not sure exactly what's causing the issue and was hoping someone might have some suggestions.

Slurm version: 22.05.3
OS: RedHat 7.9 (head node) and RedHat 7.4 (compute nodes)
Hardware config: 1 head node, 5 compute nodes, each with 2 GPUs and 8 CPUs
Some example scenarios to explain the problem.

Submitting a job requesting 1 CPU and 1 GPU works fine:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=4GB
#SBATCH --cpus-per-task=1
#SBATCH --gpus=1

- Job A requests 1 CPU, 1 GPU and 4GB memory -> assigned to node1
- Job B requests 1 CPU, 1 GPU and 4GB memory -> assigned to node1
- Job C requests 1 CPU, 1 GPU and 4GB memory -> assigned to node2, as there are only 2 GPUs per node
Submitting a job requesting 2 CPUs and 1 GPU causes issues; the only change is this directive (a full sketch of the second script is included below):

#SBATCH --cpus-per-task=2

- Job A requests 2 CPUs, 1 GPU and 4GB memory -> assigned to node1
- Job B requests 2 CPUs, 1 GPU and 4GB memory -> assigned to node2, even though node1 should still have resources available
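For completeness, the second script is identical to the first apart from the --cpus-per-task line; roughly this (the srun line is just a placeholder for the real workload):

#!/bin/bash
#SBATCH --nodes=1            # single node
#SBATCH --ntasks=1           # one task
#SBATCH --mem=4GB            # 4 GB for the job
#SBATCH --cpus-per-task=2    # only line changed from the 1-CPU case
#SBATCH --gpus=1             # one GPU

srun <our application>       # placeholder for the actual command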
Including what might be relevant info from slurm.conf below in case it's helpful:

DefMemPerCPU=2048
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory
DefCpuPerGPU=1
GresTypes=gpu
NodeName=computenodes[1-5] NodeAddr=computenodes[1-5] CPUs=8 RealMemory=64189 Gres=gpu:2 State=UNKNOWN
PartitionName=batch Nodes=ALL Default=YES MaxTime=INFINITE State=UP
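In case it helps, this is roughly how I have been checking where jobs land and what each node has allocated (the node name and job id below are just examples):

# requested/allocated CPUs (%C), GRES (%b) and node list for each job
squeue -o "%.10i %.9P %.8j %.4C %.12b %N"

# configured vs. currently allocated TRES (cpu, mem, gres/gpu) on one node
scontrol show node computenodes1 | grep -E "CfgTRES|AllocTRES"

# per-job detail, including the CPU ids allocated on each node
scontrol -d show job <jobid>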
Appreciate any suggestions/ideas!

Thanks,
Rohith