[slurm-users] question about hyperthreaded CPUs, --hint=nomultithread and multi-jobstep jobs
Hans van Schoot
vanschoot at scm.com
Tue May 23 09:20:18 UTC 2023
Hi all,
I am getting some unexpected behavior with SLURM on a multithreaded CPU
(AMD Ryzen 7950X), in combination with a job that uses multiple jobsteps
and a program that prefers to run without hyperthreading.
My job consists of a simple shell script that does multiple srun
executions, and normally (on non-multithreaded nodes) the srun commands
will only start when resources are available inside my allocation. Example:
sbatch -N 1 -n 16 mytestjob.sh
mytestjob.sh contains:
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
wait
srun 1 and 2 will start immediately, srun 3 will start as soon as one of
the first two jobsteps is finished, and srun 4 will again wait until
some cores are available.
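For completeness, and as far as I understand it, this waiting behavior matches
what srun's --exclusive flag does for job steps: each step gets dedicated CPUs
inside the allocation and is deferred until enough of them are free. So a more
explicit sketch of mytestjob.sh (assuming Slurm 18.08 semantics) would be:

#!/bin/bash
# each step asks for 8 dedicated CPUs within the job allocation;
# a step is deferred until enough CPUs become free
srun --exclusive -n 8 someMPIprog &
srun --exclusive -n 8 someMPIprog &
srun --exclusive -n 8 someMPIprog &
srun --exclusive -n 8 someMPIprog &
wait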
Now I would like this same behavior (no multithreading, one task per
core) on a node with 16 multithreaded cores (32 CPUs in Slurm,
ThreadsPerCore=2), so I submit with the following command:
sbatch -N 1 --hint=nomultithread -n 16 mytestjob.sh
Slurm correctly reserves the whole node for this, and srun without
additional directions would launch someMPIprog with 16 MPI ranks.
Unfortunately, in the multi-jobstep situation this causes all four srun
invocations to start immediately, resulting in 4x 8 MPI ranks running at
the same time, and thus multithreading. Since I specified
--hint=nomultithread, I would have expected the same behavior as on the
non-multithreaded node: srun 1 and 2 launch directly, and srun 3 and 4
wait for CPU resources to become available.
So far I've been able to find two hacky ways of getting around this problem:
- do not use --hint=nomultithread, and instead limit the job via memory
(--mem-per-cpu=4000). This is a bit ugly: it reserves half the compute
node and seems to bind to the wrong CPU cores.
- set --cpus-per-task=2 instead of --hint=nomultithread, but this causes
OpenMP to kick in if the MPI program supports it.
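A variant of the second workaround that at least avoids the OpenMP issue is to
keep the whole-core request per task but pin each MPI rank to a single thread,
assuming someMPIprog respects OMP_NUM_THREADS (a sketch, not a tested recipe):

export OMP_NUM_THREADS=1          # keep each MPI rank single-threaded
srun --exclusive -n 8 -c 2 someMPIprog &
srun --exclusive -n 8 -c 2 someMPIprog &
srun --exclusive -n 8 -c 2 someMPIprog &
srun --exclusive -n 8 -c 2 someMPIprog &
wait

With -c 2 every rank consumes a full core (both hardware threads), so two
8-rank steps fill the 16 cores and the remaining steps have to wait, but this
still feels like a workaround for what --hint=nomultithread should already do.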
To me this feels like a bit of a bug in Slurm: I tell it not to
multithread, but it still schedules jobsteps in a way that causes the CPU
to multithread.
Is there another way of getting the non-multithreaded behavior without
disabling multithreading in BIOS?
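In case it helps with debugging, this is roughly how I check which hardware
threads each step actually lands on (assuming taskset from util-linux is
available on the compute node):

srun -n 8 bash -c 'echo "rank $SLURM_PROCID: $(taskset -cp $$)"'

This prints the CPU affinity list for each of the 8 ranks, so it is easy to
see whether two ranks end up on the two hardware threads of the same core.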
Best regards and many thanks in advance!
Hans van Schoot
Some additional information:
- I'm running Slurm 18.08.4
- This is my node configuration in scontrol:
scontrol show nodes compute-7-0
NodeName=compute-7-0 Arch=x86_64 CoresPerSocket=16
   CPUAlloc=0 CPUTot=32 CPULoad=1.00
   AvailableFeatures=rack-7,32CPUs
   ActiveFeatures=rack-7,32CPUs
   Gres=(null)
   NodeAddr=10.1.1.210 NodeHostName=compute-7-0 Version=18.08
   OS=Linux 6.1.8-1.el7.elrepo.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jan 23 12:57:27 EST 2023
   RealMemory=64051 AllocMem=0 FreeMem=62681 Sockets=1 Boards=1
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=20511200 Owner=N/A MCS_label=N/A
   Partitions=zen4
   BootTime=2023-04-04T14:14:55 SlurmdStartTime=2023-04-17T12:32:38
   CfgTRES=cpu=32,mem=64051M,billing=47
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s