[slurm-users] multiple srun commands in the same SLURM script
Andrei Berceanu
andreicberceanu at gmail.com
Tue Oct 31 10:50:57 UTC 2023
Here is my SLURM script:
#!/bin/bash
#SBATCH --job-name="gpu_test"
#SBATCH --output=gpu_test_%j.log # Standard output and error log
#SBATCH --account=berceanu_a+
#SBATCH --partition=gpu
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=31200m # Reserve ~31 GB of RAM per core
#SBATCH --time=12:00:00 # Max allowed job runtime
#SBATCH --gres=gpu:16 # Allocate all sixteen GPUs on the node
export SLURM_EXACT=1 # make each srun step use only the resources it explicitly requests
srun --mpi=pmi2 -n 1 --gpus-per-node 1 python gpu_test.py &
srun --mpi=pmi2 -n 1 --gpus-per-node 1 python gpu_test.py &
srun --mpi=pmi2 -n 1 --gpus-per-node 1 python gpu_test.py &
srun --mpi=pmi2 -n 1 --gpus-per-node 1 python gpu_test.py &
wait
What I expect this to do is run, in parallel, 4 independent copies
of the gpu_test.py Python script, using 4 of the 16 GPUs on this
node.
What it actually does is run the script on a single GPU only; it is
as if the other 3 srun commands do nothing. Perhaps they do not see
any available GPUs for some reason?
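For what it's worth, the layout I am after could also be written as a
loop, like the sketch below. The per-step --gres=gpu:1 and --exclusive
flags are only my guess at how each step should claim its own GPU on
this Slurm version; I have not confirmed they change anything:

# sketch: launch 4 concurrent job steps, each claiming 1 of the 16
# GPUs allocated to the job (assumes per-step --gres requests and
# step-level --exclusive behave as expected on 19.05)
for i in 1 2 3 4; do
    srun --exclusive --mpi=pmi2 -n 1 --gres=gpu:1 python gpu_test.py &
done
wait    # block until all 4 background steps have finished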
System info:
slurm 19.05.2
Linux 5.4.0-90-generic #101~18.04.1-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
gpu up infinite 1 idle thor
NodeName=thor Arch=x86_64 CoresPerSocket=24
CPUAlloc=0 CPUTot=48 CPULoad=0.45
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:16(S:0-1)
NodeAddr=thor NodeHostName=thor
OS=Linux 5.4.0-90-generic #101~18.04.1-Ubuntu SMP Fri Oct 22 09:25:04 UTC 2021
RealMemory=1546812 AllocMem=0 FreeMem=1433049 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=gpu
BootTime=2023-08-09T14:58:01 SlurmdStartTime=2023-08-09T14:58:36
CfgTRES=cpu=48,mem=1546812M,billing=48,gres/gpu=16
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
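For completeness, the output above was gathered with commands along
these lines (exact invocations from memory):

sinfo --version           # Slurm version
uname -a                  # kernel line
sinfo -p gpu              # partition summary
scontrol show node thor   # full node record, including Gres=gpu:16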
I can add any additional system info as required.
Thank you so much for taking the time to read this,
Regards,
Andrei