[slurm-users] Parallel sbatch
Marcus Pedersén
marcus.pedersen at slu.se
Fri Nov 5 09:42:32 UTC 2021
Hi all,
I have set up a basic Slurm system and have been testing out
a number of things.
The latest thing I started to test is running work in parallel.
What I have is about 70 independent scripts that would be
ideal to run in parallel.
For testing purposes I have created 20 dummy scripts
that print the script name and hostname, sleep for one minute,
and then print the number of minutes.
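A minimal sketch of what one such dummy script might contain, for readers who want to reproduce the test. The function name run_dummy and its second argument (minutes to sleep) are hypothetical and not taken from the actual scripts; uname -n is used here to get the hostname.

```shell
#!/bin/bash
# Hypothetical stand-in for one of the test_progN.sh dummy scripts.
# run_dummy NAME [MINUTES]: print the script name and the hostname,
# sleep for MINUTES minutes (default 1), then print the minutes.
run_dummy() {
    local name="$1"
    local minutes="${2:-1}"
    echo "script: ${name}"
    echo "host: $(uname -n)"
    sleep "$((minutes * 60))"
    echo "slept ${minutes} minute(s)"
}

# In a real dummy script the last line would be something like:
#   run_dummy "$(basename "$0")" 1
```

Passing 0 as the second argument skips the wait, which is handy when checking the script outside Slurm.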
The way I want to run this is to allocate 2 nodes
and run all of the 20 scripts in parallel, each one of them
in one process.
My idea is that the first node will be filled up with 12 processes,
each process running one script and the second node will run
the rest of the processes/scripts (8 scripts on 8 processes).
I have read a couple of tutorials and looked at the documentation
for different parts of Slurm.
But whatever flags I use for both sbatch and srun, I do not seem to
be able to accomplish what I want.
All nodes have 6 cores with 2 threads each, i.e. 12 logical CPUs per node.
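The intended split can be checked with a little shell arithmetic. This is only a sketch of the expectation, assuming Slurm is configured to count each hardware thread as one CPU; the variable names are illustrative.

```shell
#!/bin/bash
# Figures from the post: 6 cores x 2 threads per node, 20 scripts.
cores_per_node=6
threads_per_core=2
total_scripts=20

cpus_per_node=$((cores_per_node * threads_per_core))

# Fill the first node, then put the remainder on the second node.
on_first=$(( total_scripts < cpus_per_node ? total_scripts : cpus_per_node ))
on_second=$(( total_scripts - on_first ))

echo "node1: ${on_first} scripts, node2: ${on_second} scripts"
```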
The closest I have come is with this small sbatch script:
#! /bin/bash
#SBATCH --job-name=TestParallel
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=12
#SBATCH --nodelist=node1,node2
#SBATCH --output="%x-%4j-%N.out"
#SBATCH --mail-user=my at mail
#SBATCH --mail-type=ALL
echo
date +%Y-%m-%d" "%H-%M-%S
for i in {1..20}
do
srun --nodes=1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=1 --exclusive --job-name=Testp-$i --output=/path/to/test_prog$i.log /path/to/test_prog$i.sh &
done
date +%Y-%m-%d" "%H-%M-%S
wait
sacct gives the following output:
505         TestParal+  all  marcus  24  RUNNING  node[1-2]  0:0
505.batch   batch                    12  RUNNING  node1      0:0
505.0       Testp-3                   1  RUNNING  node1      0:0
505.1       Testp-6                   1  RUNNING  node2      0:0
505.2       Testp-2                   1  RUNNING  node1      0:0
505.3       Testp-13                  1  RUNNING  node1      0:0
505.4       Testp-9                   1  RUNNING  node1      0:0
505.5       Testp-11                  1  RUNNING  node1      0:0
505.6       Testp-16                  1  RUNNING  node1      0:0
505.7       Testp-12                  1  RUNNING  node1      0:0
505.8       Testp-20                  1  RUNNING  node1      0:0
505.9       Testp-4                   1  RUNNING  node1      0:0
505.10      Testp-19                  1  RUNNING  node1      0:0
505.11      Testp-10                  1  RUNNING  node1      0:0
505.12      Testp-5                   1  RUNNING  node1      0:0
Slurm only uses one process on node2, and of course I want the last 8 processes to run on node2.
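One way to tally the placement is to count job steps per node from the sacct output. The helper below is a sketch only: count_steps_per_node is a hypothetical name, and it assumes the node name is the second-to-last column, as in the output above. It skips the job and batch lines and counts only numbered steps.

```shell
#!/bin/bash
# Count numbered job steps (505.0, 505.1, ...) per node.
# Assumes the node name is the second-to-last whitespace-separated field.
count_steps_per_node() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ { n[$(NF-1)]++ }
         END { for (h in n) print h, n[h] }' | sort
}

# Example with a few lines copied from the output above.
count_steps_per_node <<'EOF'
505.batch batch 12 RUNNING node1 0:0
505.0 Testp-3 1 RUNNING node1 0:0
505.1 Testp-6 1 RUNNING node2 0:0
505.2 Testp-2 1 RUNNING node1 0:0
EOF
```

With real output this could be used as `sacct -j 505 | count_steps_per_node`, provided the same column layout is requested.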
I have tried a number of other options, usually ending up running the same script multiple times,
and that is not what I want.
I feel a bit stuck and cannot get my head around this.
I would really appreciate some help!!
Many thanks in advance!!
Best Regards
Marcus