[slurm-users] Parallel sbatch

Fri Nov 5 09:42:32 UTC 2021

Hi all,
I have set up a basic Slurm system and have been testing out
a number of things.
The latest thing I started to test is running scripts in parallel.
What I have is about 70 independent scripts that would be
ideal to run in parallel.
For testing purposes I have created 20 dummy scripts
that each print the script name and hostname, sleep for one minute,
and then print the number of minutes.
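
For reference, each dummy script could look roughly like the sketch below. This is only an illustration of the behaviour described above, not the exact scripts: the function name report_and_sleep and its second argument (seconds per minute, only there to shorten dry runs) are made up for this example, and uname -n is used to print the hostname.

```shell
#!/bin/bash
# Hypothetical stand-in for the body of one test_progN.sh dummy script.
# Usage: report_and_sleep MINUTES [SECONDS_PER_MINUTE]
report_and_sleep() {
    local minutes="${1:-1}"
    local seconds_per_minute="${2:-60}"   # lower this only to shorten dry runs

    echo "script: $(basename "$0")"       # name of the script being run
    echo "host: $(uname -n)"              # hostname of the node it runs on
    sleep "$(( minutes * seconds_per_minute ))"
    echo "minutes: ${minutes}"            # number of minutes slept
}
```

Each test_progN.sh would then end with a call such as report_and_sleep 1.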

The way I want to run this is to allocate 2 nodes
and run all of the 20 scripts in parallel, each one of them
in one process.
My idea is that the first node will be filled up with 12 processes,
each process running one script and the second node will run
the rest of the processes/scripts (8 scripts on 8 processes).
I have read up on a couple of tutorials and looked at the documentation
for different parts of slurm.
But whatever flags I use for both sbatch and srun, I do not seem to
be able to accomplish what I want.
All nodes have 6 cores with 2 threads each, i.e. 12 CPUs per node.
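
To spell out the arithmetic behind the 12 + 8 split, here is a small sketch. It assumes Slurm counts every hardware thread as one CPU on these nodes, which depends on the cluster configuration; the variable names are just for illustration.

```shell
#!/bin/bash
# Assumption: each hardware thread counts as one schedulable CPU.
cores_per_node=6
threads_per_core=2
total_scripts=20

# CPUs available on one node: 6 cores * 2 threads
cpus_per_node=$(( cores_per_node * threads_per_core ))

# Fill the first node completely, put the remainder on the second node.
on_first=$(( total_scripts < cpus_per_node ? total_scripts : cpus_per_node ))
on_second=$(( total_scripts - on_first ))

echo "CPUs per node: ${cpus_per_node}"
echo "node1: ${on_first} scripts, node2: ${on_second} scripts"
```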

The closest I have come is with this small sbatch:

#! /bin/bash
#SBATCH --job-name=TestParallel
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=12
#SBATCH --nodelist=node1,node2
#SBATCH --output="%x-%4j-%N.out"
#SBATCH --mail-type=ALL

date +%Y-%m-%d"     "%H-%M-%S

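# Start each script as its own job step in the background
for i in {1..20}
do
      srun --nodes=1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=1 --exclusive --job-name=Testp-$i --output=/path/to/test_prog$i.log /path/to/test_prog$i.sh &
done

# Wait for all background job steps before the batch script exits
wait

date +%Y-%m-%d"     "%H-%M-%S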


sacct gives the following output:
505          TestParal+        all    marcus         24    RUNNING     node[1-2]        0:0
505.batch         batch                              12    RUNNING     node1            0:0
505.0           Testp-3                               1    RUNNING     node1            0:0
505.1           Testp-6                               1    RUNNING     node2            0:0
505.2           Testp-2                               1    RUNNING     node1            0:0
505.3          Testp-13                               1    RUNNING     node1            0:0
505.4           Testp-9                               1    RUNNING     node1            0:0
505.5          Testp-11                               1    RUNNING     node1            0:0
505.6          Testp-16                               1    RUNNING     node1            0:0
505.7          Testp-12                               1    RUNNING     node1            0:0
505.8          Testp-20                               1    RUNNING     node1            0:0
505.9           Testp-4                               1    RUNNING     node1            0:0
505.10         Testp-19                               1    RUNNING     node1            0:0
505.11         Testp-10                               1    RUNNING     node1            0:0
505.12          Testp-5                               1    RUNNING     node1            0:0

Slurm only uses one process on node2, and of course I want all of the last 8 processes to run on node2.
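
To make the imbalance easier to see, the job steps in the sacct listing can be counted per node. The helper below is only a sketch: the name steps_per_node is made up, and it assumes the column layout shown above, where the node name is the second-to-last field on each line.

```shell
#!/bin/bash
# Hypothetical helper: count numbered job steps (e.g. 505.0) per node.
# Reads sacct-style lines on stdin; the job line and the .batch line
# are skipped because their first field does not end in ".<number>".
steps_per_node() {
    awk '$1 ~ /\.[0-9]+$/ { count[$(NF-1)]++ }
         END { for (node in count) print node, count[node] }' | sort
}
```

Fed with the listing above, this prints "node1 12" and "node2 1".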

I have tried a number of other options, usually ending up running the same script multiple times,
and that is not what I want.

I feel a bit stuck and cannot get my head around this.

I would really appreciate some help!!

Many thanks in advance!!

Best Regards

