[slurm-users] Parallel sbatch

Sean McGrath smcgrat at tchpc.tcd.ie
Fri Nov 5 10:11:06 UTC 2021


Hi Marcus,

Would something like staskfarm (https://github.com/paddydoyle/staskfarm,
https://www.tchpc.tcd.ie/node/1127) be of any use for your needs? Sorry if not.
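
As a rough, untested sketch of how that could look: this assumes staskfarm
reads a plain text file with one command per line, as its README describes.
The file name tasks.txt is made up for illustration, and the script paths
are the placeholders from your own sbatch script.

```shell
#!/bin/bash
# Build a task list with one command per line
# (assumed input format for staskfarm; check its README).
tasks_file="tasks.txt"   # hypothetical file name

# Start from an empty file so reruns do not append duplicates.
: > "$tasks_file"

# One line per dummy script; /path/to/ is the placeholder from the original post.
for i in $(seq 1 20); do
    echo "/path/to/test_prog$i.sh" >> "$tasks_file"
done

# Show how many tasks were written.
wc -l < "$tasks_file"

# Inside an sbatch script that requests enough tasks
# (for example "#SBATCH --ntasks=20"), the list would then be
# handed to the tool, roughly like this (assumption, not verified here):
#   staskfarm tasks.txt
```

The point of the task list is that each of your scripts appears exactly once,
so nothing gets run multiple times; how the tasks are spread over the nodes is
then left to the tool and the allocation.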

Regards

Sean


On Fri, Nov 05, 2021 at 10:42:32AM +0100, Marcus Pedersén wrote:

> Hi all,
> I have set up a basic Slurm system and have been testing out
> a number of things.
> The latest thing I started to test is the parallel parts.
> What I have is about 70 independent scripts that would be
> ideal to run in parallel.
> For testing purposes I have created 20 dummy scripts
> that print the script name and hostname, sleep for one minute
> and print the number of minutes.
> 
> The way I want to run this is to allocate 2 nodes
> and run all of the 20 scripts in parallel, each one of them
> in one process.
> My idea is that the first node will be filled up with 12 processes,
> each process running one script and the second node will run
> the rest of the processes/scripts (8 scripts on 8 processes).
> I have read up on a couple of tutorials and looked at the documentation
> for different parts of Slurm.
> But whatever flags I use for sbatch and srun, I do not seem to
> be able to accomplish what I want.
> Each node has 6 cores with 2 threads per core (12 logical CPUs).
> 
> The closest I have come is with this small sbatch:
> 
> #! /bin/bash
> #SBATCH --job-name=TestParallel
> #SBATCH --nodes=2
> #SBATCH --ntasks-per-node=1
> #SBATCH --ntasks=2
> #SBATCH --cpus-per-task=12
> #SBATCH --nodelist=node1,node2
> #SBATCH --output="%x-%4j-%N.out"
> #SBATCH --mail-user=my at mail
> #SBATCH --mail-type=ALL
> 
> echo
> date +%Y-%m-%d"     "%H-%M-%S
> 
> for i in {1..20}
>   do
>       srun --nodes=1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=1 --exclusive --job-name=Testp-$i --output=/path/to/test_prog$i.log /path/to/test_prog$i.sh &
> done
> 
> date +%Y-%m-%d"     "%H-%M-%S
> 
> wait
> 
> 
> sacct gives the following output:
> 505          TestParal+        all    marcus         24    RUNNING     node[1-2]        0:0
> 505.batch         batch                              12    RUNNING     node1            0:0
> 505.0           Testp-3                               1    RUNNING     node1            0:0
> 505.1           Testp-6                               1    RUNNING     node2            0:0
> 505.2           Testp-2                               1    RUNNING     node1            0:0
> 505.3          Testp-13                               1    RUNNING     node1            0:0
> 505.4           Testp-9                               1    RUNNING     node1            0:0
> 505.5          Testp-11                               1    RUNNING     node1            0:0
> 505.6          Testp-16                               1    RUNNING     node1            0:0
> 505.7          Testp-12                               1    RUNNING     node1            0:0
> 505.8          Testp-20                               1    RUNNING     node1            0:0
> 505.9           Testp-4                               1    RUNNING     node1            0:0
> 505.10         Testp-19                               1    RUNNING     node1            0:0
> 505.11         Testp-10                               1    RUNNING     node1            0:0
> 505.12          Testp-5                               1    RUNNING     node1            0:0
> 
> 
> Slurm only uses one process on node2, and of course I want all of the last 8 processes to run on node2.
> 
> I have tried a number of other options, usually ending up running the same script multiple times,
> and that is not what I want.
> 
> I feel a bit stuck and cannot get my head around this.
> 
> I would really appreciate some help!!
> 
> Many thanks in advance!!
> 
> Best Regards
> Marcus
> 

-- 
Sean McGrath M.Sc

Systems Administrator
Trinity Centre for High Performance and Research Computing
Trinity College Dublin

sean.mcgrath at tchpc.tcd.ie

https://www.tcd.ie/
https://www.tchpc.tcd.ie/

+353 (0) 1 896 3725



