[slurm-users] Parallel sbatch
Sean McGrath
smcgrat at tchpc.tcd.ie
Fri Nov 5 10:11:06 UTC 2021
Hi Marcus,
Would something like staskfarm be of any use for your needs?
https://github.com/paddydoyle/staskfarm
https://www.tchpc.tcd.ie/node/1127
Sorry if not.
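For what it's worth, task-farm tools of this kind generally read a plain text file with one command per line. Below is a minimal sketch of building such a file for your 20 test scripts. The file name tasks.txt is made up for illustration, /path/to is the placeholder from your mail, and the staskfarm invocation mentioned in the last comment is an assumption on my part, so please check its README for the real usage.

```shell
#!/bin/bash
# Build a task list with one command per line.
# tasks.txt is a hypothetical file name chosen for this sketch.
taskfile=tasks.txt
: > "$taskfile"                      # start from an empty file
for i in $(seq 1 20); do
    # /path/to is a placeholder, as in the original batch script
    echo "/path/to/test_prog$i.sh > /path/to/test_prog$i.log" >> "$taskfile"
done
echo "wrote $(wc -l < "$taskfile") tasks to $taskfile"
# Assumption: inside a batch job the list would then be handed to the tool,
# e.g. "staskfarm tasks.txt"; check the README before relying on this.
```

The point of the task file is that the tool, rather than a hand-written loop of srun calls, decides which command goes to which allocated core.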
Regards
Sean
On Fri, Nov 05, 2021 at 10:42:32AM +0100, Marcus Peders?n wrote:
> Hi all,
> I have set up a basic Slurm system and have been testing out
> a number of things.
> The latest thing I have started to test is parallel execution.
> What I have is about 70 independent scripts that would be
> ideal to run in parallel.
> For testing purposes I have created 20 dummy scripts
> that print the script name and hostname, sleep for one minute
> and then print the number of minutes.
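For anyone wanting to reproduce the test, dummy scripts like these could be generated roughly as follows. This is only a sketch of what the description suggests, not the actual scripts: the directory name dummy_scripts and the MINUTES override are invented here so the scripts can be tried quickly without waiting a full minute.

```shell
#!/bin/bash
# Generate 20 dummy scripts in ./dummy_scripts (hypothetical directory name).
outdir=dummy_scripts
mkdir -p "$outdir"
for i in $(seq 1 20); do
    script="$outdir/test_prog$i.sh"
    # Quoted 'EOF' keeps $-expressions literal so they run in the generated script.
    cat > "$script" <<'EOF'
#!/bin/bash
# Print script name and host, sleep, then report the minutes slept.
# MINUTES is a hypothetical override for quick tests; the default is 1.
minutes=${MINUTES:-1}
echo "script: $(basename "$0")"
echo "host:   $(hostname)"
sleep $((minutes * 60))
echo "slept for $minutes minute(s)"
EOF
    chmod +x "$script"
done
echo "created $(ls "$outdir" | wc -l) scripts in $outdir"
```

Running one by hand with MINUTES=0 is a quick way to confirm the output format before submitting anything to Slurm.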
>
> The way I want to run this is to allocate 2 nodes
> and run all of the 20 scripts in parallel, each one of them
> in one process.
> My idea is that the first node will be filled up with 12 processes,
> each process running one script and the second node will run
> the rest of the processes/scripts (8 scripts on 8 processes).
> I have read up on a couple of tutorials and looked at the documentation
> for different parts of slurm.
> But whatever flags I use for both sbatch and srun, I do not seem to
> be able to accomplish what I want.
> All nodes have 6 cores with 2 threads.
>
> The closest I have come is with this small sbatch:
>
> #! /bin/bash
> #SBATCH --job-name=TestParallel
> #SBATCH --nodes=2
> #SBATCH --ntasks-per-node=1
> #SBATCH --ntasks=2
> #SBATCH --cpus-per-task=12
> #SBATCH --nodelist=node1,node2
> #SBATCH --output="%x-%4j-%N.out"
> #SBATCH --mail-user=my at mail
> #SBATCH --mail-type=ALL
>
> echo
> date +%Y-%m-%d" "%H-%M-%S
>
> for i in {1..20}
> do
> srun --nodes=1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=1 --exclusive --job-name=Testp-$i --output=/path/to/test_prog$i.log /path/to/test_prog$i.sh &
> done
>
> date +%Y-%m-%d" "%H-%M-%S
>
> wait
>
>
> sacct gives the following output:
> 505        TestParal+  all  marcus  24  RUNNING  node[1-2]  0:0
> 505.batch  batch                    12  RUNNING  node1      0:0
> 505.0      Testp-3                   1  RUNNING  node1      0:0
> 505.1      Testp-6                   1  RUNNING  node2      0:0
> 505.2      Testp-2                   1  RUNNING  node1      0:0
> 505.3      Testp-13                  1  RUNNING  node1      0:0
> 505.4      Testp-9                   1  RUNNING  node1      0:0
> 505.5      Testp-11                  1  RUNNING  node1      0:0
> 505.6      Testp-16                  1  RUNNING  node1      0:0
> 505.7      Testp-12                  1  RUNNING  node1      0:0
> 505.8      Testp-20                  1  RUNNING  node1      0:0
> 505.9      Testp-4                   1  RUNNING  node1      0:0
> 505.10     Testp-19                  1  RUNNING  node1      0:0
> 505.11     Testp-10                  1  RUNNING  node1      0:0
> 505.12     Testp-5                   1  RUNNING  node1      0:0
>
>
> Slurm only uses one process on node2, and of course I want the last 8 processes to run on node2.
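One thing that might be worth trying: the batch header above asks for only 2 tasks (--ntasks=2, --ntasks-per-node=1) with 12 CPUs each, while the loop starts 20 single-CPU steps. A rough sketch that requests 20 single-CPU tasks instead is below. Treat it as an untested suggestion: how the tasks are split across the two nodes depends on the cluster configuration, and on newer Slurm releases --exact may be the more appropriate step option than --exclusive (see man srun for your version). The launch helper and its dry-run fallback are made up for this sketch so the loop can be checked on a machine without Slurm. The --nodelist, --output and mail options are left out for brevity.

```shell
#!/bin/bash
#SBATCH --job-name=TestParallel
#SBATCH --nodes=2
# Assumption: one task per script, instead of 2 tasks with 12 CPUs each.
#SBATCH --ntasks=20
#SBATCH --cpus-per-task=1

# Hypothetical helper: start one single-task job step.
# Without srun on PATH (e.g. on a workstation) it only prints the command.
launch() {
    if command -v srun >/dev/null 2>&1; then
        srun --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive "$@"
    else
        echo "[dry run] srun --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive $*"
    fi
}

launched=0
for i in $(seq 1 20); do
    # /path/to is a placeholder, as in the original script
    launch --job-name="Testp-$i" --output="/path/to/test_prog$i.log" \
        "/path/to/test_prog$i.sh" &
    launched=$((launched + 1))
done
wait    # block until every background step has finished
echo "launched $launched steps"
```

If steps still queue up behind each other, it may be worth checking whether each step is claiming all of the job's memory on a node, since that can also serialise steps.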
>
> I have tried a number of other options, usually ending up running the same script multiple times,
> and that is not what I want.
>
> I feel a bit stuck and can not get my head around this.
>
> I would really appreciate some help!!
>
> Many thanks in advance!!
>
> Best Regards
> Marcus
>
> ---
> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
>
--
Sean McGrath M.Sc
Systems Administrator
Trinity Centre for High Performance and Research Computing
Trinity College Dublin
sean.mcgrath at tchpc.tcd.ie
https://www.tcd.ie/
https://www.tchpc.tcd.ie/
+353 (0) 1 896 3725