[slurm-users] Parallel sbatch
Loris Bennett
loris.bennett at fu-berlin.de
Fri Nov 5 14:19:12 UTC 2021
Hi Marcus,
I would advise against putting a loop in your job script. The job array
mechanism is designed exactly for this purpose:
https://slurm.schedmd.com/job_array.html
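Here is a minimal sketch of the job-array approach for the 20 test scripts. It assumes the scripts are `/path/to/test_prog1.sh` … `/path/to/test_prog20.sh`, as in the script quoted below. Each array element is scheduled as its own one-CPU job, and `SLURM_ARRAY_TASK_ID` selects which script that element runs. The helper name `script_for_task` is hypothetical. The launch is guarded so that nothing runs outside a job array.

```shell
#!/bin/bash
#SBATCH --job-name=TestParallel
#SBATCH --array=1-20
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=/path/to/test_prog%a.log

# Map an array index to the matching test script (hypothetical helper).
script_for_task() {
    printf '/path/to/test_prog%d.sh\n' "$1"
}

# Only launch when running as an element of a Slurm job array.
if [[ -n "${SLURM_ARRAY_TASK_ID:-}" ]]; then
    exec "$(script_for_task "$SLURM_ARRAY_TASK_ID")"
fi
```

Submit it once with sbatch. If the array should not occupy too many CPUs at once, a throttle such as `--array=1-20%10` limits the number of elements running simultaneously.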
If you have very small jobs, it is usually better to run them separately
so that the individual jobs can fill gaps in the schedule (assuming
backfill is being used). The exception to this is when the run-time of
the jobs is short relative to the length of time the jobs may have to
wait. In this case you might want to bundle a number of them together
and run them sequentially within a single job (here a loop is appropriate).
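One way to bundle is to combine a job array with a sequential loop. The sketch below assumes the 20 scripts are split into chunks of 5, which is a hypothetical choice, so `--array=0-3` covers them all. Each array element then runs its own chunk one script after another. `chunk_bounds`, `NUM_SCRIPTS` and `CHUNK_SIZE` are illustrative names.

```shell
#!/bin/bash
#SBATCH --job-name=TestBundled
#SBATCH --array=0-3
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=TestBundled-%A_%a.out

NUM_SCRIPTS=20   # total number of scripts (test_prog1.sh .. test_prog20.sh)
CHUNK_SIZE=5     # scripts per array element; --array=0-3 gives 4 x 5 = 20

# Print the first and last script number handled by array element $1,
# clamping the last chunk so it never goes past NUM_SCRIPTS.
chunk_bounds() {
    local task=$1
    local first=$(( task * CHUNK_SIZE + 1 ))
    local last=$(( first + CHUNK_SIZE - 1 ))
    if (( last > NUM_SCRIPTS )); then
        last=$NUM_SCRIPTS
    fi
    echo "$first $last"
}

# Run this element's scripts one after another (only under a job array).
if [[ -n "${SLURM_ARRAY_TASK_ID:-}" ]]; then
    read -r first last <<< "$(chunk_bounds "$SLURM_ARRAY_TASK_ID")"
    for (( i = first; i <= last; i++ )); do
        "/path/to/test_prog$i.sh" > "/path/to/test_prog$i.log" 2>&1
    done
fi
```

The `--array` range has to match `NUM_SCRIPTS` and `CHUNK_SIZE`. For example, 70 scripts in chunks of 10 would need `--array=0-6`.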
Cheers,
Loris
Marcus Pedersén <marcus.pedersen at slu.se> writes:
> Hi all,
> I have set up a basic Slurm system and have been testing out
> a number of things.
> The latest thing I have started to test is running jobs in parallel.
> What I have is about 70 independent scripts that would be
> ideal to run in parallel.
> For testing purposes I have created 20 dummy scripts
> that print the script name and hostname, sleep for one minute
> and then print the number of minutes.
>
> The way I want to run this is to allocate 2 nodes
> and run all 20 scripts in parallel, each one of them
> in its own process.
> My idea is that the first node will be filled up with 12 processes,
> each process running one script, and the second node will run
> the rest of the processes/scripts (8 scripts on 8 processes).
> I have read up on a couple of tutorials and looked at the documentation
> for different parts of slurm.
> But whatever flags I use for both sbatch and srun, I do not seem to
> be able to accomplish what I want.
> All nodes have 6 cores with 2 threads.
>
> The closest I have come is with this small sbatch:
>
> #! /bin/bash
> #SBATCH --job-name=TestParallel
> #SBATCH --nodes=2
> #SBATCH --ntasks-per-node=1
> #SBATCH --ntasks=2
> #SBATCH --cpus-per-task=12
> #SBATCH --nodelist=node1,node2
> #SBATCH --output="%x-%4j-%N.out"
> #SBATCH --mail-user=my at mail
> #SBATCH --mail-type=ALL
>
> echo
> date +%Y-%m-%d" "%H-%M-%S
>
> for i in {1..20}
> do
> srun --nodes=1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=1 --exclusive --job-name=Testp-$i --output=/path/to/test_prog$i.log /path/to/test_prog$i.sh &
> done
>
> date +%Y-%m-%d" "%H-%M-%S
>
> wait
>
>
> sacct gives the following output:
> 505        TestParal+  all  marcus  24  RUNNING node[1-2]  0:0
> 505.batch  batch                    12  RUNNING node1      0:0
> 505.0      Testp-3                  1   RUNNING node1      0:0
> 505.1      Testp-6                  1   RUNNING node2      0:0
> 505.2      Testp-2                  1   RUNNING node1      0:0
> 505.3      Testp-13                 1   RUNNING node1      0:0
> 505.4      Testp-9                  1   RUNNING node1      0:0
> 505.5      Testp-11                 1   RUNNING node1      0:0
> 505.6      Testp-16                 1   RUNNING node1      0:0
> 505.7      Testp-12                 1   RUNNING node1      0:0
> 505.8      Testp-20                 1   RUNNING node1      0:0
> 505.9      Testp-4                  1   RUNNING node1      0:0
> 505.10     Testp-19                 1   RUNNING node1      0:0
> 505.11     Testp-10                 1   RUNNING node1      0:0
> 505.12     Testp-5                  1   RUNNING node1      0:0
>
>
> Slurm only uses one process on node2, and of course I want all of the remaining 8 processes to run on node2.
>
> I have tried a number of other options, usually ending up with the same script running multiple times,
> and that is not what I want.
>
> I feel a bit stuck and cannot get my head around this.
>
> I would really appreciate some help!!
>
> Many thanks in advance!!
>
> Best Regards
> Marcus
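If the scripts really do have to run inside one two-node allocation, as described above, here is a sketch of how that might look. It assumes Slurm 20.11 or later, where `srun --exact` exists, and that each node is configured as 12 CPUs (6 cores x 2 threads). The main change from the quoted script is the allocation request: 20 one-CPU tasks instead of two 12-CPU tasks. In addition, each step uses `--exact` so that it takes only the CPU it asks for. `--mem-per-cpu=1G` is a hypothetical value. It is included on the assumption that memory is tracked as a resource, in which case a step without an explicit memory request might claim all of the job's memory on a node. `launch_all` is a hypothetical helper, and the mail and nodelist options are omitted for brevity.

```shell
#!/bin/bash
#SBATCH --job-name=TestParallel
#SBATCH --nodes=2
#SBATCH --ntasks=20
#SBATCH --ntasks-per-node=12   # maximum per node when combined with --ntasks
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G       # hypothetical value; adjust to the real need
#SBATCH --output="%x-%4j-%N.out"

# Start one single-CPU step per script in the background, then wait for all.
launch_all() {
    local i
    for i in {1..20}; do
        srun --exact --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=1G \
             --job-name="Testp-$i" --output="/path/to/test_prog$i.log" \
             "/path/to/test_prog$i.sh" &
    done
    wait
}

# Only launch inside a Slurm allocation.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    launch_all
fi
```

With the default block distribution, the first node is typically filled before the second one is used. The exact 12/8 split is still up to Slurm's task placement.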
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.bennett at fu-berlin.de