[slurm-users] Parallel sbatch

Richard Lefebvre richard.lefebvre at calculquebec.ca
Fri Nov 5 12:55:17 UTC 2021


I would suggest using GNU Parallel (https://www.gnu.org/software/parallel/).
Also, if you run that many "srun"s in a row on a very large cluster where
the slurmctld is heavily loaded, some of the sruns might time out and never
run.
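
Something like this inside the batch script might be a starting point (a
minimal, untested sketch; the paths, script names and the --exclusive flag
are copied from your example below, and -j simply caps how many job steps
parallel keeps running at once):

#! /bin/bash
#SBATCH --job-name=TestParallel
#SBATCH --nodes=2
#SBATCH --ntasks=20
#SBATCH --cpus-per-task=1

# Let GNU Parallel launch one single-task job step per script;
# {} is replaced by each value from the ::: list (1..20).
parallel -j "$SLURM_NTASKS" \
    srun --nodes=1 --ntasks=1 --exclusive \
         --job-name=Testp-{} --output=/path/to/test_prog{}.log \
         /path/to/test_prog{}.sh ::: {1..20}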

Richard

On Fri, Nov 5, 2021 at 05:45, Marcus Pedersén <marcus.pedersen at slu.se>
wrote:

> Hi all,
> I have set up a basic Slurm system and have been testing out
> a number of things.
> The latest thing I started to test is the parallel parts.
> What I have is about 70 independent scripts that would be
> ideal to run in parallel.
> For testing purposes I have created 20 dummy scripts
> that print the script name and hostname, sleep for one minute,
> and then print the number of minutes.
>
> The way I want to run this is to allocate 2 nodes
> and run all 20 scripts in parallel, each one of them
> in its own process.
> My idea is that the first node will be filled up with 12 processes,
> each process running one script, and the second node will run
> the rest of the processes/scripts (8 scripts on 8 processes).
> I have read a couple of tutorials and looked at the documentation
> for the different parts of Slurm,
> but whatever flags I use for both sbatch and srun I do not seem to
> be able to accomplish what I want.
> All nodes have 6 cores with 2 threads each (12 logical CPUs per node).
>
> The closest I have come is with this small sbatch script:
>
> #! /bin/bash
> #SBATCH --job-name=TestParallel
> #SBATCH --nodes=2
> #SBATCH --ntasks-per-node=1
> #SBATCH --ntasks=2
> #SBATCH --cpus-per-task=12
> #SBATCH --nodelist=node1,node2
> #SBATCH --output="%x-%4j-%N.out"
> #SBATCH --mail-user=my at mail
> #SBATCH --mail-type=ALL
>
> echo
> date +%Y-%m-%d"     "%H-%M-%S
>
> for i in {1..20}
>   do
>       srun --nodes=1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=1 \
>            --exclusive --job-name=Testp-$i --output=/path/to/test_prog$i.log \
>            /path/to/test_prog$i.sh &
> done
>
> date +%Y-%m-%d"     "%H-%M-%S
>
> wait
>
>
> sacct gives the following output:
> 505          TestParal+     all  marcus   24  RUNNING  node[1-2]  0:0
> 505.batch         batch                   12  RUNNING  node1      0:0
> 505.0           Testp-3                    1  RUNNING  node1      0:0
> 505.1           Testp-6                    1  RUNNING  node2      0:0
> 505.2           Testp-2                    1  RUNNING  node1      0:0
> 505.3          Testp-13                    1  RUNNING  node1      0:0
> 505.4           Testp-9                    1  RUNNING  node1      0:0
> 505.5          Testp-11                    1  RUNNING  node1      0:0
> 505.6          Testp-16                    1  RUNNING  node1      0:0
> 505.7          Testp-12                    1  RUNNING  node1      0:0
> 505.8          Testp-20                    1  RUNNING  node1      0:0
> 505.9           Testp-4                    1  RUNNING  node1      0:0
> 505.10         Testp-19                    1  RUNNING  node1      0:0
> 505.11         Testp-10                    1  RUNNING  node1      0:0
> 505.12          Testp-5                    1  RUNNING  node1      0:0
>
>
> Slurm only runs one process on node2 and of course I want the last 8
> processes to run on node2.
>
> I have tried a number of other options, usually ending up with the same
> script running multiple times, and that is not what I want.
>
> I feel a bit stuck and cannot get my head around this.
>
> I would really appreciate some help!!
>
> Many thanks in advance!!
>
> Best Regards
> Marcus
>