[slurm-users] RES: multiple srun commands in the same SLURM script

Bjørn-Helge Mevik b.h.mevik at usit.uio.no
Wed Nov 1 07:55:07 UTC 2023


Paulo Jose Braga Estrela <paulo.estrela at petrobras.com.br> writes:

> Hi,
>
> I think that you have a syntax error in your bash script. The "&"
> means that you want to send a process to the background, not that you
> want to run many commands in parallel. To run commands serially you
> should use cmd1 && cmd2; cmd2 will then be executed only if cmd1
> returns 0 as its exit code.
>
> To run commands in parallel with srun you should set the number of
> tasks to 4, so srun will spawn 4 tasks of the same command. Take a
> look at the examples section in the srun docs
> (https://slurm.schedmd.com/srun.html).

Well, if you look at Example 7 in that section:

Example 7:
    This example shows a script in which Slurm is used to provide
    resource management for a job by executing the various job steps
    as processors become available for their dedicated use.

    $ cat my.script
    #!/bin/bash
    srun -n4 prog1 &
    srun -n3 prog2 &
    srun -n1 prog3 &
    srun -n1 prog4 &
    wait

which is what the OP is trying to do.  This pattern (several sruns put
in the background, followed by a wait) is mainly for running *different*
programs in parallel inside a job.  If one wants to run *the same*
program in parallel, then a single srun is indeed the recommended way.
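
As a rough illustration of the difference between "&" followed by wait
and chaining with "&&", here is a small sketch in plain shell, without
Slurm.  The step function is a hypothetical stand-in for a real job step
such as "srun -n1 prog1"; it only sleeps for one second.

```shell
#!/bin/sh
# step is a hypothetical stand-in for a real job step such as "srun -n1 prog1".
step() { sleep 1; echo "$1 done"; }

# Put each step in the background with "&", then "wait" for all of them.
# The three steps overlap in time.
start=$(date +%s)
step a &
step b &
step c &
wait
parallel_elapsed=$(( $(date +%s) - start ))

# Chain with "&&": each step starts only after the previous one exits with 0.
start=$(date +%s)
step a && step b && step c
serial_elapsed=$(( $(date +%s) - start ))

echo "background + wait: ${parallel_elapsed}s, chained with &&: ${serial_elapsed}s"
```

The backgrounded variant should take roughly one second and the chained
variant about three.  Inside a job, the sruns only overlap like this if
the job has enough CPUs allocated for all the steps at once.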

I think the main problem is that the original job script only asks for a
single CPU, so the sruns will only run one at a time instead of
concurrently.  Try adding --ntasks-per-node=4 or similar to the job
script.

Note that exactly how to run different programs in parallel with srun
has changed quite a bit in recent versions, and the example above is
for the latest version, so check the srun man page for your version.
(And unfortunately, the documentation in the srun man page has not
always been correct, so you might need to experiment.  For instance, I
believe Example 7 above is missing `--exact` or `SLURM_EXACT`. :) )
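
Putting these suggestions together, a job script might look roughly like
the sketch below.  It is untested against any particular Slurm version.
The file name steps.sbatch, the program names ./prog1 to ./prog4 and the
choice of one task per step are assumptions for illustration, and
whether `--exact` is needed (or available) depends on your version.  The
snippet only writes the job script to a file and syntax-checks it
locally, so it can be tried on a machine without Slurm.

```shell
#!/bin/sh
# Write a hypothetical job script (steps.sbatch) and syntax-check it locally.
cat > steps.sbatch <<'EOF'
#!/bin/bash
#SBATCH --ntasks=4          # assumption: one task (one CPU) per step below
#SBATCH --cpus-per-task=1

# Each step asks for a single task.  With --exact a step only gets access
# to the resources it requests, so the other steps can run next to it.
srun --exact -n1 ./prog1 &
srun --exact -n1 ./prog2 &
srun --exact -n1 ./prog3 &
srun --exact -n1 ./prog4 &
wait    # keep the job alive until every background step has finished
EOF

# Parse the script without executing it, then print how to submit it.
sh -n steps.sbatch && echo "steps.sbatch parses; submit with: sbatch steps.sbatch"
```

If the steps still run one after another, compare the options against
the srun man page for the installed version, since the option names and
defaults have changed between releases.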

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
