[slurm-users] MPI jobs via mpirun vs. srun through PMIx.
Juergen Salk
juergen.salk at uni-ulm.de
Tue Sep 17 20:21:56 UTC 2019
* Philip Kovacs <pkdevel at yahoo.com> [190917 07:43]:
> >> I suspect the question, which I also have, is more like:
> >>
> >> "What difference does it make whether I use 'srun' or 'mpirun' within
> >> a batch file started with 'sbatch'."
>
> One big thing would be that using srun gives you resource tracking
> and accounting of the individual job or step that you would not
> otherwise get with mpirun.
Hello,
thank you very much for all the feedback. From your responses I
gathered that there is a broad consensus to use srun because of its
tighter integration with Slurm.
I have now also realized that srun and mpirun do indeed make a big
difference in terms of accounting.
For testing I've run two identical jobs today, both of them
with
#SBATCH --nodes=4
#SBATCH --tasks-per-node=16
#SBATCH --mem=8gb
but the first batch script launched the application with srun, i.e.
module load mpi/impi
export I_MPI_PMI_LIBRARY=/opt/pmix/2.2.3/lib/libpmi.so
srun ./stress
whereas the second job script used mpirun to spawn the processes, i.e.
module load mpi/impi
mpirun ./stress
Every process (rank) ran for 10 minutes, resulting in an overall
CPU time of 10h and 40 minutes (= 4 nodes * 16 cores * 10 minutes).
I then compared all the accounting records of both jobs (as
reported by the 'sacct' command) and there are some noticeable
differences in various fields for job step id 0.
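A comparison like this could be scripted roughly as follows. This is only a sketch: the function name compare_steps and the job IDs in the usage comment are made up for illustration, and the field list is simply the set of sacct fields discussed in this message plus Elapsed.

```shell
# Fields to compare (all are standard sacct format fields).
FIELDS=JobID,AllocCPUS,NCPUS,NTasks,Elapsed,CPUTime,TotalCPU,UserCPU

# Hypothetical helper: print the selected accounting fields for each
# job ID given on the command line, one sacct call per job.
compare_steps() {
  for jobid in "$@"; do
    sacct -j "$jobid" --format="$FIELDS"
  done
}

# Usage (placeholder job IDs, not the ones from my tests):
#   compare_steps 1001 1002
```

The rows for step 0 (shown as <jobid>.0) are the ones that differ between the two launch methods.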
For the mpirun test case some of the fields look odd,
e.g. AllocCPUS=4, NCPUS=4 and NTasks=4. I would have expected
AllocCPUS=64, NCPUS=64 and NTasks=64, and that is exactly what 'sacct'
reports for the srun test case. CPUTime=00:40:04 (reported for the
mpirun case) is also odd compared to CPUTime=10:41:04 (reported for the
srun case). Interestingly, the TotalCPU and UserCPU fields do match for
both test cases.
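As far as I understand the sacct documentation, CPUTime is the elapsed time multiplied by the CPU count recorded for the step. Assuming that is right, the two CPUTime values can be divided by the respective AllocCPUS to see whether they describe the same wall-clock run time. A small sketch (to_seconds is just a helper written for this check, and it assumes plain HH:MM:SS values without a days prefix):

```shell
# Convert HH:MM:SS to seconds. Runs in a subshell so that changing IFS
# does not affect the caller; ${n#0} strips one leading zero so that
# values like "08" are not misread as octal.
to_seconds() {
  (
    IFS=:
    set -- $1
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
  )
}

mpirun_cputime=$(to_seconds 00:40:04)   # step 0 with AllocCPUS=4
srun_cputime=$(to_seconds 10:41:04)     # step 0 with AllocCPUS=64

echo "mpirun: $(( mpirun_cputime / 4 )) s per allocated CPU"
echo "srun:   $(( srun_cputime / 64 )) s per allocated CPU"
```

Both lines print 601 s, i.e. about 10 minutes per CPU. So the two CPUTime values are consistent with the same elapsed time, and the difference seems to come only from the CPU count recorded for the step (4 vs. 64).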
There are quite a few more differences that I still need to
understand. But in general, the accounting records for the srun
test case seem to be more in line with my expectations.
Thanks again.
Best regards
Jürgen
--
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471