[slurm-users] MPI jobs via mpirun vs. srun through PMIx.

Juergen Salk juergen.salk at uni-ulm.de
Tue Sep 17 20:21:56 UTC 2019

* Philip Kovacs <pkdevel at yahoo.com> [190917 07:43]:

> >> I suspect the question, which I also have, is more like:
> >> 
> >>  "What difference does it make whether I use 'srun' or 'mpirun' within
> >>    a batch file started with 'sbatch'."
> One big thing would be that using srun gives you resource tracking
> and accounting of the individual job or step that you would not
> otherwise get with mpirun.


Thank you very much for all the feedback. From your responses I
gathered that there is a broad consensus to use srun because of its
tighter integration with Slurm.

I have now also realized that srun and mpirun do indeed make a big
difference in terms of accounting. 

For testing, I ran two otherwise identical jobs today, both with

#SBATCH --nodes=4
#SBATCH --tasks-per-node=16
#SBATCH --mem=8gb

but the first batch script launched the application with srun, i.e.

module load mpi/impi
export I_MPI_PMI_LIBRARY=/opt/pmix/2.2.3/lib/libpmi.so
srun ./stress

whereas the second job script used mpirun to spawn the processes, i.e.

module load mpi/impi
mpirun ./stress

Every process (rank) ran for 10 minutes, resulting in an overall 
CPU time of 10h and 40 minutes (= 4 nodes * 16 cores * 10 minutes). 

I then compared all the accounting records of both jobs (as
reported by the `sacct' command), and there are some noticeable
differences in various fields for job step id 0.
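
A comparison like this could be pulled with a query along the following
lines. This is only a sketch: the job ID is a placeholder, the field
list simply covers the fields discussed in this message, and the exact
command used for the test is not shown here.

```shell
# Placeholder job ID (hypothetical); substitute the real one.
JOBID=${JOBID:-12345}

# Accounting fields discussed in this message, plus Elapsed for reference.
FIELDS=JobID,AllocCPUS,NCPUS,NTasks,Elapsed,CPUTime,TotalCPU,UserCPU

# Only query if sacct is actually available on this machine.
if command -v sacct >/dev/null 2>&1; then
    sacct -j "$JOBID" --format="$FIELDS"
else
    echo "sacct not found; run this on a host with Slurm installed" >&2
fi
```

Running it once per job and comparing the row for step 0 side by side
makes the differing fields easy to spot.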

For the mpirun test case some of the fields look really odd,
e.g. AllocCPUS=4, NCPUS=4 and NTasks=4. I would have expected
AllocCPUS=64, NCPUS=64 and NTasks=64, and that is exactly what `sacct'
reports for the srun test case. CPUTime=00:40:04 (reported for the
mpirun case) also looks odd compared to CPUTime=10:41:04 (reported for
the srun case). Interestingly, the TotalCPU and UserCPU fields do match
for both test cases.
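
The two CPUTime values can be cross-checked with a little arithmetic.
The sketch below assumes that CPUTime is the elapsed time multiplied by
the CPU count of the step, which is how the sacct documentation
describes that field. The helper names hms_to_sec and implied_elapsed
are made up for this illustration.

```shell
# Convert an HH:MM:SS value to seconds.
# (A day prefix such as 1-02:03:04 is not handled here.)
hms_to_sec() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Elapsed seconds implied by a CPUTime value,
# assuming CPUTime = Elapsed * CPU count.
implied_elapsed() {
    echo $(( $(hms_to_sec "$1") / $2 ))
}

implied_elapsed 00:40:04 4     # mpirun case, NCPUS=4
implied_elapsed 10:41:04 64    # srun case, NCPUS=64
```

Both calls print 601, i.e. about 10 minutes of wall time per step. Under
that assumption, the smaller CPUTime in the mpirun case is just a
consequence of the step being recorded with 4 CPUs instead of 64.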

There are quite a few more differences that I still need to
understand. But in general, the accounting records for the srun
test case seem to be more in line with my expectations.
Thanks again.

Best regards

Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471
