[slurm-users] MPI jobs via mpirun vs. srun through PMIx.

Juergen Salk juergen.salk at uni-ulm.de
Tue Sep 17 20:21:56 UTC 2019


* Philip Kovacs <pkdevel at yahoo.com> [190917 07:43]:

> >> I suspect the question, which I also have, is more like:
> >> 
> >>  "What difference does it make whether I use 'srun' or 'mpirun' within
> >>    a batch file started with 'sbatch'."
> 
> One big thing would be that using srun gives you resource tracking
> and accounting of the individual job or step that you would not
> otherwise get with mpirun.

Hello,

thank you very much for all the feedback. From your responses I
gathered that there is a broad consensus to use srun because of its
tighter integration with Slurm. 

I have now also realized that srun and mpirun do indeed make a big
difference in terms of accounting. 

For testing I've run two identical jobs today, both of them with

#SBATCH --nodes=4
#SBATCH --tasks-per-node=16
#SBATCH --mem=8gb

but the first batch script launched the application with srun, i.e.

module load mpi/impi
export I_MPI_PMI_LIBRARY=/opt/pmix/2.2.3/lib/libpmi.so
srun ./stress

whereas the second job script used mpirun to spawn the processes, i.e.

module load mpi/impi
mpirun ./stress

Every process (rank) ran for 10 minutes, resulting in an overall
CPU time of 10 hours and 40 minutes (= 4 nodes * 16 cores * 10 minutes).
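
The arithmetic behind that figure, as a minimal bash sketch. The function name `expected_cputime` is hypothetical, and it assumes every task is busy for the whole run:

```shell
#!/usr/bin/env bash
# Hypothetical helper: expected aggregate CPU time of a job in which
# every task is busy for the same number of seconds.
# Usage: expected_cputime NODES TASKS_PER_NODE SECONDS_PER_TASK
# Prints HH:MM:SS (the hour field may exceed 24).
expected_cputime() {
    local nodes=$1 tasks_per_node=$2 secs=$3
    local total=$(( nodes * tasks_per_node * secs ))
    printf '%02d:%02d:%02d\n' \
        $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# 4 nodes x 16 tasks per node x 10 minutes (600 s) per task
expected_cputime 4 16 600
```

This prints 10:40:00, i.e. 64 task slots times 600 seconds each.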

I then compared all the accounting records of both jobs (as
reported by the `sacct` command), and there are some noticeable
differences in various fields for job step 0.
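
A comparison like this can be scripted. Below is a minimal bash sketch, assuming a Slurm installation with accounting enabled. `step_record` and `diff_records` are hypothetical helper names, and the field list simply collects the fields mentioned in this thread. `sacct -j JOBID.0` restricts the output to step 0, `--parsable2` prints pipe-separated values, and `--noheader` suppresses the header line.

```shell
#!/usr/bash
# Fields discussed in this thread (all standard sacct --format fields).
FIELDS="JobID,AllocCPUS,NCPUS,NTasks,Elapsed,CPUTime,TotalCPU,UserCPU"

# Print the accounting record of one job step as pipe-separated values.
# Usage: step_record JOBID.STEP
step_record() {
    sacct -j "$1" --parsable2 --noheader --format="$FIELDS"
}

# Compare two pipe-separated records field by field and print every
# field whose values differ, as "Field: first -> second".
# Usage: diff_records "RECORD_A" "RECORD_B"
diff_records() {
    local IFS='|'
    local -a names a b
    read -r -a names <<< "${FIELDS//,/|}"
    read -r -a a <<< "$1"
    read -r -a b <<< "$2"
    local i
    for i in "${!names[@]}"; do
        if [ "${a[i]}" != "${b[i]}" ]; then
            printf '%s: %s -> %s\n' "${names[i]}" "${a[i]}" "${b[i]}"
        fi
    done
}
```

For example, with two hypothetical job IDs 12345 (mpirun) and 12346 (srun): `diff_records "$(step_record 12345.0)" "$(step_record 12346.0)"`. JobID always differs, so it serves as a header for the listed differences.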

For the mpirun test case, some of the fields appear really odd,
e.g. AllocCPUS=4, NCPUS=4 and NTasks=4. I would have expected
AllocCPUS=64, NCPUS=64 and NTasks=64, and that is exactly what `sacct`
reports for the srun test case. Also, CPUTime=00:40:04 (reported for
the mpirun case) is weird compared to CPUTime=10:41:04 (reported for
the srun case). Interestingly, the TotalCPU and UserCPU fields do
match for both test cases.
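
One possible reading of the CPUTime values: the sacct documentation defines CPUTime as elapsed time multiplied by the CPU count, while TotalCPU and UserCPU reflect measured CPU usage. If that holds here, both CPUTime values correspond to the same step duration, and the mpirun figure is small only because the step was accounted with AllocCPUS=4. A minimal bash sketch that back-computes the elapsed time follows. The helper names are hypothetical, and the resulting 601 seconds (00:10:01) is derived from the numbers above, not read from sacct.

```shell
#!/usr/bin/env bash
# Convert a sacct duration in [DD-]HH:MM:SS form to seconds.
# (Only this form is handled; TotalCPU may use MM:SS.mmm instead.)
to_seconds() {
    local t=$1 days=0 h m s
    if [[ $t == *-* ]]; then
        days=${t%%-*}
        t=${t#*-}
    fi
    IFS=: read -r h m s <<< "$t"
    # 10# forces base 10 so fields like "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Elapsed seconds implied by CPUTime / AllocCPUS.
# Usage: implied_elapsed CPUTIME ALLOCCPUS
implied_elapsed() {
    echo $(( $(to_seconds "$1") / $2 ))
}

implied_elapsed 00:40:04 4    # mpirun step
implied_elapsed 10:41:04 64   # srun step
```

Both calls print 601, which also fits the observation that TotalCPU and UserCPU match: they track actual usage rather than allocation.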

There are quite a few more differences that I still need to
understand. But in general, the accounting records for the srun
test case seem to be more in line with my expectations.
 
Thanks again.

Best regards
Jürgen

-- 
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471


