[slurm-users] MPI jobs via mpirun vs. srun through PMIx.
juergen.salk at uni-ulm.de
Tue Sep 17 20:21:56 UTC 2019
* Philip Kovacs <pkdevel at yahoo.com> [190917 07:43]:
> >> I suspect the question, which I also have, is more like:
> >> "What difference does it make whether I use 'srun' or 'mpirun' within
> >> a batch file started with 'sbatch'."
> One big thing would be that using srun gives you resource tracking
> and accounting of the individual job or step that you would not
> otherwise get with mpirun.
Thank you very much for all the feedback. From your responses I
gathered that there is a broad consensus to use srun because of its
tighter integration with Slurm.
I have now also realized that srun and mpirun do indeed make a big
difference in terms of accounting.
For testing I've run two identical jobs today, both of them
using 4 nodes with 16 cores each, but the first batch script
launched the application with srun, i.e.
module load mpi/impi
whereas the second job script used mpirun to spawn the processes, i.e.
module load mpi/impi
Every process (rank) ran for 10 minutes, resulting in an overall
CPU time of 10h and 40 minutes (= 4 nodes * 16 cores * 10 minutes).
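
The expected figure can be double-checked with a few lines of shell
arithmetic. This is only a sketch of the calculation stated above; the
variable names are made up for illustration.

```shell
# Expected aggregate CPU time: nodes * cores per node * minutes per rank.
nodes=4
cores_per_node=16
minutes_per_rank=10

ranks=$(( nodes * cores_per_node ))
total_minutes=$(( ranks * minutes_per_rank ))

# Split the total into hours and minutes for display.
printf '%d ranks, %dh %02dm total CPU time\n' \
    "$ranks" $(( total_minutes / 60 )) $(( total_minutes % 60 ))
```

This prints 64 ranks and 10h 40m, matching the numbers above.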
I have then compared all the accounting records of both jobs (as
reported by the 'sacct' command) and there are some noticeable differences
in various fields for job step id 0.
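
For reference, a query along the following lines should show the fields
discussed in this message. It is a sketch only: the job ID is a
placeholder, and the field list is simply the set of fields mentioned
here, not the exact command used for the comparison.

```shell
# sacct format fields referred to in this message.
fields="JobID,AllocCPUS,NCPUS,NTasks,Elapsed,CPUTime,TotalCPU,UserCPU"
jobid="${1:-12345}"   # placeholder job ID, replace with a real one

if command -v sacct >/dev/null 2>&1; then
    # Print one accounting row per job step for the given job.
    sacct -j "$jobid" --format="$fields"
else
    # No Slurm client on this machine: just show the command line.
    echo "sacct -j $jobid --format=$fields"
fi
```

Running it once per job and comparing the rows for step 0 side by side
makes the differing fields easy to spot.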
For the mpirun test case some of the fields appear really odd,
e.g. AllocCPUS=4, NCPUS=4 and NTasks=4. I would have expected
AllocCPUS=64, NCPUS=64 and NTasks=64 and that is exactly what 'sacct'
reports for the srun test case. Also CPUTime=00:40:04 (reported for
the mpirun case) is weird compared to CPUTime=10:41:04 (reported for
the srun case). Interestingly, the TotalCPU and UserCPU fields do match
for both test cases.
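
One possible reading of the two CPUTime values: the sacct documentation
describes CPUTime as elapsed time multiplied by the CPU count, so
dividing each value by its AllocCPUS should give the elapsed time of the
step. The sketch below does that with the numbers quoted above; the
helper function name is made up for illustration.

```shell
# Convert an HH:MM:SS string to seconds. A single leading zero is stripped
# from each field so that values like "08" are not parsed as octal.
hms_to_seconds() {
    hms_h=${1%%:*}
    hms_rest=${1#*:}
    hms_m=${hms_rest%%:*}
    hms_s=${hms_rest#*:}
    echo $(( ${hms_h#0} * 3600 + ${hms_m#0} * 60 + ${hms_s#0} ))
}

# CPUTime values reported by sacct for job step 0.
mpirun_cputime=$(hms_to_seconds 00:40:04)   # AllocCPUS=4
srun_cputime=$(hms_to_seconds 10:41:04)     # AllocCPUS=64

# Implied elapsed time: CPUTime / AllocCPUS.
mpirun_elapsed=$(( mpirun_cputime / 4 ))
srun_elapsed=$(( srun_cputime / 64 ))

echo "mpirun: ${mpirun_elapsed}s elapsed, srun: ${srun_elapsed}s elapsed"
```

Both cases work out to 601 seconds, i.e. roughly the 10 minutes each rank
ran. Under that reading, the odd CPUTime in the mpirun case would simply
follow from the step being recorded with 4 CPUs instead of 64.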
There are quite a few more differences that I still need to
understand. But in general, the accounting records for the srun
test case seem to be more in line with my expectations.
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471