[slurm-users] MPI job termination

Mahmood Naderan mahmood.nt at gmail.com
Sun Apr 7 17:15:55 UTC 2019


Hi,
A multinode MPI job terminated with the following messages in the log file

=------------------------------------------------------------------------------=
   JOB DONE.
=------------------------------------------------------------------------------=
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
STOP 2
STOP 2
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

  Process name: [[9801,1],8]
  Exit code:    2
------------------------------------


Although the log says "JOB DONE.", I would like to know whether the job
actually terminated abnormally.
Moreover, I cannot tell from this message whether there is a problem with the
input files. For example, the calculation may have diverged, but this error
does not clarify that.
Any idea?
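For context, the repeated "STOP 2" lines look like a Fortran STOP statement
with code 2, which sets the process exit status to 2; mpirun then aborts the
job once any rank returns non-zero. A minimal sketch of that mapping,
assuming a POSIX shell (the `sh -c 'exit 2'` stand-in is hypothetical, not
the actual application):

```shell
# A Fortran "STOP 2" sets the process exit status to 2.
# Simulate one such rank and inspect the status the launcher would see.
sh -c 'exit 2'
echo "exit status: $?"   # prints "exit status: 2"
```

On the Slurm side, `sacct -j <jobid> --format=JobID,State,ExitCode` shows the
recorded exit code per job step. Whether the stop was caused by divergence or
by a bad input, however, has to come from the application's own output files,
not from mpirun or Slurm.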

Regards,
Mahmood