[slurm-users] MPI job termination
Mahmood Naderan
mahmood.nt at gmail.com
Sun Apr 7 17:15:55 UTC 2019
Hi,
A multi-node MPI job terminated with the following messages in the log file:
=------------------------------------------------------------------------------=
JOB DONE.
=------------------------------------------------------------------------------=
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
STOP 2
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
STOP 2
STOP 2
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:
Process name: [[9801,1],8]
Exit code: 2
------------------------------------
Although the log says "JOB DONE.", I would like to know whether the job actually terminated abnormally.
Moreover, I cannot tell whether there is a problem with the input files.
For example, maybe the calculations diverged, but this error message does not
clarify that.
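For what it's worth, "STOP 2" looks like the output of a Fortran `STOP 2` statement, which (this is an assumption about the application, which I haven't seen) both prints "STOP 2" and terminates the process with exit status 2; mpirun then treats that non-zero status as a failure and aborts the remaining ranks. A minimal sketch of how such a status propagates to a parent process, with a Python child standing in for one MPI rank:

```python
import subprocess
import sys

# Sketch, assuming the application is Fortran-based: Fortran's "STOP 2"
# prints "STOP 2" and exits with status 2. Here a Python child process
# stands in for one MPI rank doing the same thing.
child = subprocess.run(
    [sys.executable, "-c", "import sys; print('STOP 2'); sys.exit(2)"],
    capture_output=True, text=True,
)
print(child.stdout.strip())   # the "STOP 2" line seen in the log
print(child.returncode)       # the non-zero status that makes mpirun abort
```

On the Slurm side, `sacct -j <jobid> --format=JobID,State,ExitCode` should show the recorded exit code for each job step, which may help distinguish a clean finish from an application-level abort.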
Any idea?
Regards,
Mahmood