[slurm-users] Multinode MPI job

Mahmood Naderan mahmood.nt at gmail.com
Thu Mar 28 06:53:22 UTC 2019


$ srun --version
slurm 18.08.4

According to the output log file, the job is aborted after 60 seconds:

srun: First task exited 60s ago
srun: step:759.0 pack_group:0 tasks 0-1: exited
srun: step:760.0 pack_group:1 tasks 0-1: running
srun: step:760.0 pack_group:1 tasks 2-3: exited abnormally
srun: Terminating job step 759.0
srun: Terminating job step 760.0
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
slurmstepd: error: *** STEP 760.0 ON rocks7 CANCELLED AT 2019-03-28T11:21:32 ***
srun: error: rocks7: tasks 0-1: Killed



Regards,
Mahmood




On Thu, Mar 28, 2019 at 11:09 AM Chris Samuel <chris at csamuel.org> wrote:

> On Wednesday, 27 March 2019 11:33:30 PM PDT Mahmood Naderan wrote:
>
> > Still only one node is running the processes
>
> What does "srun --version" say?
>
> Do you get any errors in your output file from the second pack job?
>
> All the best,
> Chris
> --
>   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA