[slurm-users] Multinode MPI job
mahmood.nt at gmail.com
Thu Mar 28 06:53:22 UTC 2019
$ srun --version
According to the output log file, the job is aborted after 60 seconds:
srun: First task exited 60s ago
srun: step:759.0 pack_group:0 tasks 0-1: exited
srun: step:760.0 pack_group:1 tasks 0-1: running
srun: step:760.0 pack_group:1 tasks 2-3: exited abnormally
srun: Terminating job step 759.0
srun: Terminating job step 760.0
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
slurmstepd: error: *** STEP 760.0 ON rocks7 CANCELLED AT
srun: error: rocks7: tasks 0-1: Killed
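To make the status lines above easier to compare, here is a small sketch that groups them by step and pack group. It shows which pack group's tasks exited and which were still running when srun started terminating the steps. The regular expression is an assumption inferred only from the lines in this log, not a documented srun output format, and the function name `summarize` is hypothetical.

```python
import re

# Status lines copied from the srun output above.
LOG = """\
srun: step:759.0 pack_group:0 tasks 0-1: exited
srun: step:760.0 pack_group:1 tasks 0-1: running
srun: step:760.0 pack_group:1 tasks 2-3: exited abnormally
"""

# Assumed shape of srun's per-step status lines, inferred from this log only.
# The "-last" part is optional so that a single task ("tasks 3:") also matches.
STATUS_RE = re.compile(
    r"srun: step:(?P<step>\S+) pack_group:(?P<group>\d+) "
    r"tasks (?P<first>\d+)(?:-(?P<last>\d+))?: (?P<state>.+)"
)


def summarize(log: str) -> list[tuple[str, int, range, str]]:
    """Return (step, pack_group, task range, state) for each status line."""
    rows = []
    for line in log.splitlines():
        m = STATUS_RE.match(line.strip())
        if m:
            first = int(m["first"])
            last = int(m["last"]) if m["last"] else first
            rows.append((m["step"], int(m["group"]), range(first, last + 1), m["state"]))
    return rows


if __name__ == "__main__":
    for step, group, tasks, state in summarize(LOG):
        print(f"step {step} pack_group {group} tasks {list(tasks)}: {state}")
```

Run on the log above, it reports that pack group 0 (step 759.0) exited, while in pack group 1 (step 760.0) tasks 0-1 were still running and tasks 2-3 had exited abnormally.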
On Thu, Mar 28, 2019 at 11:09 AM Chris Samuel <chris at csamuel.org> wrote:
> On Wednesday, 27 March 2019 11:33:30 PM PDT Mahmood Naderan wrote:
> > Still only one node is running the processes
> What does "srun --version" say?
> Do you get any errors in your output file from the second pack job?
> All the best,
> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA