[slurm-users] Reproducible irreproducible problem (timeout?)

Gerhard Strangar g.s at arcor.de
Wed Dec 20 18:56:49 UTC 2023


Laurence Marks wrote:

> After some (irreproducible) time, often one of the three slow tasks hangs.
> A symptom is that if I try and ssh into the main node of the subtask (which
> is running 128 mpi on the 4 nodes) I get "Authentication failed".

How about asking an admin to check why it hangs?


