[slurm-users] additional jobs killed by scancel.
Christopher Samuel
chris at csamuel.org
Thu May 14 00:29:45 UTC 2020
On 5/11/20 9:52 am, Alastair Neil wrote:
> [2020-05-10T00:26:05.202] [533900.batch] sending
> REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 9
This caught my eye. Googling for it found a single instance, from 2019,
also on this list, about jobs on a node mysteriously dying.
The resolution was (courtesy of Uwe Seher):
# The system is an opensuse leap 15 installation and slurm
# comes from the repository. By default a slurm.epilog.clean
# skript is installed which kills everything that belongs to
# the user when a job is finished including other jobs,
# ssh-sessions and so on. I do not know if other distributions
# do the same or if the script is broken, but removing it
# solved the problem.
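If you would rather keep a cleanup epilog than remove it, the usual guard is to kill the user's processes only when they have no other job left on the node. Below is a minimal sketch of that check in Python, not the script that openSUSE ships. The function names are hypothetical, and it assumes squeue is on the PATH and that the epilog passes in the user, node name and job ID (for example from SLURM_JOB_USER, SLURMD_NODENAME and SLURM_JOB_ID).

```python
import subprocess


def other_job_ids(squeue_output, finished_job_id):
    """Return the job IDs in squeue output other than the job that just ended."""
    ids = [line.strip() for line in squeue_output.splitlines() if line.strip()]
    return [job_id for job_id in ids if job_id != str(finished_job_id)]


def list_user_jobs_on_node(user, node):
    """Ask squeue for the user's jobs on this node, one job ID per line.

    Assumption: squeue is available to the epilog and the node name matches
    what slurmctld knows the node as.
    """
    result = subprocess.run(
        ["squeue", "--noheader", "--format=%A",
         f"--user={user}", f"--nodelist={node}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def safe_to_clean(squeue_output, finished_job_id):
    """True only if the user has no job on the node besides the finished one."""
    return not other_job_ids(squeue_output, finished_job_id)
```

An epilog would call list_user_jobs_on_node() and go on to kill the user's remaining processes only if safe_to_clean() returns True. If your log shows other jobs dying at the moment one job ends, it is worth checking whether the installed epilog does an equivalent test and whether that test actually works on your nodes.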
Hope that helps!
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA