[slurm-users] additional jobs killed by scancel.

Christopher Samuel chris at csamuel.org
Thu May 14 00:29:45 UTC 2020


On 5/11/20 9:52 am, Alastair Neil wrote:

> [2020-05-10T00:26:05.202] [533900.batch] sending 
> REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 9

This caught my eye. Googling for it turned up a single instance, from 2019,
also on this list, about jobs on a node mysteriously dying.

The resolution was (courtesy of Uwe Seher):

# The system is an openSUSE Leap 15 installation and Slurm
# comes from the repository. By default a slurm.epilog.clean
# script is installed which kills everything that belongs to
# the user when a job is finished, including other jobs,
# ssh-sessions and so on. I do not know if other distributions
# do the same or if the script is broken, but removing it
# solved the problem.
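
If removing the epilog outright is not an option, the safeguard it needs is a
check that the user has no other job on the node before anything is killed.
Below is a minimal Python sketch of that check, not the script shipped by
openSUSE or by Slurm. The function names (other_job_ids, should_clean_user)
and the UID cutoff of 1000 for system accounts are assumptions made for
illustration. It relies on SLURM_JOB_ID and SLURM_JOB_UID being set in the
epilog environment and on squeue and pkill being on the PATH.

```python
import os
import subprocess


def other_job_ids(squeue_output, finished_job_id):
    """Return job IDs listed by squeue, excluding the job that just ended."""
    ids = [line.strip() for line in squeue_output.splitlines() if line.strip()]
    return [job_id for job_id in ids if job_id != str(finished_job_id)]


def should_clean_user(uid, squeue_output, finished_job_id):
    """Clean up only for ordinary users with no other job on this node."""
    if uid < 1000:  # assumed cutoff for system accounts; adjust per site
        return False
    return not other_job_ids(squeue_output, finished_job_id)


def main():
    # Do nothing when not run from a Slurm epilog environment.
    if "SLURM_JOB_ID" not in os.environ or "SLURM_JOB_UID" not in os.environ:
        return
    job_id = os.environ["SLURM_JOB_ID"]
    uid = int(os.environ["SLURM_JOB_UID"])
    # List this user's jobs on the local node, one job ID per line.
    result = subprocess.run(
        ["squeue", "--noheader", "--format=%A",
         f"--user={uid}", "--nodelist=localhost"],
        capture_output=True, text=True, check=True,
    )
    if should_clean_user(uid, result.stdout, job_id):
        # No other job of this user remains here, so stray processes can go.
        subprocess.run(["pkill", "-KILL", "-U", str(uid)], check=False)


if __name__ == "__main__":
    main()
```

With this check, a user who still has a second job running on the node is
left alone; only when squeue reports nothing besides the finished job does
the pkill run. If squeue itself fails, the script exits with an error instead
of killing anything.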

Hope that helps!

All the best,
Chris
-- 
   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


