[slurm-users] Source of SIGTERM
Marcus Wagner
wagner at itc.rwth-aachen.de
Fri Mar 8 12:38:55 UTC 2019
Hi Doug,
you could try to use auditd to catch the source.
Back when we were still running LSF, we had an issue with one of our
prolog scripts, which killed jobs whenever a job of the same user was
already running on the node. auditd helped us at that point to identify
the culprit: our own nodecleaner script ;)
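
For example, a rule along these lines should log every kill() call that
sends SIGTERM (signal 15), together with the PID and executable of the
sender. This is only a sketch: the key name slurm_sigterm and the file
name are placeholders I made up, it assumes 64-bit nodes, and it does
not cover signals sent via tkill/tgkill.

```
# /etc/audit/rules.d/sigterm.rules  (hypothetical file name)
# Log kill() syscalls whose second argument (the signal number) is 15 = SIGTERM.
# "slurm_sigterm" is an arbitrary key used to find the events again later.
-a always,exit -F arch=b64 -S kill -F a1=15 -k slurm_sigterm
```

After loading the rule (e.g. with "augenrules --load"), something like
"ausearch -k slurm_sigterm -i" should list the matching events,
including the pid, comm and exe of the process that sent the signal.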
Best
Marcus
On 3/7/19 4:55 PM, Doug Meyer wrote:
> Looking for advice on identifying the source of a job cancellation.
> Preemption is not configured on the partition. We sometimes receive a
> message "Job nnnnnnn on nodexxx CANCELLED at date/time **** Signal
> SIGTERM caught..." but do not see anything in the node logs or slurmctl
> logs suggesting the source of the SIGTERM.
>
> Thank you,
> Doug Meyer
>
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de