[slurm-users] Source of SIGTERM

Marcus Wagner wagner at itc.rwth-aachen.de
Fri Mar 8 12:38:55 UTC 2019


Hi Doug,

you could try using auditd to catch the source.
Back when we used LSF, we had an issue with one of our prolog scripts,
which killed jobs when a job from the same user was already running on
the node. auditd helped us identify the culprit: our own nodecleaner
script ;)
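As a rough sketch of how that could look (assumptions throughout, not a tested recipe): on x86_64 an audit rule such as "auditctl -a always,exit -F arch=b64 -S kill -F a1=15 -k sigterm" records every kill(2) call that sends signal 15. The key name "sigterm" is an arbitrary choice, and tkill/tgkill calls are not covered by that rule. Each matching event should produce a SYSCALL record (the sender's pid, comm, exe) and an OBJ_PID record (the target's opid, ocomm) sharing the same audit(TIMESTAMP:SERIAL) ID. The hypothetical helper find_signal_senders below groups raw audit.log lines by that ID and reports who signalled whom; the record layout it parses is assumed from the usual raw audit.log format.

```python
import re
from collections import defaultdict

# Record type plus the event ID inside msg=audit(TIMESTAMP:SERIAL)
RECORD_RE = re.compile(r"type=(\w+) msg=audit\(([\d.]+):(\d+)\):\s*(.*)")
# key=value pairs; values may be double-quoted
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_fields(text):
    """Turn 'a=1 comm="bash"' into {'a': '1', 'comm': 'bash'}."""
    return {k: v.strip('"') for k, v in FIELD_RE.findall(text)}


def find_signal_senders(lines, key="sigterm"):
    """Hypothetical helper: pair SYSCALL and OBJ_PID records per audit event.

    Assumes raw audit.log lines (e.g. from ausearch --raw) and a rule
    tagged with the given key, as in the auditctl example above.
    """
    events = defaultdict(dict)
    for line in lines:
        m = RECORD_RE.match(line.strip())
        if not m:
            continue
        rtype, ts, serial, rest = m.groups()
        events[(ts, serial)][rtype] = parse_fields(rest)

    results = []
    for (ts, _serial), recs in events.items():
        sc = recs.get("SYSCALL")
        if not sc or sc.get("key") != key:
            continue  # not one of our tagged kill() events
        target = recs.get("OBJ_PID", {})
        results.append({
            "time": float(ts),
            "sender_pid": int(sc["pid"]),
            "sender_comm": sc.get("comm"),
            "sender_exe": sc.get("exe"),
            # for kill(2), a1 is the signal number, logged in hex
            "signal": int(sc.get("a1", "0"), 16),
            "target_pid": int(target["opid"]) if "opid" in target else None,
            "target_comm": target.get("ocomm"),
        })
    return sorted(results, key=lambda r: r["time"])
```

Feeding it the lines of /var/log/audit/audit.log after the next unexplained cancellation should point at the process (and executable) that sent the SIGTERM, which is how a stray cleanup script would show up.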

Best
Marcus

On 3/7/19 4:55 PM, Doug Meyer wrote:
> Looking for advice on identifying the source of a job cancellation.
> Preemption is not configured on the partition. I sometimes receive a
> message "Job nnnnnnn on nodexxx CANCELLED at date/time **** Signal
> SIGTERM caught..." but do not see anything in the node logs or the
> slurmctld logs suggesting the source of the SIGTERM.
>
> Thank you,
> Doug Meyer
>

-- 
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de



