[slurm-users] How to trap a SIGINT signal in a child process of a batch ?

Jeffrey T Frey frey at udel.edu
Tue Apr 21 13:23:09 UTC 2020


You could also choose to propagate the signal to the child process of test.slurm yourself:


#!/bin/bash
#SBATCH --job-name=test
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --time=00:03:00
#SBATCH --signal=B:SIGINT@30

# This example works, but I need it to work without "B:" in the --signal option, so that test.sh receives the SIGINT signal rather than test.slurm

sig_handler()
{
         echo "BATCH interrupted"
         if [ -n "$child_pid" ]; then
             kill -INT $child_pid
         fi
}

trap 'sig_handler' SIGINT

/home/user/test.sh &
child_pid=$!
wait $child_pid
exit $?


and


#!/bin/bash

function sig_handler()
{
         echo "Executable interrupted"
         exit 2
}

trap 'sig_handler' SIGINT

echo "BEGIN"
sleep 200 &
wait
echo "END"


If the signal handler in test.slurm calls "exit 2", the job ends at that point, so the child processes are terminated whether or not they have reached their own signal handlers.  Signaling the child and then returning control in test.slurm, so that it can wait for the child, reap its exit code, and "exit $?", gives the child time to clean up and lets it influence the final exit code of the job.
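One detail worth checking when trying this: in bash, a trapped signal makes the "wait" builtin return immediately with a status above 128, before the child has necessarily exited. A second wait after the handler has run is one way to pick up the child's real exit code. Below is a minimal, self-contained sketch of that idea, not a drop-in replacement for test.slurm. Everything in it is an assumption made for illustration: the child() function is a hypothetical stand-in for test.sh, the one-second timer stands in for Slurm delivering the signal, and the names forward_signal and got_signal are invented here. It also swaps SIGINT for SIGUSR1, because a non-interactive shell starts background commands with SIGINT ignored, and bash cannot trap a signal that was ignored on entry.

```shell
#!/bin/bash
# Sketch only: SIGUSR1 stands in for SIGINT, and child() stands in for test.sh.

child_pid=""
got_signal=0

# Handler for the batch shell: note the signal and pass it on to the child.
forward_signal()
{
    echo "BATCH interrupted"
    got_signal=1
    if [ -n "$child_pid" ]; then
        kill -USR1 "$child_pid" 2>/dev/null
    fi
}

# Hypothetical stand-in for /home/user/test.sh:
# on SIGUSR1 it cleans up its own sleep and exits with status 2.
child()
{
    trap 'echo "child cleanup"; kill "$sleeper" 2>/dev/null; exit 2' USR1
    sleep 30 &
    sleeper=$!
    wait "$sleeper"
}

trap 'forward_signal' USR1

child &
child_pid=$!

# Simulate the scheduler signaling the batch shell after one second.
( sleep 1; kill -USR1 $$ ) &

# A trapped signal makes this first wait return early with a status above 128.
status=0
wait "$child_pid" || status=$?

# If the handler ran, wait again so the child can finish its own handler
# and its exit code is the one that gets reported.
if [ "$got_signal" -eq 1 ]; then
    status=0
    wait "$child_pid" || status=$?
fi

echo "child exit status: $status"
```

Run directly with bash, this should print "BATCH interrupted", then "child cleanup", then "child exit status: 2". In a real batch script the final echo would become "exit $status", and the signal name would have to match whatever is given to --signal.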




> On Apr 21, 2020, at 06:13 , Bjørn-Helge Mevik <b.h.mevik at usit.uio.no> wrote:
> 
> Jean-mathieu CHANTREIN <jean-mathieu.chantrein at univ-angers.fr> writes:
> 
>> But that is not enough, it is also necessary to use srun in
>> test.slurm, because the signals are sent to the child processes only
>> if they are also children in the JOB sense.
> 
> Good to know!
> 
> -- 
> Cheers,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo
