[slurm-users] How to trap a SIGINT signal in a child process of a batch ?
Jean-mathieu CHANTREIN
jean-mathieu.chantrein at univ-angers.fr
Tue Apr 21 09:05:43 UTC 2020
----- Original Message -----
> From: "b h mevik" <b.h.mevik at usit.uio.no>
> To: "slurm-users" <slurm-users at schedmd.com>
> Sent: Tuesday, 21 April 2020 10:29:32
> Subject: Re: [slurm-users] How to trap a SIGINT signal in a child process of a batch ?
> Jean-mathieu CHANTREIN <jean-mathieu.chantrein at univ-angers.fr> writes:
>
>> test.sh:
>>
>> #!/bin/bash
>>
>> function sig_handler()
>> {
>> echo "Executable interrupted"
>> exit 2
>> }
>>
>> trap 'sig_handler' SIGINT
>>
>> echo "BEGIN"
>> sleep 200
>> echo "END"
>
> Note that bash does not interrupt any running command (except "wait")
> when it receives a trapped signal, so the "sleep 200" will not be
> interrupted. The "wait" command is special; it will be interrupted.
> From man bash:
>
> If bash is waiting for a command to complete and receives a signal for which a
> trap has been set, the trap will not be executed until the command completes.
> When bash is waiting for an asynchronous command via the wait builtin, the
> reception of a signal for which a trap has been set will cause the wait builtin
> to return immediately with an exit status greater than 128, immediately after
> which the trap is executed.
>
> So try using
>
> sleep 200 &
> wait
>
> instead.
>
> --
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo
Yes, you're right. Thank you.
But that is not enough: you also have to use srun in test.slurm, because the signal is sent to the child processes only if they are also children in the job sense, i.e. launched as a job step.
In the end, a minimal working example looks like this:
test.slurm:
---------------------------------------------
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --time=00:03:00
#SBATCH --signal=SIGINT@30
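# Handler for SIGINT in the batch shell
sig_handler()
{
    echo "BATCH interrupted"
    exit 2
}
trap 'sig_handler' SIGINT
# Launch test.sh as a job step (srun) in the background, then wait,
# since bash only interrupts "wait" when a trapped signal arrives
srun ~/test.sh &
wait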
---------------------------------------------
test.sh:
---------------------------------------------
#!/bin/bash
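# Handler for SIGINT in the executable
function sig_handler()
{
    echo "Executable interrupted"
    exit 2
}
trap 'sig_handler' SIGINT
echo "BEGIN"
# Background the command and wait, so that the trap runs as soon as the signal arrives
sleep 200 &
wait
echo "END"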
---------------------------------------------
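To check the trap/wait behaviour without submitting a job, here is a minimal sketch. It is an illustration only, not part of the example tested above: the names demo_trap_wait and on_signal are made up, and the script sends SIGUSR1 to itself as a stand-in for the SIGINT that Slurm would deliver (a non-interactive bash ignores SIGINT in background commands, so SIGUSR1 behaves the same however the demo is launched). The handler also kills the background sleep, which test.sh does not do.

```shell
#!/bin/bash
# Sketch only: demo_trap_wait and on_signal are hypothetical names, and
# SIGUSR1 stands in for the SIGINT that Slurm would send to the job step.
demo_trap_wait() {
    # Run in a separate bash so the trap and exit do not touch the caller.
    bash -c '
        on_signal() {
            echo "interrupted"
            kill "$child" 2>/dev/null   # stop the background sleep
            exit 2
        }
        trap on_signal USR1

        # Send the signal to this shell after one second.
        ( sleep 1; kill -USR1 $$ ) &

        echo "BEGIN"
        sleep 30 &
        child=$!
        wait "$child"   # returns as soon as the trapped signal arrives
        echo "END"      # not reached: the handler exits first
    '
}
```

Calling demo_trap_wait should print BEGIN, then "interrupted" about one second later, and return exit status 2 without printing END. With a plain "sleep 30" in the foreground instead of "sleep 30 &" followed by wait, the handler would only run once the sleep had finished.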
Best regards,
Jean-Mathieu