[slurm-users] How to trap a SIGINT signal in a child process of a batch ?

Jean-mathieu CHANTREIN jean-mathieu.chantrein at univ-angers.fr
Tue Apr 21 09:05:43 UTC 2020


----- Original message -----
> From: "b h mevik" <b.h.mevik at usit.uio.no>
> To: "slurm-users" <slurm-users at schedmd.com>
> Sent: Tuesday, 21 April 2020 10:29:32
> Subject: Re: [slurm-users] How to trap a SIGINT signal in a child process of a batch ?

> Jean-mathieu CHANTREIN <jean-mathieu.chantrein at univ-angers.fr> writes:
> 
>> test.sh:
>>
>> #!/bin/bash
>>
>> function sig_handler()
>> {
>> echo "Executable interrupted"
>> exit 2
>> }
>>
>> trap 'sig_handler' SIGINT
>>
>> echo "BEGIN"
>> sleep 200
>> echo "END"
> 
> Note that bash does not interrupt any running command (except "wait")
> when it receives a trapped signal, so the "sleep 200" will not be
> interrupted.  The "wait" command is special; it will be interrupted.
> From man bash:
> 
>       If  bash is waiting for a command to complete and receives a signal for which a
>       trap has been set, the trap will not be executed until the  command  completes.
>       When  bash  is  waiting  for  an asynchronous command via the wait builtin, the
>       reception of a signal for which a trap has been set will cause the wait builtin
>       to  return  immediately with an exit status greater than 128, immediately after
>       which the trap is executed.
> 
> So try using
> 
> sleep 200 &
> wait
> 
> instead.
> 
> --
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo

Yes, you're right. Thank you.
But that alone is not enough: test.sh also has to be launched with srun in test.slurm, because the signal is only delivered to processes started as a job step (through srun), not to ordinary child processes of the batch script.
In the end, a minimal working example looks like this:

test.slurm:
---------------------------------------------
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --time=00:03:00
#SBATCH --signal=SIGINT@30

sig_handler()
{
         echo "BATCH interrupted"
         exit 2
}

trap 'sig_handler' SIGINT

srun ~/test.sh &

wait
---------------------------------------------

test.sh:
---------------------------------------------
#!/bin/bash

function sig_handler()
{
         echo "Executable interrupted"
         exit 2
}

trap 'sig_handler' SIGINT

echo "BEGIN"
sleep 200 &
wait
echo "END"
---------------------------------------------
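
For anyone who wants to see the effect of replacing "sleep 200" with "sleep 200 &" followed by "wait" without submitting a job, here is a rough local sketch. It is only an illustration under a few assumptions: the function name elapsed_until_trap and the 4-second sleep are made up for the demo, and it uses SIGUSR1 instead of SIGINT because a bash child started with "&" from a non-interactive shell has SIGINT ignored and cannot trap it. The deferral behaviour described in the man page excerpt is the same for any trapped signal.

```shell
#!/bin/bash
# Hypothetical demo, no Slurm needed: how quickly does a bash script react to a
# trapped signal when the long command runs in the foreground, compared with
# running it in the background and calling `wait`?

# elapsed_until_trap MODE
# Starts a child bash that traps SIGUSR1, signals it after 1 second, and
# prints the whole number of seconds the child needed to exit.
elapsed_until_trap() {
    local mode=$1 pid start=$SECONDS
    bash -c '
        trap "exit 2" USR1
        if [ "$1" = foreground ]; then
            sleep 4        # the trap is deferred until sleep finishes
        else
            sleep 4 &      # run the command in the background ...
            wait           # ... so the trapped signal interrupts wait
        fi
    ' _ "$mode" >/dev/null &
    pid=$!
    sleep 1
    kill -USR1 "$pid"
    wait "$pid" || :       # the child exits with status 2 by design
    echo $((SECONDS - start))
}

fg=$(elapsed_until_trap foreground)
bg=$(elapsed_until_trap background)
echo "foreground sleep:        exited after ${fg}s"
echo "background sleep + wait: exited after ${bg}s"
```

The foreground case should only exit once the sleep has finished, while the background case should exit about one second in, right after the signal is sent. Note that in the background case the sleep itself keeps running until it finishes on its own; the handler in this sketch does not kill it.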

Best regards,

Jean-Mathieu


