[slurm-users] Slurm - UnkillableStepProgram
Yap, Mike
M.Yap at massey.ac.nz
Tue Mar 23 21:18:46 UTC 2021
Hi Chris
Thanks for the clarification
Mike
-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Chris Samuel
Sent: Tuesday, 23 March 2021 5:30 PM
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] Slurm - UnkillableStepProgram
Hi Mike,
On 22/3/21 7:12 pm, Yap, Mike wrote:
> # I presume UnkillableStepTimeout is set in slurm.conf, and it acts as
> a timer to trigger UnkillableStepProgram
That is correct.
> # UnkillableStepProgram can be used to send email or reboot the compute
> node - the question is, how do we configure it?
It can also be used to automate collecting debug info (which is what we do); we then manually intervene and reboot the node once we've determined there's no more useful info to collect.
It's just configured in your slurm.conf.
UnkillableStepProgram=/path/to/the/unkillable/step/script.sh
Of course this script has to be present on every compute node.
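For illustration only, here is a minimal sketch of what such a script might look like. It is not the script referred to above. It assumes a Linux node with procps `ps`, a hypothetical log directory (`UNKILLABLE_LOG_DIR`, defaulting to /tmp), and that slurmd exports SLURM_JOB_ID and SLURM_STEP_ID to the program, as recent slurm.conf man pages describe; if your version does not set them, the sketch falls back to "unknown".

```shell
#!/bin/bash
# Hypothetical UnkillableStepProgram sketch: record which processes are stuck.
# Assumed slurm.conf lines (the timeout value is only an example):
#   UnkillableStepTimeout=120
#   UnkillableStepProgram=/path/to/the/unkillable/step/script.sh

# Illustrative log location; pick something sensible for your site.
LOG_DIR="${UNKILLABLE_LOG_DIR:-/tmp}"

# Read "pid stat wchan comm" lines on stdin; keep the header line and any
# process whose state starts with D (uninterruptible sleep).
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Write the job/step IDs and the D-state process list to a log file,
# then print the path of that file.
collect_debug_info() {
    local out
    out="$LOG_DIR/unkillable-$(uname -n)-$(date +%Y%m%dT%H%M%S).log"
    {
        echo "job=${SLURM_JOB_ID:-unknown} step=${SLURM_STEP_ID:-unknown}"
        ps -eo pid,stat,wchan:32,comm | filter_d_state
    } > "$out" 2>&1
    echo "$out"
}

collect_debug_info
```

Processes in D state are usually the ones a SIGKILL cannot reach, so their wait channel (wchan) is often the most useful first clue. Sending mail or rebooting could be added at the end of the script, but that is a site policy decision.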
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA