[slurm-users] Slurm - UnkillableStepProgram
Yap, Mike
M.Yap at massey.ac.nz
Tue Mar 23 21:19:35 UTC 2021
Hi Luke
Thanks for the heads-up.
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Luke Yeager
Sent: Wednesday, 24 March 2021 4:58 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Slurm - UnkillableStepProgram
While you're looking at this, make sure you don't set UnkillableStepTimeout to a value larger than 126 seconds:
https://bugs.schedmd.com/show_bug.cgi?id=11103
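For illustration, here is a minimal sketch of how the two settings might fit together. It assumes both parameters go in slurm.conf and that Slurm runs the program on the affected compute node once the timeout expires. The script path, the value of 120 seconds (chosen only to stay under the 126-second limit above; the 300 sec in the quoted output below would exceed it), the log tag, and the reliance on SLURM_JOB_ID and SLURM_STEP_ID being present in the program's environment are all assumptions to check against the slurm.conf man page for your version.

```shell
#!/bin/sh
# Hypothetical UnkillableStepProgram: a sketch, not a tested production script.
#
# Assumed slurm.conf entries (path and value are illustrative only):
#   UnkillableStepTimeout=120
#   UnkillableStepProgram=/usr/local/sbin/unkillable_step.sh

# Build a one-line report for the given host name.
# SLURM_JOB_ID and SLURM_STEP_ID are assumed to be set by Slurm;
# fall back to "unknown" when they are absent.
build_report() {
    printf 'unkillable step on %s: job=%s step=%s\n' \
        "$1" "${SLURM_JOB_ID:-unknown}" "${SLURM_STEP_ID:-unknown}"
}

report="$(build_report "$(uname -n)")"

# Record the event in syslog; fall back to stderr if logger is unavailable or fails.
if command -v logger >/dev/null 2>&1; then
    logger -t unkillable-step "$report" 2>/dev/null || echo "$report" >&2
else
    echo "$report" >&2
fi

# A site-specific action (sending mail, scheduling a reboot) would go here.
```

After changing slurm.conf, `scontrol show config | grep -i kill` should show the new program path in place of `(null)`.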
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Yap, Mike
Sent: Monday, March 22, 2021 7:13 PM
To: slurm-users at lists.schedmd.com
Subject: [slurm-users] Slurm - UnkillableStepProgram
Hi All
I have been reading the archive, hoping to implement UnkillableStepTimeout and UnkillableStepProgram on our Slurm cluster,
but I'm somewhat confused about how they are applied.
1. I presume UnkillableStepTimeout is set in slurm.conf and acts as a timer that triggers UnkillableStepProgram.
2. UnkillableStepProgram can be used to send an email or reboot the compute node. The question is: how do we configure it?
scontrol show config | grep -i kill
KillOnBadExit = 1
KillWait = 30 sec
UnkillableStepProgram = (null)
UnkillableStepTimeout = 300 sec
Please advise
Thanks
Mike