[slurm-users] Decreasing time limit of running jobs (notification)
Jason Simms
jsimms1 at swarthmore.edu
Thu Jul 6 17:14:01 UTC 2023
An unfortunate example of the “with great power comes great responsibility”
maxim. Linux will gleefully let you rm -fr your entire system, drop
production databases, etc., provided you have the right privileges. Ask me
how I know…
Still, I get the point. Would it be possible to have Slurm ask for
confirmation when you set a time limit that is shorter than the job's elapsed
walltime? Perhaps. Could you script that yourself? Yes, I'm certain of it.
That kind of built-in safeguard isn't super common, however.
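Here is a minimal sketch of such a script. It assumes bash 4+ and the standard squeue/scontrol clients. The wrapper and its to_seconds and main helpers are hypothetical, not Slurm features. Slurm reads 16:00:00 as hours:minutes:seconds (16 hours) and 16-00:00:00 as days-hours:minutes:seconds (16 days). The sketch converts both the requested limit and the job's elapsed time (squeue's %M field) to seconds. It asks for confirmation only if the new limit is not longer than the time already used.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (e.g. saved as safe_timelimit.sh), not part of Slurm:
# asks for confirmation before setting a TimeLimit that is not longer than
# the time the job has already run.

# Convert a Slurm time string to seconds. Forms handled:
#   M, M:S, H:M:S, D-H, D-H:M, D-H:M:S
# Prints -1 for UNLIMITED/INFINITE; returns 1 on unrecognized input.
to_seconds() {
  local t=$1 days=0 h=0 m=0 s=0
  local -a f
  case ${t^^} in
    UNLIMITED|INFINITE) echo -1; return 0 ;;
  esac
  if [[ ! $t =~ ^[0-9:-]+$ ]]; then
    echo "unrecognized time: '$t'" >&2
    return 1
  fi
  if [[ $t == *-* ]]; then
    days=${t%%-*}
    # After the dash, fields are hours[:minutes[:seconds]]
    IFS=: read -r -a f <<< "${t#*-}"
    h=${f[0]:-0}; m=${f[1]:-0}; s=${f[2]:-0}
  else
    IFS=: read -r -a f <<< "$t"
    case ${#f[@]} in
      1) m=${f[0]} ;;                        # minutes
      2) m=${f[0]}; s=${f[1]} ;;             # minutes:seconds
      *) h=${f[0]}; m=${f[1]}; s=${f[2]} ;;  # hours:minutes:seconds
    esac
  fi
  # 10# forces base 10 so fields such as "08" are not read as octal
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

main() {
  if (( $# != 2 )); then
    echo "usage: $0 <jobid> <new-timelimit>" >&2
    return 2
  fi
  local jobid=$1 limit=$2 used new_s used_s answer
  # %M = time used so far, printed as [days-][hours:]minutes:seconds
  used=$(squeue -h -j "$jobid" -o %M) || return 1
  new_s=$(to_seconds "$limit") || return 1
  used_s=$(to_seconds "$used") || return 1
  if (( new_s >= 0 && new_s <= used_s )); then
    echo "Job $jobid has already run $used; TimeLimit=$limit is not longer than that." >&2
    read -r -p "Apply anyway (the job will likely be killed)? [y/N] " answer
    [[ $answer == [yY] ]] || { echo "Aborted." >&2; return 1; }
  fi
  scontrol update JobId="$jobid" TimeLimit="$limit"
}

# Run main only when executed directly, not when sourced
if [[ ${BASH_SOURCE[0]} == "$0" ]]; then
  main "$@"
fi
```

Usage would look like `./safe_timelimit.sh <jobid> 16-00:00:00`. The typo `16:00:00` on a job that has already run longer than 16 hours would stop at the prompt instead of going straight to scontrol.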
Jason
On Thu, Jul 6, 2023 at 12:55 PM Amjad Syed <amjadcsu at gmail.com> wrote:
> Yes, the initial time limit was 7-00:00:00, but Slurm accepted the typo
> (16:00:00), which caused the jobs to be killed without warning.
>
> Amjad
>
> On Thu, Jul 6, 2023 at 5:27 PM Bernstein, Noam CIV USN NRL (6393)
> Washington DC (USA) <noam.bernstein at nrl.navy.mil> wrote:
>
>> Is the issue that the error in the time made it shorter than the time the
>> job had already run, so it killed it immediately?
>>
>> On Jul 6, 2023, at 12:04 PM, Jason Simms <jsimms1 at swarthmore.edu> wrote:
>>
>> No, not a bug, I would say. When the time limit is reached, that's it: the
>> job dies. I'm not aware of a way to manage that. Once the time limit is
>> reached, it wouldn't be a hard limit if you then had to notify the user
>> and then... what? How long would you give them to extend the time? It
>> wouldn't be much of a limit if a job could be extended at that point, plus
>> that would throw off the scheduler/estimator. I'd chalk it up to an
>> unfortunate typo.
>>
>> Jason
>>
>> On Thu, Jul 6, 2023 at 11:54 AM Amjad Syed <amjadcsu at gmail.com> wrote:
>>
>>> Hello
>>>
>>> We were trying to increase the time limit of a running Slurm job:
>>>
>>> scontrol update job=<jobid> TimeLimit=16-00:00:00
>>>
>>> But we accidentally set it to 16 hours:
>>>
>>> scontrol update job=<jobid> TimeLimit=16:00:00
>>>
>>> This caused the running job to time out and be killed, without any
>>> notification.
>>>
>>> Is this a bug? Shouldn't the user be warned that the job will be
>>> killed?
>>>
>>> Amjad
>>>
>>>
>>
>> --
>> *Jason L. Simms, Ph.D., M.P.H.*
>> Manager of Research Computing
>> Swarthmore College
>> Information Technology Services
>> (610) 328-8102
>> Schedule a meeting: https://calendly.com/jlsimms
>>
>> Noam Bernstein, Ph.D.
>> Center for Materials Physics and Technology
>> U.S. Naval Research Laboratory
>> T +1 202 404 8628 F +1 202 404 7546
>> https://www.nrl.navy.mil