[slurm-users] Decreasing time limit of running jobs (notification)

Amjad Syed amjadcsu at gmail.com
Thu Jul 6 17:38:05 UTC 2023


Agreed on the point of greater responsibility, but even rm -r (without -f)
gives a warning. In this case, should Slurm have such an option (forced),
especially if the change can immediately kill a running job?
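Below is a minimal sketch of the kind of safeguard Jason describes in the quoted reply, written as a site-side wrapper. It is not a built-in Slurm option. It assumes Bash and the standard squeue and scontrol clients, and the script name safe_timelimit.sh is hypothetical. The wrapper reads the job's elapsed time (squeue %M) and its current limit (%l). It then asks for the job ID as confirmation before applying a TimeLimit that is not longer than the elapsed time, since the job would be killed at once, or one that shortens the current limit.

```shell
#!/usr/bin/env bash
# safe_timelimit.sh (hypothetical name): ask for confirmation before
# lowering a job's time limit. Usage: safe_timelimit.sh <jobid> <new_limit>
set -euo pipefail

# Convert a Slurm time string to seconds, following the scontrol TimeLimit
# input forms: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S. Returns 1 for anything
# else (for example UNLIMITED).
to_seconds() {
  local t=$1 days=0 h=0 m=0 s=0
  local -a parts
  local re='^[0-9]+(-[0-9]+)?(:[0-9]+){0,2}$'
  [[ $t =~ $re ]] || return 1
  if [[ $t == *-* ]]; then
    days=${t%%-*}
    IFS=: read -r h m s <<< "${t#*-}"   # D-H[:M[:S]]
  else
    IFS=: read -ra parts <<< "$t"
    case ${#parts[@]} in
      1) m=${parts[0]} ;;                                # minutes
      2) m=${parts[0]}; s=${parts[1]} ;;                 # minutes:seconds
      3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;  # hours:minutes:seconds
    esac
  fi
  # 10# forces base 10 so fields such as "08" are not parsed as octal.
  echo $(( 10#$days * 86400 + 10#${h:-0} * 3600 + 10#${m:-0} * 60 + 10#${s:-0} ))
}

main() {
  local jobid=$1 new_limit=$2 elapsed current new_s used_s cur_s answer warn=""
  # squeue %M = time used so far, %l = current time limit; strip any padding.
  elapsed=$(squeue -h -j "$jobid" -o %M | tr -d '[:space:]')
  current=$(squeue -h -j "$jobid" -o %l | tr -d '[:space:]')
  if [[ -z $elapsed ]]; then
    echo "job $jobid not found" >&2; exit 1
  fi

  if [[ $new_limit != UNLIMITED ]]; then
    new_s=$(to_seconds "$new_limit") || { echo "cannot parse '$new_limit'" >&2; exit 1; }
    used_s=$(to_seconds "$elapsed")
    if (( new_s <= used_s )); then
      warn="new limit $new_limit is not longer than elapsed time $elapsed; the job will be killed right away"
    # An UNLIMITED or unparseable current limit skips this second check.
    elif cur_s=$(to_seconds "$current") && (( new_s < cur_s )); then
      warn="new limit $new_limit is shorter than the current limit $current"
    fi
  fi

  if [[ -n $warn ]]; then
    echo "WARNING: $warn" >&2
    read -r -p "Type the job ID ($jobid) to confirm: " answer
    [[ $answer == "$jobid" ]] || { echo "aborted" >&2; exit 1; }
  fi
  scontrol update JobId="$jobid" TimeLimit="$new_limit"
}

# Run main only when executed with arguments, so the functions can be reused.
if [[ ${BASH_SOURCE[0]} == "$0" && $# -ge 2 ]]; then
  main "$@"
fi
```

For example, running ./safe_timelimit.sh <jobid> 16:00:00 against a job submitted with 7-00:00:00 would stop and ask for confirmation instead of calling scontrol, because 16 hours is shorter than the current limit. Slurm keeps time limits at one-minute resolution (seconds are rounded up), so the seconds-level comparison is approximate, and the job keeps running between the check and the update.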





On Thu, 6 Jul 2023, 18:16 Jason Simms, <jsimms1 at swarthmore.edu> wrote:

> An unfortunate example of the “with great power comes great
> responsibility” maxim. Linux will gleefully let you rm -fr your entire
> system, drop production databases, etc., provided you have the right
> privileges. Ask me how I know…
>
> Still, I get the point. Would it be possible to somehow ask for
> confirmation if you are setting a max time that is less than the current
> walltime? Perhaps. Could you script that yourself? Yes, I’m certain of it.
> Those kinds of built-in safeguards aren’t super common, however.
>
> Jason
>
> On Thu, Jul 6, 2023 at 12:55 PM Amjad Syed <amjadcsu at gmail.com> wrote:
>
>> Yes, the initial time limit was 7-00:00:00, but Slurm accepted the typo
>> (16:00:00), which caused the jobs to be killed without warning.
>>
>> Amjad
>>
>> On Thu, Jul 6, 2023 at 5:27 PM Bernstein, Noam CIV USN NRL (6393)
>> Washington DC (USA) <noam.bernstein at nrl.navy.mil> wrote:
>>
>>> Is the issue that the error in the time made it shorter than the time
>>> the job had already run, so it killed it immediately?
>>>
>>> On Jul 6, 2023, at 12:04 PM, Jason Simms <jsimms1 at swarthmore.edu> wrote:
>>>
>>> No, not a bug, I would say. When the time limit is reached, that's it: the
>>> job dies. I'm not aware of a way to manage that. Once the time limit is
>>> reached, it wouldn't be a hard limit if you then had to notify the user
>>> and then... what? How long would you give them to extend the time? It
>>> wouldn't be much of a limit if a job could be extended, plus that would
>>> throw off the scheduler/estimator. I'd chalk it up to an unfortunate typo.
>>>
>>> Jason
>>>
>>> On Thu, Jul 6, 2023 at 11:54 AM Amjad Syed <amjadcsu at gmail.com> wrote:
>>>
>>>> Hello
>>>>
>>>> We were trying to increase the time limit of a running Slurm job:
>>>>
>>>> scontrol update job=<jobid> TimeLimit=16-00:00:00
>>>>
>>>> But we accidentally set it to 16 hours:
>>>>
>>>> scontrol update job=<jobid> TimeLimit=16:00:00
>>>>
>>>> This actually timed out and killed the running job without giving any
>>>> notification.
>>>>
>>>> Is this a bug? Shouldn't the user be warned that the job will be
>>>> killed?
>>>>
>>>> Amjad
>>>>
>>>>
>>>
>>> --
>>> *Jason L. Simms, Ph.D., M.P.H.*
>>> Manager of Research Computing
>>> Swarthmore College
>>> Information Technology Services
>>> (610) 328-8102
>>> Schedule a meeting: https://calendly.com/jlsimms
>>>
>>>
>>> U.S. NAVAL RESEARCH LABORATORY
>>> Noam Bernstein, Ph.D.
>>> Center for Materials Physics and Technology
>>> U.S. Naval Research Laboratory
>>> T +1 202 404 8628 F +1 202 404 7546
>>> https://www.nrl.navy.mil
>>>
>>>
>>> --
> *Jason L. Simms, Ph.D., M.P.H.*
> Manager of Research Computing
> Swarthmore College
> Information Technology Services
> (610) 328-8102
> Schedule a meeting: https://calendly.com/jlsimms
>