[slurm-users] Decreasing time limit of running jobs (notification)
jsimms1 at swarthmore.edu
Thu Jul 6 17:14:01 UTC 2023
An unfortunate example of the “with great power comes great responsibility”
maxim. Linux will gleefully let you rm -fr your entire system, drop
production databases, etc., provided you have the right privileges. Ask me
how I know…
Still, I get the point. Would it be possible to somehow ask for
confirmation if you are setting a max time that is less than the current
walltime? Perhaps. Could you script that yourself? Yes, I’m certain of it.
Those kinds of built-in safeguards aren’t super common, however.
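
A minimal sketch of such a self-made safeguard, in Python. It only compares two Slurm time strings and does not talk to Slurm itself. The function names are hypothetical, and it assumes the job's elapsed time has already been fetched separately (for example with `squeue -h -j <jobid> -o %M`) before `scontrol update` is run. The accepted formats are the ones listed in the Slurm man pages: minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, and days-hours:minutes:seconds.

```python
def slurm_time_to_seconds(text: str) -> int:
    """Convert a Slurm time string such as '16-00:00:00' to seconds.

    Raises ValueError for anything that is not purely numeric,
    e.g. 'UNLIMITED' or an empty string.
    """
    rest = text.strip()
    days = 0
    if "-" in rest:
        # A day prefix means the remaining fields are hours[:minutes[:seconds]].
        day_part, rest = rest.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"too many fields in {text!r}")
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(f) for f in rest.split(":")]
        if len(fields) == 1:        # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:      # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:      # hours:minutes:seconds
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"too many fields in {text!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def limit_would_kill_job(new_limit: str, elapsed: str) -> bool:
    """Return True if new_limit is not longer than the time already used."""
    return slurm_time_to_seconds(new_limit) <= slurm_time_to_seconds(elapsed)


if __name__ == "__main__":
    # Hypothetical elapsed time of 2 days 3 hours; the two limits are the
    # intended value and the typo from this thread.
    elapsed = "2-03:00:00"
    for limit in ("16-00:00:00", "16:00:00"):
        if limit_would_kill_job(limit, elapsed):
            print(f"TimeLimit={limit} is at or below elapsed {elapsed}: ask for confirmation")
        else:
            print(f"TimeLimit={limit} is safe")
```

A wrapper around `scontrol update` could call `limit_would_kill_job` and prompt before going ahead. Note that 16-00:00:00 parses to 16 days while 16:00:00 parses to 16 hours, which is exactly the difference that caused the problem here.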
On Thu, Jul 6, 2023 at 12:55 PM Amjad Syed <amjadcsu at gmail.com> wrote:
> Yes, the initial End Time was 7-00:00:00, but it allowed the typo
> (16:00:00), which caused the jobs to be killed without warning.
> On Thu, Jul 6, 2023 at 5:27 PM Bernstein, Noam CIV USN NRL (6393)
> Washington DC (USA) <noam.bernstein at nrl.navy.mil> wrote:
>> Is the issue that the error in the time made it shorter than the time the
>> job had already run, so it killed it immediately?
>> On Jul 6, 2023, at 12:04 PM, Jason Simms <jsimms1 at swarthmore.edu> wrote:
>> No, not a bug, I would say. When the time limit is reached, that's it,
>> job dies. I wouldn't be aware of a way to manage that. Once the time limit
>> is reached, it wouldn't be a hard limit if you then had to notify the user
>> and then... what? How long would you give them to extend the time? Wouldn't
>> be much of a limit if a job can be extended, plus that would throw off the
>> scheduler/estimator. I'd chalk it up to an unfortunate typo.
>> On Thu, Jul 6, 2023 at 11:54 AM Amjad Syed <amjadcsu at gmail.com> wrote:
>>> We were trying to increase the time limit of a running Slurm job:
>>> scontrol update job=<jobid> TimeLimit=16-00:00:00
>>> But we accidentally set it to 16 hours:
>>> scontrol update job=<jobid> TimeLimit=16:00:00
>>> This actually timed out and killed the running job and did not give any
>>> warning.
>>> Is this a bug? Should the user not be warned that this job will be
>>> killed?
>> *Jason L. Simms, Ph.D., M.P.H.*
>> Manager of Research Computing
>> Swarthmore College
>> Information Technology Services
>> (610) 328-8102
>> Schedule a meeting: https://calendly.com/jlsimms
>> Noam Bernstein, Ph.D.
>> Center for Materials Physics and Technology
>> U.S. Naval Research Laboratory
>> T +1 202 404 8628 F +1 202 404 7546