[slurm-users] Decreasing time limit of running jobs (notification)
Davide DelVento
davide.quantum at gmail.com
Mon Jul 10 16:06:27 UTC 2023
Actually, rm -r does not give ANY warning, so in plain Linux "rm -r /" run
as root would destroy your system without notice. Your particular Linux
distro may have implemented safeguards with a shell alias such as
`alias rm='rm -i'`, and that's a common thing, but it is not guaranteed to
be there.
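
For the "could you script that yourself?" idea further down the thread, here
is a minimal Python sketch of such a confirmation step. It is not anything
Slurm ships. It assumes squeue and scontrol are on the PATH, that
`squeue -h -j <jobid> -o %M` prints the time the job has used so far, and
that only the numeric time formats from the man pages are in play (keywords
such as UNLIMITED are not handled). The function names are made up for
illustration.

```python
import subprocess


def slurm_time_to_seconds(text):
    """Convert a Slurm time string to seconds.

    Handles the numeric forms: minutes, minutes:seconds,
    hours:minutes:seconds, days-hours, days-hours:minutes and
    days-hours:minutes:seconds.
    """
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in text.split(":")]
        # After a day count the fields are hours[:minutes[:seconds]].
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in text.split(":")]
        if len(fields) == 1:
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:
            hours, minutes, seconds = 0, fields[0], fields[1]
        else:
            hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def is_safe_limit(new_limit, elapsed):
    """Return True if the new limit is longer than the time already used."""
    return slurm_time_to_seconds(new_limit) > slurm_time_to_seconds(elapsed)


def elapsed_time(job_id):
    """Ask squeue how long the job has been running (assumes %M output)."""
    result = subprocess.run(
        ["squeue", "-h", "-j", str(job_id), "-o", "%M"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def update_time_limit(job_id, new_limit):
    """Change the time limit, asking first if the job would be ended."""
    elapsed = elapsed_time(job_id)
    if not is_safe_limit(new_limit, elapsed):
        prompt = (
            f"Job {job_id} has already run {elapsed}; "
            f"a limit of {new_limit} is not longer than that. Continue? [y/N] "
        )
        if input(prompt).strip().lower() != "y":
            print("Time limit not changed.")
            return False
    subprocess.run(
        ["scontrol", "update", f"job={job_id}", f"TimeLimit={new_limit}"],
        check=True,
    )
    return True
```

With this, slurm_time_to_seconds("16-00:00:00") is 16 days while
slurm_time_to_seconds("16:00:00") is 16 hours, so for a job that has already
run for a couple of days update_time_limit(jobid, "16:00:00") would stop and
ask instead of passing the typo straight to scontrol.
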
On Thu, Jul 6, 2023 at 11:40 AM Amjad Syed <amjadcsu at gmail.com> wrote:
> Agreed on the point about greater responsibility, but even rm -r (without
> -f) gives a warning. In this case, should Slurm have such a (forced)
> option, especially if it can immediately kill a running job?
>
> On Thu, 6 Jul 2023, 18:16 Jason Simms, <jsimms1 at swarthmore.edu> wrote:
>
>> An unfortunate example of the “with great power comes great
>> responsibility” maxim. Linux will gleefully let you rm -fr your entire
>> system, drop production databases, etc., provided you have the right
>> privileges. Ask me how I know…
>>
>> Still, I get the point. Would it be possible to somehow ask for
>> confirmation if you are setting a max time that is less than the current
>> walltime? Perhaps. Could you script that yourself? Yes, I’m certain of it.
>> Those kinds of built-in safeguards aren’t super common, however.
>>
>> Jason
>>
>> On Thu, Jul 6, 2023 at 12:55 PM Amjad Syed <amjadcsu at gmail.com> wrote:
>>
>>> Yes, the initial End Time was 7-00:00:00 but it allowed the typo
>>> (16:00:00) which caused the jobs to be killed without warning
>>>
>>> Amjad
>>>
>>> On Thu, Jul 6, 2023 at 5:27 PM Bernstein, Noam CIV USN NRL (6393)
>>> Washington DC (USA) <noam.bernstein at nrl.navy.mil> wrote:
>>>
>>>> Is the issue that the error in the time made it shorter than the time
>>>> the job had already run, so it killed it immediately?
>>>>
>>>> On Jul 6, 2023, at 12:04 PM, Jason Simms <jsimms1 at swarthmore.edu>
>>>> wrote:
>>>>
>>>> No, not a bug, I would say. When the time limit is reached, that's it,
>>>> job dies. I wouldn't be aware of a way to manage that. Once the time limit
>>>> is reached, it wouldn't be a hard limit if you then had to notify the user
>>>> and then... what? How long would you give them to extend the time? Wouldn't
>>>> be much of a limit if a job can be extended, plus that would throw off the
>>>> scheduler/estimator. I'd chalk it up to an unfortunate typo.
>>>>
>>>> Jason
>>>>
>>>> On Thu, Jul 6, 2023 at 11:54 AM Amjad Syed <amjadcsu at gmail.com> wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> We were trying to increase the time limit of a running Slurm job
>>>>>
>>>>> scontrol update job=<jobid> TimeLimit=16-00:00:00
>>>>>
>>>>> But we accidentally set it to 16 hours
>>>>>
>>>>> scontrol update job=<jobid> TimeLimit=16:00:00
>>>>>
>>>>> This actually timed out and killed the running job without giving any
>>>>> notification
>>>>>
>>>>> Is this a bug? Shouldn't the user be warned that this job will be
>>>>> killed?
>>>>>
>>>>> Amjad
>>>>>
>>>>>
>>>>
>>>> --
>>>> *Jason L. Simms, Ph.D., M.P.H.*
>>>> Manager of Research Computing
>>>> Swarthmore College
>>>> Information Technology Services
>>>> (610) 328-8102
>>>> Schedule a meeting: https://calendly.com/jlsimms
>>>>
>>>> Noam Bernstein, Ph.D.
>>>> Center for Materials Physics and Technology
>>>> U.S. Naval Research Laboratory
>>>> T +1 202 404 8628 F +1 202 404 7546
>>>> https://www.nrl.navy.mil
>