[slurm-users] Stopping new jobs but letting old ones end

Sid Young sid.young at gmail.com
Tue Feb 1 06:14:26 UTC 2022


Brian / Christopher, that looks like a good process. Thanks, I will do
some testing and let you know.
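
For testing, the process quoted below could be sketched roughly as the following shell functions. This is only a sketch under assumptions: the function names, the DRY_RUN switch, and the partition names passed in are placeholders not taken from the thread, and the spelled-out PartitionName=/State= keywords are the form used in the scontrol man page rather than the exact command quoted below. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (the default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Usage: drain_login_node HH:MM partition [partition ...]
drain_login_node() {
    when="$1"
    shift
    # Mark each named partition down.
    for part in "$@"; do
        run scontrol update PartitionName="$part" State=DOWN
    done
    # Block new logins, then warn the users who are still logged in.
    run touch /etc/nologin
    run wall "system going down at $when, please end your sessions"
}

# Usage: restore_login_node partition [partition ...]
restore_login_node() {
    for part in "$@"; do
        run scontrol update PartitionName="$part" State=UP
    done
    # Allow logins again.
    run rm -f /etc/nologin
}
```

Sourcing the file and calling `drain_login_node 17:00 xxxx` prints the three commands for a partition named xxxx; setting DRY_RUN=0 before sourcing (as root, on the login node) would run them for real.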

If I mark a partition down while it has running jobs, what happens to those
jobs? Do they keep running?
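
One way to check this during testing is to count the running jobs in the partition before and after the update. As far as the scontrol man page describes State=DOWN, queued jobs are not started while jobs already running continue, so the count should stay nonzero until those jobs finish. The sketch below assumes a helper name (running_jobs) and a SQUEUE override variable that are purely illustrative; only the squeue options themselves are real.

```shell
#!/bin/sh
# Print the number of RUNNING jobs in the partition given as $1.
running_jobs() {
    # SQUEUE is a hypothetical override (handy for testing); defaults to squeue.
    # --noheader drops the column header, --format=%i prints one job ID per line.
    "${SQUEUE:-squeue}" --noheader --states=RUNNING --partition="$1" --format=%i \
        | wc -l | tr -d ' '
}
```

Calling `running_jobs xxxx` before and after marking the partition down, and again later, shows when the last running job has ended and the login node can be rebooted.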


Sid Young
W: https://off-grid-engineering.com
W: (personal) https://sidyoung.com/
W: (personal) https://z900collector.wordpress.com/


On Tue, Feb 1, 2022 at 3:27 PM Brian Andrus <toomuchit at gmail.com> wrote:

> One possibility:
>
> Sounds like your concern is folks with interactive jobs from the login
> node that are running under screen/tmux.
>
> That being the case, you need running jobs to end and not allow new users
> to start tmux sessions.
>
> Definitely run 'scontrol update state=down partition=xxxx' for each
> partition. Also:
>
> touch /etc/nologin
>
> That will prevent new logins.
>
> Send a message to all active folks:
>
> wall "system going down at XX:XX, please end your sessions"
>
> Then wait for folks to drain off your login node and do your stuff.
>
> When done, remove the /etc/nologin file and folks will be able to log in
> again.
>
> Brian Andrus
> On 1/31/2022 9:18 PM, Sid Young wrote:
>
> On Tue, Feb 1, 2022 at 3:02 PM Christopher Samuel <chris at csamuel.org>
> wrote:
>
>> On 1/31/22 4:41 pm, Sid Young wrote:
>>
>> > I need to replace a faulty DIMM chip in our login node, so I need to
>> > stop new jobs being kicked off while letting the old ones end.
>> >
>> > I thought I would just set all nodes to drain to stop new jobs from
>> > being kicked off...
>>
>> That would basically be the way, but is there any reason why compute
>> jobs shouldn't start whilst the login node is down?
>>
>
> My concern was to keep the running jobs going and stop new jobs, so that
> when the last running job ends I could reboot the login node knowing that
> any "screen"/"tmux" terminal sessions would effectively have ended
> because their job(s) had ended.
>
> I'm not sure if there is an accepted procedure or best-practice way to
> tackle shutting down the login node for this use case.
>
> On the bright side I am down to two jobs left so any day now :)
>
> Sid
>
>
>
>
>> All the best,
>> Chris
>> --
>>    Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>>
>>