[slurm-users] Stopping new jobs but letting old ones end
Brian Andrus
toomuchit at gmail.com
Tue Feb 1 05:25:56 UTC 2022
One possibility:
Sounds like your concern is folks with interactive jobs from the login
node that are running under screen/tmux.
That being the case, you need the running jobs to end while preventing
users from starting new tmux sessions.
Definitely do 'scontrol update partitionname=xxxx state=down' for each
partition so no new jobs get scheduled while the running ones finish (a
loop sketch is below). Also:
touch /etc/nologin
That will prevent new logins.
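For instance, something along these lines should set every partition that
sinfo reports to down (just a sketch; trim the list if you only want to
touch certain partitions):

    for p in $(sinfo -h -o '%R'); do
        scontrol update partitionname="$p" state=down
    done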
Send a message to all active folks:
wall "system going down at XX:XX, please end your sessions"
Then wait for folks to drain off your login node and do your stuff.
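If you want something to watch while you wait, a quick check (run on the
login node) of what is still around, just a sketch:

    squeue -t RUNNING -h | wc -l   # running jobs left across the cluster
    who                            # sessions still on the login node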
When done, remove the /etc/nologin file and bring the partitions back up,
and folks will be able to log in again.
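Roughly the reverse of the steps above, e.g. (same placeholder loop as
before, adjust to taste):

    rm /etc/nologin
    for p in $(sinfo -h -o '%R'); do
        scontrol update partitionname="$p" state=up
    done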
Brian Andrus
On 1/31/2022 9:18 PM, Sid Young wrote:
>
> Sid Young
> W: https://off-grid-engineering.com
> W: (personal) https://sidyoung.com/
> W: (personal) https://z900collector.wordpress.com/
>
>
> On Tue, Feb 1, 2022 at 3:02 PM Christopher Samuel <chris at csamuel.org>
> wrote:
>
> On 1/31/22 4:41 pm, Sid Young wrote:
>
> > I need to replace a faulty DIMM chip in our login node, so I need to stop
> > new jobs being kicked off while letting the old ones end.
> >
> > I thought I would just set all nodes to drain to stop new jobs from
> > being kicked off...
>
> That would basically be the way, but is there any reason why compute
> jobs shouldn't start whilst the login node is down?
>
>
> My concern was to keep the running jobs going and stop new jobs, so that
> when the last running job ends I could reboot the login node, knowing
> that any "screen"/"tmux" terminal sessions would effectively have ended
> along with the job(s).
>
> I'm not sure if there is an accepted procedure or best-practice way to
> tackle shutting down the login node for this use case.
>
> On the bright side I am down to two jobs left so any day now :)
>
> Sid
>
> All the best,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
>