[slurm-users] Stopping new jobs but letting old ones end

Brian Andrus toomuchit at gmail.com
Tue Feb 1 05:25:56 UTC 2022


One possibility:

Sounds like your concern is folks with interactive jobs from the login 
node that are running under screen/tmux.

That being the case, you need to let the running jobs end while 
preventing users from starting new tmux sessions.

Definitely do 'scontrol update PartitionName=xxxx State=DOWN' for each 
partition, so nothing new gets scheduled while running jobs finish.
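
A minimal sketch of that loop, assuming bash and that 'sinfo -h -o "%R"' 
prints one partition name per line on your install:

for p in $(sinfo -h -o "%R"); do
    # DOWN stops new jobs from being scheduled in the partition;
    # already-running jobs keep going until they finish.
    scontrol update PartitionName="$p" State=DOWN
done

Also: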

touch /etc/nologin

That will prevent new logins.
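
If you want users to see why, you can put the message in the file 
itself; on most Linux systems pam_nologin shows the file's contents to 
anyone who tries to log in (the time here is just a placeholder):

echo "Login node down for maintenance until XX:XX. Running jobs are not affected." > /etc/nologin

Note that root can typically still log in while /etc/nologin exists.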

Send a message to all active folks:

wall "system going down at XX:XX, please end your sessions"

Then wait for folks to drain off your login node and do your maintenance.
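
Since in your case the screen/tmux sessions end with the jobs, a rough 
way to poll for the last running jobs (a sketch assuming squeue's -h 
and -t flags, which suppress the header and filter by state):

# Check once a minute until no jobs remain in the RUNNING state.
while [ "$(squeue -h -t RUNNING | wc -l)" -gt 0 ]; do
    sleep 60
done
echo "All jobs finished; safe to start maintenance."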

When done, remove the /etc/nologin file and folks will be able to log 
in again.
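
And bring the partitions back up, mirroring the earlier loop:

rm -f /etc/nologin
for p in $(sinfo -h -o "%R"); do
    scontrol update PartitionName="$p" State=UP
done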

Brian Andrus

On 1/31/2022 9:18 PM, Sid Young wrote:
>
> Sid Young
> W: https://off-grid-engineering.com
> W: (personal) https://sidyoung.com/
> W: (personal) https://z900collector.wordpress.com/
>
> On Tue, Feb 1, 2022 at 3:02 PM Christopher Samuel <chris at csamuel.org> 
> wrote:
>
>     On 1/31/22 4:41 pm, Sid Young wrote:
>
>     > I need to replace a faulty DIMM chip in our login node, so I need
>     > to stop new jobs being kicked off while letting the old ones end.
>     >
>     > I thought I would just set all nodes to drain to stop new jobs from
>     > being kicked off...
>
>     That would basically be the way, but is there any reason why compute
>     jobs shouldn't start whilst the login node is down?
>
>
> My concern was to keep the running jobs going and stop new ones, so
> that when the last running job ended I could reboot the login node,
> knowing that any "screen"/"tmux" sessions in terminal windows would
> effectively have ended along with the jobs.
>
> I'm not sure whether there is an accepted procedure or best-practice
> way to tackle shutting down the login node for this use case.
>
> On the bright side I am down to two jobs left so any day now :)
>
> Sid
>
>     All the best,
>     Chris
>     -- 
>        Chris Samuel  : http://www.csamuel.org/ :  Berkeley, CA, USA
>