[slurm-users] restart user login ONLY

Brian Andrus toomuchit at gmail.com
Mon Jul 19 17:24:34 UTC 2021


Not really a slurm question, but here's my 2 cents:

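FWIW, if they are truly unkillable (stuck in uninterruptible sleep, 
state D in ps, so kill -9 has no effect) you can usually only get rid 
of them with a reboot, unless whatever they are blocked on comes back. 
True zombies (state Z) are already dead and hold almost no resources.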
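
To check which state a user's processes are actually in before deciding 
on a reboot, something like the sketch below may help. It is only an 
illustration: the function names are my own, and it assumes a procps-style 
ps that accepts "-eo pid,ppid,stat,user,comm" and usernames short enough 
that ps does not truncate them.

```python
import subprocess


def find_stuck(ps_output, user=None):
    """Return (pid, ppid, stat, user, comm) for processes in state D or Z.

    ps_output is the text of `ps -eo pid,ppid,stat,user,comm`,
    including its header row. If user is given, only that user's
    processes are returned.
    """
    stuck = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 4)  # comm may contain spaces ("bash <defunct>")
        if len(fields) < 5:
            continue
        pid, ppid, stat, owner, comm = fields
        # First character of STAT is the state: D = uninterruptible sleep,
        # Z = zombie (defunct).
        if stat[0] in ("D", "Z") and (user is None or owner == user):
            stuck.append((int(pid), int(ppid), stat, owner, comm.strip()))
    return stuck


def list_stuck(user=None):
    """Run ps on the local machine and filter its output (hypothetical helper)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,user,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_stuck(out, user)
```

Calling list_stuck("someuser") on the login node would list that user's 
D and Z processes. Entries in state D are the ones that typically need 
the blocking I/O to recover or a reboot; entries in state Z normally go 
away once their parent reaps them or exits.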

If they aren't eating much in the way of resources, you may want to 
just ignore them until your next maintenance window and then reboot.

This is one of the reasons I do not architect login nodes to allow 
access to applications or much of anything. Minimal everything.

If your login node gets quite a bit of traffic, you should look at 
setting up a load-balanced HA configuration for it. Users should not 
have much of anything going on with a login node. Just submit your job 
and do your work on the compute node, even if it is an interactive job. 
That keeps your dev/test environment the same as the runtime environment.

Brian Andrus

On 7/19/2021 7:09 AM, Durai Arasan wrote:
> Hello,
>
> One of our slurm user's account is hung with uninterruptible 
> processes. These processes cannot be killed. Hence a restart is 
> required. Is it possible to restart the user's login environment 
> alone? I would like to not restart the entire login node.
>
> Thanks!
> Durai
> Max Planck Institute Tübingen


