[slurm-users] restart user login ONLY
Brian Andrus
toomuchit at gmail.com
Mon Jul 19 17:24:34 UTC 2021
Not really a slurm question, but here's my 2 cents:
FWIW, if they are true zombies (PPID 1 and kill -9 will not work) you
can only get rid of them with a reboot.
If they aren't eating much in the way of resources, you will probably
want to just ignore them until your next maintenance window and then reboot.
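
Before deciding, it can help to see which state the stuck processes are actually in. Uninterruptible sleep shows up as state D in ps (usually waiting on I/O such as a hung NFS mount), while a zombie shows up as Z. Here is a rough sketch, assuming a Linux login node with procps ps; the function names are made up for illustration and this is not anything Slurm provides:

```python
import subprocess


def parse_stuck(ps_output):
    """Return (pid, ppid, state, command) for lines whose state starts with D or Z."""
    stuck = []
    for line in ps_output.strip().splitlines():
        # Expected columns: pid, ppid, stat, comm (comm may contain spaces).
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        if stat[0] in ("D", "Z"):
            stuck.append((int(pid), int(ppid), stat[0], comm.strip()))
    return stuck


def stuck_processes(user):
    """List one user's processes in D or Z state (assumes procps ps on Linux)."""
    # Separate -o options with empty headers avoid ambiguous header parsing.
    result = subprocess.run(
        ["ps", "-u", user, "-o", "pid=", "-o", "ppid=", "-o", "stat=", "-o", "comm="],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_stuck(result.stdout)
```

Calling stuck_processes("someuser") (a placeholder username) returns a list you can eyeball. If everything is in D state on the same filesystem, the underlying mount is the likely culprit rather than the user's session.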
This is one of the reasons I do not architect login nodes to allow
access to applications or much of anything. Minimal everything.
If your login node gets quite a bit of traffic, you should look at
setting up a load-balanced HA configuration for them. Users should not
have much of anything going on with a login node. Just submit your job
and do your work on the compute node, even if it is an interactive job.
That keeps your dev/test environment the same as the runtime environment.
Brian Andrus
On 7/19/2021 7:09 AM, Durai Arasan wrote:
> Hello,
>
> One of our Slurm users' accounts is hung with uninterruptible
> processes. These processes cannot be killed. Hence a restart is
> required. Is it possible to restart the user's login environment
> alone? I would like to not restart the entire login node.
>
> Thanks!
> Durai
> Max Planck Institute Tübingen