[slurm-users] Dealing with wrong things that users do
Chris Samuel
chris at csamuel.org
Thu Sep 20 04:48:53 MDT 2018
On Thursday, 20 September 2018 5:57:56 PM AEST Mahmood Naderan wrote:
> It seems that when their fluent job crashes for some reason, or they
> decide to close the fluent window without terminating the job, or
> close the terminal suddenly, or ... the fluent processes remain on
> the node while the job is not listed in the output of the squeue command.
If you use cgroups to contain jobs, along with pam_slurm_adopt to put any SSH
sessions into the job's "extern" cgroup, then Slurm should be able to track and
clean up pretty much anything your users can throw at it.
https://slurm.schedmd.com/cgroups.html
https://slurm.schedmd.com/pam_slurm_adopt.html
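As a rough sketch of what that setup tends to involve (the file locations and
the choice of which resources to constrain are assumptions here, so check the
two pages above against your Slurm version before copying anything):

```
# slurm.conf (fragment): track and contain job processes with cgroups
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
# Creates the "extern" step at job start, which pam_slurm_adopt relies on
PrologFlags=contain

# cgroup.conf (fragment): which resources to constrain is a site decision
ConstrainCores=yes
ConstrainRAMSpace=yes

# /etc/pam.d/sshd (fragment, path assumed): adopt incoming SSH sessions
# into the user's running job on that node
account    required    pam_slurm_adopt.so
```

With something like this in place, processes started via SSH end up inside the
job's cgroup, so they should be cleaned up along with everything else when the
job ends.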
Best of luck!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC