[slurm-users] Dealing with wrong things that users do

Chris Samuel chris at csamuel.org
Thu Sep 20 04:48:53 MDT 2018


On Thursday, 20 September 2018 5:57:56 PM AEST Mahmood Naderan wrote:

> It seems that when their fluent job crashes for some reason, or they
> decide to close the fluent window without terminating the job, or
> close the terminal suddenly, or ... the fluent processes remain on
> the node while the job is not listed in the output of the squeue command.

If you use cgroups to contain jobs, along with pam_slurm_adopt to put any SSH
sessions into the job's "extern" cgroup, then Slurm should be able to track and
clean up pretty much anything your users can throw at it.

https://slurm.schedmd.com/cgroups.html

https://slurm.schedmd.com/pam_slurm_adopt.html
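
As a rough sketch of what that setup tends to look like (these settings are
my reading of the two pages above, not something tested on your cluster, and
the PAM file path is an assumption that varies by distro, so check everything
against the docs for your Slurm version):

```
# slurm.conf: track and contain job processes with cgroups
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
# Needed by pam_slurm_adopt: creates the "extern" step at job start
PrologFlags=contain

# cgroup.conf: which resources to constrain (example choices only)
ConstrainCores=yes
ConstrainRAMSpace=yes

# /etc/pam.d/sshd (path assumed): adopt incoming SSH sessions into the
# user's job on that node, and deny access if they have no job there
account    required    pam_slurm_adopt.so
```

With something like this in place, processes started over SSH end up inside
the job's cgroup, so they should be killed when the job ends rather than
lingering on the node.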

Best of luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC





