[slurm-users] Issue with x11

Christopher Samuel chris at csamuel.org
Thu May 16 16:01:08 UTC 2019


On 5/16/19 1:04 AM, Alan Orth wrote:

> but now we get a handful of nodes drained every day with reason "Kill 
> task failed". In ten years of using SLURM I've never had so many 
> problems as I'm having now. :\

We see "Kill task failed" drains too, but as Marcus says, they're not related
to X11 support. When we see them, it's usually because the kernel cannot
evict dirty pages from cgroups quickly enough (or at all) for Slurm's
liking. You may want to raise UnkillableStepTimeout from its default of
60 seconds.
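A minimal slurm.conf sketch of that change follows. It assumes 180 seconds purely as an illustrative value, not one suggested in this thread. Pick a timeout that matches how long your nodes actually take to flush dirty pages.

```
# slurm.conf: how long Slurm waits for a job step's processes to die
# before treating them as unkillable and draining the node.
# 180 is an illustrative value, not a recommendation.
UnkillableStepTimeout=180
```

The daemons then need to pick up the change. For example, you can run `scontrol reconfigure`, or restart them if your Slurm version requires it. To list nodes already drained for this reason, use `sinfo -R`. To return them to service, use `scontrol update NodeName=<node> State=RESUME`.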

All the best,
Chris
-- 
   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


