[slurm-users] Issue with x11
Christopher Samuel
chris at csamuel.org
Thu May 16 16:01:08 UTC 2019
On 5/16/19 1:04 AM, Alan Orth wrote:
> but now we get a handful of nodes drained every day with reason "Kill
> task failed". In ten years of using SLURM I've never had so many
> problems as I'm having now. :\
We see "Kill task failed" issues too but, as Marcus says, they're not
related to X11 support. When we see them it's usually because the kernel
cannot evict dirty pages from cgroups quickly enough (or at all) for
Slurm's liking. You may want to increase UnkillableStepTimeout from its
default of 60 seconds.
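As a rough sketch only, the change would be a single line in slurm.conf.
The value of 120 below is purely illustrative, not a figure from this
thread; pick something that suits how long your nodes take to flush.

```
# slurm.conf (excerpt)
# Seconds Slurm waits after sending SIGKILL before it decides that a
# step's processes are unkillable. The default is 60.
# 120 is an assumed example value, not a recommendation.
UnkillableStepTimeout=120
```

You can check the value the running daemons are using with
"scontrol show config | grep -i UnkillableStepTimeout".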
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA