[slurm-users] How to trigger kernel stacktraces for stuck processes from unkillable steps
chris at csamuel.org
Wed Sep 18 19:52:34 UTC 2019
At the Slurm User Group I mentioned how to tell the kernel, from your
unkillable step script, to dump information about stuck processes to
the kernel log buffer (seen via dmesg and hopefully syslog'd somewhere
useful for you).
echo w > /proc/sysrq-trigger
That's it.. ;-) You probably want to echo something useful to /dev/kmsg
beforehand to say what the job ID was that triggered it too.
The 'echo' will block until the kernel completes the writes, which, if
you've got a lot of stuck processes, may take a few seconds.
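
A minimal sketch of how the two steps might be combined in an unkillable
step script. The function name dump_blocked_tasks and the KMSG/SYSRQ
override variables are made up for illustration (the overrides just make
it possible to try the function against ordinary files), and the job ID
is passed in as an argument because how your script learns it depends on
your Slurm setup:

```shell
#!/bin/sh
# Hypothetical helper for an unkillable step script (a sketch, not a
# tested production script).
# KMSG and SYSRQ are illustrative overrides; the defaults are the real
# kernel interfaces, which normally need root to write to.
KMSG="${KMSG:-/dev/kmsg}"
SYSRQ="${SYSRQ:-/proc/sysrq-trigger}"

dump_blocked_tasks() {
    # Job ID supplied by the caller; falls back to "unknown".
    jobid="${1:-unknown}"

    # Tag the kernel log first so the dump can be tied to a job.
    echo "unkillable step: job ${jobid}: dumping blocked tasks" > "$KMSG"

    # SysRq 'w' asks the kernel to dump tasks in uninterruptible
    # (blocked) state; this write blocks until the dump completes.
    echo w > "$SYSRQ"
}
```

Calling something like dump_blocked_tasks "$SLURM_JOB_ID" at the end of
the script would do the job, assuming your Slurm version exports
SLURM_JOB_ID to the unkillable step program (check your slurm.conf man
page; if it doesn't, the marker will just say "unknown").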
Hope this is useful!
All the best,
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA