[slurm-users] Compute node process monitoring tools updated
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Mon Jan 18 14:07:05 UTC 2021
FYI: My Slurm tools for displaying batch job user process information have
been updated. Besides the user process list from "ps", a summary of the
number of processes and threads is now printed as well. We use this summary
to sanity-check running user jobs. For example, we often see jobs that run
too many threads per process and thereby overload the CPUs.
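As a rough illustration of the kind of process/thread summary described
above, here is a minimal Python sketch. It is NOT the psjob/psnode code. It
assumes Linux procps "ps", whose "nlwp" format specifier reports the number
of threads in a process. The function names and the max_threads_per_proc
threshold are hypothetical; in practice you might set the threshold to the
number of CPUs allocated to the job.

```python
import subprocess


def ps_for_user(user: str) -> str:
    # Hypothetical helper: one line per process, no header line
    # (empty "=" headers suppress it). NLWP = number of threads in the process.
    return subprocess.run(
        ["ps", "-u", user, "-o", "pid=", "-o", "nlwp=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout


def summarize_ps(ps_output: str, max_threads_per_proc: int = 1) -> dict:
    """Count processes and total threads in "PID NLWP COMM" lines, and list
    processes whose thread count exceeds max_threads_per_proc (hypothetical
    threshold for "too many threads per process")."""
    n_procs = 0
    n_threads = 0
    overthreaded = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pid, nlwp = int(fields[0]), int(fields[1])
        comm = fields[2] if len(fields) > 2 else ""
        n_procs += 1
        n_threads += nlwp
        if nlwp > max_threads_per_proc:
            overthreaded.append((pid, comm, nlwp))
    return {"processes": n_procs, "threads": n_threads,
            "overthreaded": overthreaded}
```

For example, summarize_ps(ps_for_user("alice"), max_threads_per_proc=4)
would flag every process of the (hypothetical) user "alice" running more
than 4 threads.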
The tools are:
* psjob <jobid> for all user processes in a job
* psnode <nodelist> for all user processes on a node or list of nodes
Download the psjob and psnode tools from:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/nodes
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark