[slurm-users] Compute node process monitoring tools updated

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Mon Jan 18 14:07:05 UTC 2021


FYI: My Slurm tools for displaying the user processes of batch jobs have
been updated.  In addition to the user process list from "ps", they now
print a summary of the number of processes and threads.  We use this to
check that user jobs behave sensibly.  For example, we often see jobs that
run too many threads per process and overload the CPUs.
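
To illustrate the kind of summary meant here, the following is a minimal
Python sketch that counts processes and threads per user from "ps" output.
It is not the code used in psjob or psnode (those are separate scripts, see
the links in this post); the function names summarize and ps_snapshot are
made up for this example, and it assumes a Linux procps "ps" that supports
the "nlwp" (number of threads) output field.

```python
import subprocess
from collections import defaultdict


def summarize(ps_text):
    """Return {user: (n_processes, n_threads)} from lines of 'USER NLWP'.

    Hypothetical helper for illustration only; lines that do not look
    like '<user> <integer>' (for example a header line) are skipped.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        user, nlwp = fields[0], int(fields[1])
        totals[user][0] += 1      # one more process for this user
        totals[user][1] += nlwp   # add this process's thread count
    return {user: tuple(counts) for user, counts in totals.items()}


def ps_snapshot():
    """Run ps on the local node (assumes Linux procps and the nlwp field)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "user=", "-o", "nlwp="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for user, (nproc, nthreads) in sorted(summarize(ps_snapshot()).items()):
        print(f"{user:<12} processes={nproc:<6} threads={nthreads}")
```

A user whose thread count is much larger than the number of CPU cores
allocated to the job is a candidate for the overload case described above.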

The tools are:

* psjob <jobid>      for all user processes in a job
* psnode <nodelist>  for all user processes on a node or list of nodes

Download the psjob and psnode tools from:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/nodes

-- 
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


