[slurm-users] Compute node process monitoring tools updated

Ryan Novosielski novosirj at rutgers.edu
Tue Jan 19 14:31:37 UTC 2021


Thanks, that’s great! I do a lot of that by hand (including lots over this weekend), so it will be a nice timesaver.

--
#BlackLivesMatter
____
|| \\UTGERS,       |---------------------------*O*---------------------------
||_// the State     |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ     | Office of Advanced Research Computing - MSB C630, Newark
    `'

On Jan 18, 2021, at 09:08, Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> wrote:

FYI: My Slurm tools for displaying batch job user process information have been updated.  Besides the user process list from "ps", a summary of the number of processes and threads is now printed as well.  We use this for monitoring the sanity of user jobs.  For example, we often see jobs that run too many threads per process and overload the CPUs.
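As a rough illustration of the kind of process/thread summary described above (this is not the psjob or psnode code, which lives in the repositories linked below), here is a minimal sketch. It assumes the process list has already been collected with procps `ps -o pid=,nlwp=,comm=`, where `nlwp` is the number of threads per process. The function name `summarize_ps` and the per-job CPU threshold are hypothetical.

```python
from typing import List, Tuple


def summarize_ps(ps_output: str, cpus_allocated: int) -> Tuple[int, int, List[str]]:
    """Summarize `ps -o pid=,nlwp=,comm=` style output (assumed input format).

    Returns (number of processes, total number of threads, flagged processes),
    where a process is flagged if it runs more threads than the job's
    allocated CPUs (a hypothetical sanity threshold).
    """
    nprocs = 0
    nthreads = 0
    flagged: List[str] = []
    for line in ps_output.splitlines():
        # Split into at most 3 fields so a command name containing spaces stays intact.
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid, nlwp, comm = fields
        threads = int(nlwp)
        nprocs += 1
        nthreads += threads
        if threads > cpus_allocated:
            flagged.append(f"{pid} {comm} ({threads} threads)")
    return nprocs, nthreads, flagged


if __name__ == "__main__":
    # Made-up sample output for a job assumed to have 16 CPUs allocated.
    sample = """ 1234  1 bash
 1240 48 a.out
 1241  4 python
"""
    procs, threads, suspicious = summarize_ps(sample, cpus_allocated=16)
    print(f"{procs} processes, {threads} threads")
    for entry in suspicious:
        print("Too many threads:", entry)
```

Comparing each process's thread count against the CPU allocation is one simple way to spot the "too many threads per process" case mentioned above; the real tools may use different criteria.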

The tools are:

* psjob <jobid>      for all user processes in a job
* psnode <nodelist>  for all user processes on a node or list of nodes

Download the psjob and psnode tools from:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/nodes

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
