[slurm-users] Compute node process monitoring tools updated

Alan Orth alan.orth at gmail.com
Tue Jan 19 14:28:54 UTC 2021


Thank you for that, Ole! I will give them a spin on our cluster and send
any feedback to GitHub.

Cheers,

On Mon, Jan 18, 2021 at 4:12 PM Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
wrote:

> FYI: My Slurm tools for displaying batch job user process information have
> been updated.  Besides the user process list from "ps", a summary of the
> number of processes and threads is now printed as well.  We use this for
> monitoring the sanity of user jobs.  For example, we often see jobs that
> run too many threads per process and overload the CPUs.
>
> The tools are:
>
> * psjob <jobid>      for all user processes in a job
> * psnode <nodelist>  for all user processes on a node or list of nodes
>
> Download the psjob and psnode tools from:
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/nodes
>
> --
> Ole Holm Nielsen
> PhD, Senior HPC Officer
> Department of Physics, Technical University of Denmark
>
>
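
As a rough illustration of the kind of per-user process and thread summary described above, here is a minimal Python sketch. It is not taken from psjob or psnode. It assumes input shaped like the output of `ps -eo user,pid,nlwp,comm` on Linux (procps), where NLWP is the thread count of each process. The function name summarize_ps and the sample data are hypothetical.

```python
from collections import defaultdict


def summarize_ps(ps_output: str) -> dict:
    """Summarize 'ps -eo user,pid,nlwp,comm' style text per user.

    Returns {user: (process_count, thread_count)}.
    This is an illustrative helper, not part of the Slurm_tools scripts.
    """
    totals = defaultdict(lambda: [0, 0])
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # ignore malformed or empty lines
        user, nlwp = fields[0], fields[2]
        totals[user][0] += 1          # one more process for this user
        totals[user][1] += int(nlwp)  # add this process's thread count
    return {user: (procs, threads) for user, (procs, threads) in totals.items()}


# Made-up sample data in the assumed column layout.
SAMPLE = """\
USER       PID NLWP COMMAND
alice     1201    1 bash
alice     1210   16 python
bob       1302    4 solver
"""

if __name__ == "__main__":
    for user, (procs, threads) in summarize_ps(SAMPLE).items():
        print(f"{user}: {procs} processes, {threads} threads")
```

Comparing the thread total against the number of cores allocated to the job is one way to spot the overloaded-CPU case mentioned in the announcement.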

-- 
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch