[slurm-users] How to check the percent cpu of a job?

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Wed Nov 21 12:13:50 MST 2018


On 21-11-2018 19:41, Ryan Novosielski wrote:
> Ole’s “pestat” script does allow you to get similar information, but I’m interested to see if there’s a better answer. I’ve used his script for more or less the same reason: to see whether jobs are using the resources they’re allocated. It reports at the node level, though, so you then have to look closer. For example:
> 
> Print only nodes that are flagged by * (RED nodes)
> Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
>                              State Use/Tot              (MB)     (MB)  JobId User ...
> 
>    gpu003            oarc     drng*  8  12   58.06*    64000    24507  82565618 yc567
> ...
>   hal0027          kopp_1    alloc  28  28    8.64*   128000   115610  82591085 mes373 82595703 aek119
> 
> As you can see, both of the above are jobs whose allocated CPU count differs greatly from the actual CPU load: the first is using far more than it was allocated (though it runs in a cgroup, so it is theoretically isolated from the other users on the machine), while the second asked for all 28 CPUs but is only “using” ~8 of them.
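
The comparison described in the quoted text (CPU load versus allocated CPUs) can be sketched in a few lines of Python. This is only an illustration, not part of pestat: the column positions are read off the sample output above, and the names NodeLoad, parse_pestat_row and classify, as well as the 50% tolerance, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    hostname: str
    cpus_used: int    # "Use" column: CPUs allocated to jobs on the node
    cpus_total: int   # "Tot" column: CPUs present in the node
    cpu_load: float   # "CPUload" column


def parse_pestat_row(row: str) -> NodeLoad:
    """Parse one data row, assuming the column order of the sample output."""
    # The '*' flags mark suspicious values; drop them before converting.
    fields = row.replace("*", "").split()
    return NodeLoad(fields[0], int(fields[3]), int(fields[4]), float(fields[5]))


def classify(node: NodeLoad, tolerance: float = 0.5) -> str:
    """Compare the load with the allocated CPUs (tolerance is an arbitrary choice)."""
    if node.cpus_used == 0:
        return "idle"
    ratio = node.cpu_load / node.cpus_used
    if ratio > 1 + tolerance:
        return "overloaded"
    if ratio < 1 - tolerance:
        return "underused"
    return "ok"


rows = [
    "   gpu003            oarc     drng*  8  12   58.06*    64000    24507  82565618 yc567",
    "  hal0027          kopp_1    alloc  28  28    8.64*   128000   115610  82591085 mes373 82595703 aek119",
]
for row in rows:
    node = parse_pestat_row(row)
    print(node.hostname, classify(node))
```

With the two sample rows, gpu003 (load 58.06 on 8 allocated CPUs) falls in the "overloaded" class and hal0027 (load 8.64 on 28 allocated CPUs) in the "underused" class.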

A possible solution is my "psjob" tool, which runs ps on every node in a 
job's node list and prints the process status, excluding system 
processes.  Usage: psjob <jobid>.  It requires ClusterShell.

This gives a convenient overview of the process status of the job's 
tasks.  Perhaps you could check whether this information is 
enough for you?
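
For readers who want to see the general idea, here is a minimal Python sketch of the approach: look up the job's user and node list with squeue, then run ps on those nodes through ClusterShell's clush. It is not the psjob script itself. The function names are hypothetical, and selecting processes by the job owner is an assumed stand-in for psjob's exclusion of system processes.

```python
import shlex
import subprocess


def squeue_command(jobid: str) -> list[str]:
    # -h: no header, -j: select the job, -o: print user (%u) and node list (%N)
    return ["squeue", "-h", "-j", jobid, "-o", "%u %N"]


def clush_ps_command(user: str, nodelist: str) -> list[str]:
    # Assumption: listing only the job owner's processes is a simple way
    # to leave out system processes; psjob may select them differently.
    ps = f"ps -u {shlex.quote(user)} -o pid,pcpu,pmem,etime,args"
    # -w: target nodes, -b: gather identical output from several nodes
    return ["clush", "-b", "-w", nodelist, ps]


def job_process_status(jobid: str) -> str:
    """Return the ps output for all nodes of a running job."""
    out = subprocess.run(squeue_command(jobid), capture_output=True,
                         text=True, check=True).stdout
    parts = out.split()
    if len(parts) != 2:
        # A pending or unknown job has no node list yet.
        raise ValueError(f"job {jobid} has no allocated nodes")
    user, nodelist = parts
    return subprocess.run(clush_ps_command(user, nodelist), capture_output=True,
                          text=True, check=True).stdout
```

On a cluster with Slurm and ClusterShell installed, print(job_process_status("<jobid>")) would show the %CPU of each process of the job, which can then be compared with the number of CPUs the job was allocated.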

Download "psjob" (as well as other Slurm job tools) from my page:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs

Installation of the ClusterShell prerequisite is described in my Slurm 
Wiki pages at
https://wiki.fysik.dtu.dk/niflheim/SLURM#clustershell

/Ole


