[slurm-users] How to check the percent cpu of a job?
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Wed Nov 21 12:13:50 MST 2018
On 21-11-2018 19:41, Ryan Novosielski wrote:
> Ole’s “pestat” script does allow you to get similar information, but I’m interested to see whether there’s a better answer. I’ve used his script for more or less the same reason: to see whether jobs are using the resources they’re allocated. It reports at the node level, though, so you then have to look closer. For example:
>
> Print only nodes that are flagged by * (RED nodes)
> Hostname  Partition  Node   Num_CPU  CPUload  Memsize  Freemem  Joblist
>                      State  Use/Tot              (MB)     (MB)  JobId User ...
>
> gpu003    oarc       drng*    8  12   58.06*    64000    24507  82565618 yc567
> ...
> hal0027   kopp_1     alloc   28  28    8.64*   128000   115610  82591085 mes373 82595703 aek119
>
> As you can see, both of the above are jobs whose allocated CPU counts are very different from the actual CPU load. The first is using far more than it was allocated (though it runs in a cgroup, so it is theoretically isolated from the other users on the machine), while the second asked for all 28 CPUs but is only “using” ~8 of them.
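The comparison described in the quoted text (allocated CPUs versus CPU load) can be sketched as a small check. This is only an illustration: the function name classify and the 25% tolerance are assumptions made for this example and are not part of pestat. The two data rows are the values from the output quoted above.

```python
def classify(alloc_cpus, cpu_load, tolerance=0.25):
    """Compare a node's CPU load with the CPUs allocated on it.

    The tolerance (default 25%) is an arbitrary, hypothetical threshold.
    """
    if alloc_cpus <= 0:
        raise ValueError("alloc_cpus must be positive")
    ratio = cpu_load / alloc_cpus
    if ratio > 1 + tolerance:
        return "overloaded"   # load well above the allocation
    if ratio < 1 - tolerance:
        return "underused"    # load well below the allocation
    return "ok"


# (hostname, allocated CPUs, CPU load) taken from the pestat output above
nodes = [("gpu003", 8, 58.06), ("hal0027", 28, 8.64)]
for host, alloc, load in nodes:
    print(host, classify(alloc, load))
```

A node-level load cannot tell which job is responsible when several jobs share a node (as on hal0027), so a check like this only points to nodes worth a closer look.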
My "psjob" tool may be a possible solution: it prints the ps process
status on each node in a job's node list, excluding system processes.
Usage: psjob <jobid>. It requires ClusterShell.
This allows a convenient way to get an overview of the process status of
the job's tasks. Perhaps you could check whether this information is
enough for you?
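The general idea (look up the job's node list, then run ps on those nodes through ClusterShell) can be sketched roughly as follows. This is not the psjob source, only a minimal sketch under assumptions: the helper names are hypothetical, system processes are excluded simply by listing only the job owner's processes, and the ps output columns are an arbitrary choice. It uses the squeue format fields %N (node list) and %u (user) and the clush options -w (target nodes) and -b (gather identical output).

```python
import shlex
import subprocess


def build_squeue_cmd(jobid):
    # -h: no header, -o "%N %u": print the node list and the user name
    return ["squeue", "-h", "-j", str(jobid), "-o", "%N %u"]


def parse_squeue_line(line):
    # Expected form: "<nodelist> <user>", e.g. "node[1-2] alice"
    nodelist, user = line.split()
    return nodelist, user


def build_clush_cmd(nodelist, user):
    # Listing only the job owner's processes leaves out system processes
    # (an assumption of this sketch, not necessarily what psjob does).
    remote = "ps -u {} -o pid,pcpu,pmem,etime,args".format(shlex.quote(user))
    return ["clush", "-b", "-w", nodelist, remote]


def psjob_sketch(jobid):
    # Requires Slurm's squeue and ClusterShell's clush on the local host.
    out = subprocess.run(build_squeue_cmd(jobid), check=True,
                         capture_output=True, text=True).stdout
    lines = out.strip().splitlines()
    if not lines:
        raise RuntimeError("job {} not found in the queue".format(jobid))
    nodelist, user = parse_squeue_line(lines[0])
    subprocess.run(build_clush_cmd(nodelist, user), check=True)
```

Calling psjob_sketch with a running job's id would print one ps listing per group of nodes with identical output; the pcpu column gives a per-process view of CPU usage.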
Download "psjob" (as well as other Slurm job tools) from my page:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
Installation of the ClusterShell prerequisite is described in my Slurm
Wiki pages at
https://wiki.fysik.dtu.dk/niflheim/SLURM#clustershell
/Ole