[slurm-users] How to check the percent cpu of a job?

Christopher Samuel chris at csamuel.org
Wed Nov 21 16:10:25 MST 2018


On 22/11/18 5:41 am, Ryan Novosielski wrote:

> You can see, both of the above are examples of jobs that have
> allocated CPU numbers that are very different from the ultimate CPU
> load (the first one using way more than allocated, though they’re in
> a cgroup so theoretically isolated from the other users on the
> machine), and the second one asking for all 28 CPUs but only “using”
> ~8 of them.

I've just had a quick play with pestat, and it reveals that Slurm
18.08.3 seems to have some odd ideas about load on nodes. For instance,
one of our KNL nodes that is offline is reported with a CPUload of
2.70, but I can see nothing running on it and its load average is
around 0.1 (which is mostly top).

Conversely, a Skylake node that's flat out with a load average of 32
(all from compute-bound processes at 100% CPU) is reported with a
CPULoad of 2.5.

pestat just takes the CPULoad from the output of "sinfo", and I've
confirmed for myself that the numbers are off in that output.
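
For anyone wanting to repeat the comparison, here is a minimal sketch of
the check. The helper name load_mismatch and the tolerance of 1.0 are my
own assumptions, not anything Slurm or pestat provides; the commented
usage assumes sinfo's %O format field (CPU load) and ssh access to the
node.

```shell
# Hypothetical helper: compare the load Slurm reports for a node with the
# node's own load average. Prints "mismatch" if they differ by more than
# the tolerance (default 1.0, an arbitrary assumption), otherwise "ok".
load_mismatch() {
    reported=$1
    actual=$2
    tolerance=${3:-1.0}
    awk -v r="$reported" -v a="$actual" -v t="$tolerance" 'BEGIN {
        d = r - a
        if (d < 0) d = -d          # absolute difference
        print ((d > t) ? "mismatch" : "ok")
    }'
}

# On a cluster the two inputs could be gathered along these lines
# ("$node" is a placeholder for a real node name):
#   reported=$(sinfo -h -N -n "$node" -o '%O')            # Slurm's view
#   actual=$(ssh "$node" cut -d' ' -f1 /proc/loadavg)     # 1-minute load average
#   load_mismatch "$reported" "$actual"
```

With the numbers above, both the KNL node (2.70 reported vs. 0.1) and
the Skylake node (2.5 reported vs. 32) would be flagged.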

> If you’re using cgroups, it would seem to me that there must also be
> a way to see the output of “top” for just a group, or at least
> something similar. systemd-cgtop does more or less that, but doesn’t
> seem to show exactly what you’d want here:
[...]
> ...CPU only being shown as an aggregate at the top level

If you run:

systemd-cgtop -c

it will sort by CPU usage and be more useful! :-)
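
If you want a one-off snapshot rather than the interactive display,
something like the sketch below may help. The wrapper name
cgtop_snapshot is hypothetical; it assumes the -b (batch), -n
(iterations) and --depth options of systemd-cgtop, and it takes two
iterations on the assumption that CPU figures need two samples before
they are filled in.

```shell
# Hypothetical wrapper: print a non-interactive snapshot of control
# groups, sorted by CPU usage.
cgtop_snapshot() {
    depth=${1:-3}   # how deep to descend into the cgroup tree
    # -c: order by CPU load, -b: batch mode (no interactive input),
    # -n 2: take two samples and then exit
    systemd-cgtop -c -b -n 2 --depth="$depth"
}

# Usage (a deeper traversal, so that per-job groups show up if they
# are nested several levels down):
#   cgtop_snapshot 5
```

Whether the per-job groups show up, and under what path, depends on how
cgroups are set up on your nodes.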

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
