[slurm-users] How to check the percent cpu of a job?
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Thu Nov 22 03:09:00 MST 2018
On 11/22/2018 12:10 AM, Christopher Samuel wrote:
> I've just had a quick play with pestat and it reveals that Slurm
> 18.08.3 seems to have some odd ideas about load on nodes, for instance
> one of our KNL nodes that is offline is reported with a CPUload of
> 2.70, but I can see nothing running on it and the load average is
> around 0.1 (which is mostly top).
>
> Conversely a skylake node that's flat out with a load average of 32
> (all from compute bound processes at 100% CPU) is reported with a
> CPULoad of 2.5.
>
> The CPULoad is just taken from the output of "sinfo", and I've confirmed
> myself that the numbers are off in that output.
FYI: Here are the sinfo flags that I use in pestat:
# sinfo output: NODELIST PARTITION CPU CPU_LOAD MEMORY FREE_MEM STATE THREADS GRES
sinfo -N -o "%N %P %C %O %m %e %t %Z %G"
The CPU_LOAD output should originate from the slurmd daemon running on
each compute node. Chris' observations might indicate that slurmd
version 18.08.3 doesn't show the correct CPU_LOAD numbers. Our cluster
runs 17.11.12 and I don't see any such problems!
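To repeat the kind of check Chris describes, one could run something like the following on a compute node. It is only a rough sketch, not part of pestat: load_diff is a hypothetical helper name, and it assumes that "hostname -s" matches the Slurm NodeName and that /proc/loadavg exists (Linux). If sinfo reports N/A for CPU_LOAD, awk treats it as 0.

```shell
#!/bin/sh
# Sketch: compare the CPU_LOAD reported by sinfo with the kernel load average.
# load_diff is a hypothetical helper, not part of Slurm or pestat.

# Print the absolute difference between two load values, two decimals.
load_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { d = a - b; if (d < 0) d = -d; printf "%.2f\n", d }'
}

# Only query Slurm when sinfo is installed and /proc/loadavg is readable.
if command -v sinfo >/dev/null 2>&1 && [ -r /proc/loadavg ]; then
    # Assumption: the short hostname equals the Slurm NodeName.
    node=$(hostname -s)
    # %O is the CPU_LOAD field; -h suppresses the header line.
    slurm_load=$(sinfo -h -N -n "$node" -o "%O" | head -n 1)
    # The first field of /proc/loadavg is the 1-minute load average.
    read -r os_load _ < /proc/loadavg
    echo "$node: sinfo=$slurm_load loadavg=$os_load diff=$(load_diff "$slurm_load" "$os_load")"
fi
```

A large diff that persists over several minutes would point at the sort of mismatch reported above. Small differences are expected, since the value shown by sinfo is only refreshed periodically.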
/Ole