[slurm-users] How to check the percent cpu of a job?

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu Nov 22 03:09:00 MST 2018


On 11/22/2018 12:10 AM, Christopher Samuel wrote:
> I've just had a quick play with pestat and it reveals that Slurm
> 18.08.3 seems to have some odd ideas about load on nodes, for instance
> one of our KNL nodes that is offline is reported with a CPUload of
> 2.70, but I can see nothing running on it and the load average is
> around 0.1 (which is mostly top).
> 
> Conversely a skylake node that's flat out with a load average of 32
> (all from compute bound processes at 100% CPU) is reported with a
> CPULoad of 2.5.
> 
> The CPULoad is just taken from the output of "sinfo", and I've confirmed
> myself that the numbers are off in that output.

FYI: Here are the sinfo flags that I use in pestat:

# sinfo output: NODELIST PARTITION CPUS(A/I/O/T) CPU_LOAD MEMORY FREE_MEM STATE THREADS GRES
sinfo -N -o "%N %P %C %O %m %e %t %Z %G"

The CPU_LOAD output should originate from the slurmd daemon running on 
each compute node.  Chris' observations might indicate that slurmd 
version 18.08.3 doesn't show the correct CPU_LOAD numbers.  Our cluster 
runs 17.11.12 and I don't see any such problems!
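
To check whether a node is affected, one could compare the CPU_LOAD that sinfo reports with the load average seen on the node itself. The following is only a rough sketch, not part of pestat: the function name flag_load_mismatch and the 0.5 tolerance are made up for illustration. It reads "node load" pairs (as printed by sinfo -o "%N %O") from stdin and prints the nodes whose reported load differs from a reference value by more than the tolerance.

```shell
# Hypothetical helper (not part of pestat or Slurm).
# Reads "NODE CPU_LOAD" lines from stdin and prints every line whose
# CPU_LOAD differs from the reference load by more than the tolerance.
#   $1 = reference load (e.g. the 1-minute value from /proc/loadavg)
#   $2 = allowed absolute difference (assumed value, tune as needed)
flag_load_mismatch() {
    awk -v ref="$1" -v tol="$2" '
        # Skip non-numeric loads such as "N/A" for unreachable nodes
        $2 ~ /^[0-9.]+$/ {
            diff = $2 - ref
            if (diff < 0) diff = -diff
            if (diff > tol) printf "%s %s\n", $1, $2
        }'
}

# Possible usage on a compute node, assuming sinfo is available there:
#   sinfo -h -N -n "$(hostname -s)" -o "%N %O" |
#       flag_load_mismatch "$(cut -d" " -f1 /proc/loadavg)" 0.5
```

Any output from the function means that sinfo and the node disagree about the load by more than the chosen tolerance.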

/Ole


