[slurm-users] 4 sockets but "
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Fri Jul 23 10:43:35 UTC 2021
Hi Diego,
On 7/23/21 12:36 PM, Diego Zuccato wrote:
>> I believe that slurmd reports the 15 minute CPU load average to the
>> slurmctld, only. So you got this information already.
> Yup. It's just unexpected: if you don't know, you run pestat and see that
> an idle node does have a very high load :)
> My users would think someone is breaking the rules...
Well, Slurm reports the 15-minute load average. I guess users will have
to learn that, because we can't print help information every time.
>> If you run "pestat -F" it will show you (in red color) the nodes where
>> the CPU load is outside the expected range, as given by the number of
>> allocated cores. That covers your situation when 0 CPUs are allocated.
> That's how I noticed it.
Yes, pestat can be quite helpful :-)
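As a rough illustration of that kind of check (this is not pestat's actual code), the sketch below flags nodes whose reported CPU load differs from the number of allocated CPUs by more than a tolerance. The function name flag_load and the default tolerance of 2.0 are assumptions made up for this example. Input is expected on stdin as "node alloc/idle/other/total load", one node per line.

```shell
# Hypothetical helper, not part of pestat or Slurm.
# stdin: "<node> <alloc/idle/other/total> <load>" per line
# $1:    optional tolerance in load units (assumed default: 2.0)
flag_load() {
    awk -v tol="${1:-2.0}" '
        $3 ~ /^[0-9.]+$/ {              # skip nodes with a non-numeric load such as N/A
            split($2, c, "/")           # c[1] is the number of allocated CPUs
            diff = $3 - c[1]
            if (diff < 0) diff = -diff
            if (diff > tol) print $1, "alloc=" c[1], "load=" $3
        }'
}
```

On a cluster it could be fed with something like:

  sinfo -h -N -o "%N %C %O" | flag_load

An idle node (0 allocated CPUs) that still shows a high load would be listed, which is the situation discussed above.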
>> I'm wondering what information you get from slurmtop, which you're
>> missing from pestat? Maybe an opportunity for improvement :-)
> Well, it shows semi-graphically the CPU allocations for the various jobs,
> so users can tell at a glance if there are useable nodes for their job.
For finding idle nodes, there are better tools:
* sinfo -t idle
* showpartitions (download from
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/partitions)
>> I added a little code to pestat now that calculates the longest hostname
>> (minimum 8, truncated to 20 chars). This is done by querying Slurm with
>> "sinfo -N -O NodeList". Can you try out this new version on your cluster?
>> Download: https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
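The width calculation described above can be sketched roughly as follows. This is a minimal illustration, not the code in pestat itself; the function name hostname_width is an assumption, and the only values taken from the description are the minimum of 8 and the cap of 20 characters.

```shell
# Hypothetical helper, not the actual pestat implementation.
# stdin: one node name per line; prints the column width to use.
hostname_width() {
    awk 'BEGIN { w = 8 }                        # minimum column width
         { n = length($1); if (n > w) w = n }   # track the longest name seen
         END { if (w > 20) w = 20; print w }'   # cap the width at 20 chars
}
```

It could be combined with the query mentioned above, with -h added here to suppress the header line:

  sinfo -h -N -O NodeList | hostname_width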
...
> Once fixed, it seems to work OK and columns are aligned. Not the first
> time long names give us problems :( (users are even worse...).
Oops, I have now fixed this bug in the master branch, thanks!
/Ole