[slurm-users] Statistics on node utilization?
wdennis at nec-labs.com
Wed Oct 16 20:53:38 UTC 2019
We run a few Slurm clusters here, all using SlurmDBD to store job history. I also use Open XDMoD (http://open.xdmod.org/) to generate statistics on the jobs. However, XDMoD does not seem to provide per-node utilization statistics, unless my XDMoD instance simply isn’t configured to do so… What I’m looking for is how many jobs landed on each node over a given period, plus per-node counts of completed jobs, failed jobs, etc. What I’m trying to get a sense of is how heavily loaded (or, in my case, most probably how underused) the individual nodes in a cluster are.
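One possible way to put a number on "how loaded" a node is: take the Start/End times of every job that ran on it and compute the fraction of a reporting window during which at least one job was running. Concurrent jobs must be merged (an interval union) so they are not double-counted. This is a minimal sketch, assuming the (start, end) pairs per node have already been extracted, for example from sacct's Start, End and NodeList fields. It counts a node as busy whenever any job holds it, regardless of how many of its cores that job uses, and still-running jobs (End shown as "Unknown") would need an end time substituted before calling it. The function name `busy_fraction` is hypothetical.

```python
from datetime import datetime, timedelta


def busy_fraction(intervals, window_start, window_end):
    """Fraction of [window_start, window_end) during which at least one job
    was running on a node, given that node's (start, end) datetime pairs.
    Hypothetical helper: overlapping jobs are merged, so two concurrent
    jobs count once, and core-level usage is ignored."""
    # Clip each job to the reporting window and drop jobs entirely outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if min(e, window_end) > max(s, window_start)
    )
    busy = timedelta(0)
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this job: close out the current merged block.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlaps (or touches) the current block: extend it.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    # timedelta / timedelta yields a float in Python 3.
    return busy / (window_end - window_start)
```

For example, jobs running 00:00–06:00, 04:00–08:00 and 12:00–16:00 on one day merge into 12 busy hours, so `busy_fraction` returns 0.5 for that 24-hour window.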
I have run the command:
sacct -X -p -o jobid,jobname,start,end,user,partition%-30,nodelist,alloccpus,reqmem,cputime,qos,state,exitcode,AllocTRES%-50 -S 01/01/19 > sacct-parsable-2019.txt
to dump the year’s jobs to a file, pulled that into Excel, and used a PivotTable to get some stats, but that’s the long way of doing this… I’d like something more dynamic and easier. Does anyone have any suggestions?
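As a lighter-weight alternative to the PivotTable, a short script can read the same `sacct -p` dump and tally jobs per node and per (node, state). This is a minimal sketch, not an existing Slurm tool: it assumes the pipe-delimited layout that `-p` produces (a header row of field names such as `NodeList` and `State`, with a trailing `|`), and it includes a simplified, hypothetical `expand_hostlist` that handles forms like `node[01-03,07],gpu5` but not names with more than one bracket group. For full hostlist expansion, `scontrol show hostnames` is the authoritative option. Multi-node jobs are counted once on every node they touched.

```python
import csv
import io
import re
import sys
from collections import Counter, defaultdict


def split_top_level(s):
    """Split on commas that are not inside [...] brackets."""
    parts, depth, cur = [], 0, ""
    for ch in s:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(cur)
            cur = ""
        else:
            cur += ch
    if cur:
        parts.append(cur)
    return parts


def expand_hostlist(hostlist):
    """Hypothetical, simplified expander: one bracket group per name only."""
    hosts = []
    for part in split_top_level(hostlist):
        m = re.fullmatch(r"(.*?)\[([^\]]+)\](.*)", part)
        if not m:
            hosts.append(part)  # plain host name, no range
            continue
        prefix, ranges, suffix = m.groups()
        for r in ranges.split(","):
            if "-" in r:
                lo, hi = r.split("-")
                width = len(lo)  # keep zero padding, e.g. 01..10
                for i in range(int(lo), int(hi) + 1):
                    hosts.append(f"{prefix}{i:0{width}d}{suffix}")
            else:
                hosts.append(f"{prefix}{r}{suffix}")
    return hosts


def per_node_stats(sacct_text):
    """Count jobs per node and per (node, state) from `sacct -p` output."""
    reader = csv.DictReader(io.StringIO(sacct_text), delimiter="|",
                            quoting=csv.QUOTE_NONE)
    jobs = Counter()
    states = defaultdict(Counter)
    for row in reader:
        nodelist = row.get("NodeList") or ""
        # Jobs that never started have no nodes (sacct typically shows
        # "None assigned" here); skip them.
        if not nodelist or nodelist.startswith("None"):
            continue
        # Reduce states like "CANCELLED by 1234" to their first word.
        state = (row.get("State") or "").split(" ")[0]
        for node in expand_hostlist(nodelist):
            jobs[node] += 1
            states[node][state] += 1
    return jobs, states


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python per_node.py sacct-parsable-2019.txt
    with open(sys.argv[1]) as f:
        jobs, states = per_node_stats(f.read())
    for node in sorted(jobs):
        print(node, jobs[node], dict(states[node]))
```

Nodes that never appear in the output ran no jobs in the period, which is itself a useful signal for spotting idle nodes.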