Hi Davide,
Thanks, I appreciate your positive feedback! Some comments are below:
On 21-08-2024 15:07, Davide DelVento wrote:
> Thanks, Ole! Your tools and what you do for the community is fantastic, we all appreciate you!
>
> Of course, I did look (and use) your script. But I need more info.
>
> And no, this is not something that users would run *ever* (let alone at every login). This is something I *myself* (the cluster administrator) need to run, once a quarter, or perhaps even just once a year, to inform my managers of cluster utilization to keep them apprised on the status of the affairs, and justify change in funding for future hardware purchases. Sorry for not making this clear, given the initial message I replied to.
...
> What I am still unable to get is:
>
> - utilization by queue (or list of node names), to track actual use of expensive resources such as GPUs, high memory nodes, etc
The slurmacct script can actually break down statistics by partition, which I guess is what you're asking for? The usage of the command is:
# slurmacct -h
Usage: slurmacct [-s Start_time -e End_time | -c | -w | -m monthyear] [-p partition(s)] [-u username] [-g groupname] [-G] [-W workdir] [-r report-prefix] [-n] [-h]
where:
        -s Start_time [last month]: Starting time of accounting period.
        -e End_time [last month]: End time of accounting period.
        -c: Current month
        -w: Last week
        -m monthyear: Select month and year (like "november2019")
        -p partition(s): Select only Slurm partion <partition>[,partition2,...]
        -u username: Print only user <username>
        -g groupname: Print only users in UNIX group <groupname>
        -G: Print only groupwise summed accounting data
        -W directory: Print only jobs with this string in the job WorkDir
        -r: Report name prefix
        -n: No header information is printed
        -h: Print this help information
The Start_time and End_time values specify the date/time interval of job completion/termination (see "man sacct").
Hint: Specify Start/End time as MMDD (Month and Date)
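For a per-partition breakdown over a quarter, something along these lines might work. This is only a minimal sketch: the helper name print_slurmacct_cmds and the partition names "gpu" and "highmem" are made up for illustration, and the dates assume the MMDD form from the hint above. It prints the command lines as a dry run instead of running them, so you can check them first.

```shell
# print_slurmacct_cmds START END PARTITION...
# Dry run: print one slurmacct command line per partition instead of running it.
# (Hypothetical helper, not part of Slurm_tools.)
print_slurmacct_cmds() {
    start=$1; end=$2; shift 2
    for part in "$@"; do
        echo "slurmacct -s $start -e $end -p $part"
    done
}

# Hypothetical partition names; first quarter given as MMDD.
print_slurmacct_cmds 0101 0401 gpu highmem
```

Once the printed lines look right, they can be pasted into a shell on a host where slurmacct is installed.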
> - statistics about wait-in-queue for jobs, due to unavailable resources
The slurmacct report prints "Average q-hours" (starttime minus submittime).
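If you want the same kind of number straight from sacct, the following is a rough sketch and not what slurmacct does internally. It assumes sacct honours SLURM_TIME_FORMAT=%s so that Submit and Start come out as epoch seconds; the function name avg_wait_hours and the dates are only placeholders.

```shell
# avg_wait_hours: read "Partition|Submit|Start" lines (times in epoch seconds)
# on stdin and print the mean wait (Start minus Submit) in hours per partition.
# (Hypothetical helper for illustration.)
avg_wait_hours() {
    awk -F'|' '
        # Skip jobs without numeric timestamps, e.g. a Start of "Unknown".
        $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            sum[$1] += $3 - $2
            n[$1]++
        }
        END {
            for (p in n) printf "%s %.2f\n", p, sum[p] / n[p] / 3600
        }' | sort
}

# Intended use on the cluster (not run here, dates are placeholders):
#   SLURM_TIME_FORMAT=%s sacct -a -X -n -P -S 2024-01-01 -E 2024-04-01 \
#       -o Partition,Submit,Start | avg_wait_hours
```

Adding User to the -o list and keying on that field instead of Partition would give the per-user view in the same way.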
> hopefully both in a sreport-like format by user and by overall system
>
> I suspect this information is available in sacct, but needs some
> massaging/consolidation to become useful for what I am looking for.
> Perhaps either (or both) of your scripts already do that in some place
> that I did not find? That would be terrific, and I'd appreciate it if you
> can point me to its place.
We use the "topreports" script to gather weekly, monthly and yearly reports (using slurmacct) for management (professors at our university).
IHTH,
Ole