[slurm-users] Tracking efficiency of all jobs on the cluster (dashboard etc.)
Tina Friedrich
tina.friedrich at it.ox.ac.uk
Wed Jul 26 16:41:19 UTC 2023
Hi Will,
I don't, currently, although it's on my list.
However, we had a presentation at a recent Oxford HPC-SIG meeting from a
colleague, who implemented a simple job profiler that saves a lot of job
data (including efficiency) & creates plots of the efficiency of the job
run (in a nutshell). We all thought it sounded interesting :)
Code is here: https://github.com/OxfordCBRG/sps
(it's a SPANK plugin, I believe)
Tina
On 24/07/2023 15:37, Will Furnell - STFC UKRI wrote:
> Hello,
>
> I am aware of ‘seff’, which allows you to check the efficiency of a
> single job, which is good for users, but as a cluster administrator I
> would like to be able to track the efficiency of all jobs from all users
> on the cluster, so I am able to ‘re-educate’ users that may be running
> jobs that have terrible resource usage efficiency.
>
> What do other cluster administrators use for this task? Is there
> anything you use and recommend (or don’t recommend) or have heard of
> that is able to do this? Even if it’s something like a Grafana dashboard
> that hooks up to the SLURM database?
>
> Thank you,
>
> Will.
>
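
For the cluster-wide view asked about above, one low-tech starting point is to aggregate accounting data per user. The following is only a rough sketch, not something taken from this thread or from the sps plugin: it assumes the input comes from something like `sacct -a -X -n -P -o JobID,User,Elapsed,TotalCPU,AllocCPUS -S <start>`, and it defines CPU efficiency as TotalCPU / (Elapsed * AllocCPUS), which is the same ratio seff reports for a single job. The function names and the sample records are made up for illustration, and whether TotalCPU on the allocation line covers every step should be checked against the local accounting setup.

```python
from collections import defaultdict


def parse_duration(text):
    """Convert a Slurm duration ([DD-][HH:]MM:SS[.fff]) to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    # Pad missing leading fields (e.g. "MM:SS.fff") with zeros.
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency_by_user(sacct_text):
    """Return {user: used CPU seconds / allocated core seconds}.

    Expects pipe-separated lines: JobID|User|Elapsed|TotalCPU|AllocCPUS
    (an assumed field order, matching the sacct call in the lead-in).
    """
    used = defaultdict(float)
    allocated = defaultdict(float)
    for line in sacct_text.strip().splitlines():
        job_id, user, elapsed, total_cpu, alloc_cpus = line.split("|")
        if not user or not alloc_cpus:
            continue  # skip records without a user or a CPU count
        core_seconds = parse_duration(elapsed) * int(alloc_cpus)
        if core_seconds == 0:
            continue  # skip jobs that never ran
        used[user] += parse_duration(total_cpu)
        allocated[user] += core_seconds
    return {user: used[user] / allocated[user] for user in allocated}


# Made-up sample records, for illustration only.
SAMPLE = """\
1001|alice|01:00:00|03:30:00|4
1002|alice|00:30:00|00:15:00|1
1003|bob|1-00:00:00|02:24:00|8
"""

if __name__ == "__main__":
    # List the least efficient users first.
    for user, eff in sorted(cpu_efficiency_by_user(SAMPLE).items(),
                            key=lambda item: item[1]):
        print(f"{user}: {eff:.1%}")
```

Weighting by allocated core seconds rather than averaging per-job percentages keeps many short jobs from hiding one long, idle allocation. The resulting numbers could be pushed to whatever backs a dashboard, but that part is left out here.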