We have a 25-node heterogeneous cluster with 4 different types of GPUs. To
see which GPU types were used most over a given time period, we would have to
set AccountingStorageTRES to something like:
AccountingStorageTRES=gres/gpu,gres/gpu:rtx8000,gres/gpu:v100s,gres/gpu:a40,gres/gpu:a100
Unfortunately it's currently at:
AccountingStorageTRES=gres/gpu
At least each node contains only a single GPU type. What are some good
sreport options for getting details on usage over a year, e.g., the percentage
of CPU vs. GPU usage, which partitions/accounts used the most GPUs, etc.?
From this example:
sreport -tminper -t Percent cluster utilization --tres="cpu,gres/gpu" start=2023-07-01
--------------------------------------------------------------------------------
Cluster Utilization 2023-07-01T00:00:00 - 2024-08-15T23:59:59
Usage reported in Percentage of Total
--------------------------------------------------------------------------------
  Cluster      TRES Name   Allocated       Down PLND Dow        Idle   Reserved    Reported
--------- -------------- ----------- ---------- -------- ----------- ---------- -----------
  cluster            cpu      43.81%      2.87%    0.00%      48.35%      4.97%      99.86%
  cluster       gres/gpu      50.36%      3.59%    0.00%      46.05%      0.00%     100.38%
Is that showing that 50% of all jobs were run with GPUs? How do we read the
Idle column? Why does Reported show > 100% for gres?
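For what it's worth, here is a quick sanity check on the numbers, as a rough sketch only. The values are copied by hand from the output above, and the dictionary layout and the `state_sum` helper are just my own names, not anything from Slurm. It adds up the per-state columns (Allocated, Down, PLND Down, Idle, Reserved) for each row and compares the result with Reported:

```python
# Values copied by hand from the sreport output above (percent of total).
rows = {
    "cpu": {"allocated": 43.81, "down": 2.87, "plnd_down": 0.00,
            "idle": 48.35, "reserved": 4.97, "reported": 99.86},
    "gres/gpu": {"allocated": 50.36, "down": 3.59, "plnd_down": 0.00,
                 "idle": 46.05, "reserved": 0.00, "reported": 100.38},
}

# Columns that describe how the time was spent (everything except Reported).
STATE_COLUMNS = ("allocated", "down", "plnd_down", "idle", "reserved")


def state_sum(row):
    """Sum the per-state columns, rounded to two decimals like the report."""
    return round(sum(row[col] for col in STATE_COLUMNS), 2)


for tres, row in rows.items():
    total = state_sum(row)
    gap = round(row["reported"] - total, 2)
    print(f"{tres:10s} states sum to {total:.2f}%, Reported differs by {gap:+.2f}")
```

In both rows the per-state columns add up to 100.00%, so Reported does not seem to be a plain sum of the other columns, which is part of why I'm unsure how to read it.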