We have a 25-node heterogeneous cluster with 4 different GPU types. To see at per-type granularity which GPUs were used most over a time period, we would have to set AccountingStorageTRES to something like:
AccountingStorageTRES=gres/gpu,gres/gpu:rtx8000,gres/gpu:v100s,gres/gpu:a40,gres/gpu:a100
Unfortunately, it is currently set to:
AccountingStorageTRES=gres/gpu
At least each node contains only one GPU type. What are some good sreport options for getting details on usage over a year, e.g., percentage of CPU vs. GPU usage, which partitions/accounts used the most GPUs, etc.?
From this example:

sreport -tminper -t Percent cluster utilization --tres="cpu,gres/gpu" start=2023-07-01

--------------------------------------------------------------------------------
Cluster Utilization 2023-07-01T00:00:00 - 2024-08-15T23:59:59
Usage reported in Percentage of Total
--------------------------------------------------------------------------------
  Cluster      TRES Name   Allocated       Down PLND Dow        Idle   Reserved    Reported
--------- -------------- ----------- ---------- -------- ----------- ---------- -----------
  cluster            cpu      43.81%      2.87%    0.00%      48.35%      4.97%      99.86%
  cluster       gres/gpu      50.36%      3.59%    0.00%      46.05%      0.00%     100.38%
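
As a quick sanity check on the figures in the sreport output above, here is a minimal Python sketch that adds up the five state columns (Allocated, Down, PLND Down, Idle, Reserved) for each TRES and compares the total with the Reported column. The numbers are copied by hand from the output; the `rows` layout and the `state_sum` helper are just illustrative names, not part of Slurm. It only checks the arithmetic of the pasted values and makes no claim about how sreport computes Reported internally.

```python
# Values copied by hand from the sreport output above, in percent.
# Tuple layout (an assumption made for this sketch):
# (tres, allocated, down, planned_down, idle, reserved, reported)
rows = [
    ("cpu", 43.81, 2.87, 0.00, 48.35, 4.97, 99.86),
    ("gres/gpu", 50.36, 3.59, 0.00, 46.05, 0.00, 100.38),
]


def state_sum(row):
    """Sum the five state columns, rounded to two decimals."""
    return round(sum(row[1:6]), 2)


for row in rows:
    name, reported = row[0], row[6]
    total = state_sum(row)
    # Show how far the Reported column is from the sum of the state columns.
    print(f"{name}: states sum to {total:.2f}%, "
          f"Reported is {reported:.2f}%, difference {reported - total:+.2f}")
```

For both rows the five state columns add up to 100.00%, so it is only the Reported column that deviates (by -0.14 for cpu and +0.38 for gres/gpu).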
Does that show that 50% of all jobs were run with GPUs? How should we read the Idle column? And why does Reported show more than 100% for gres/gpu?