[slurm-users] Shard accounting in sreport

Reed Dier reed.dier at focusvq.com
Tue Feb 14 17:12:14 UTC 2023


Hoping someone can tell me whether I’m just thinking about this wrong, or whether this is an area with room for improvement.

I recently upgraded my cluster to 22.05.8 and am testing out GPU sharding on a subset of GPUs, specifically my T4s.

> -------------------------------------------------------------------------------
> Cluster Utilization 2023-02-13T00:00:00 - 2023-02-13T23:59:59
> Usage reported in Percentage of Total
> -------------------------------------------------------------------------------
>      TRES Name Allocate        Down PLND Dow         Idle  Planned     Reported
> -------------- -------- ----------- -------- ------------ -------- ------------
>    gres/gpu:t4    0.00%       0.00%    0.00%      100.00%    0.00%      100.00%
>     gres/shard   37.06%       0.00%    0.00%       62.94%    0.00%      100.00%

What seems odd to me is that shards are being consumed, which implicitly consumes the gpu:t4(s).
However, sreport makes it appear as though the T4s were completely idle, which is not true.

I know that shards and GPUs are not a 1:1 allocation; if anything, the GPU allocation would almost always be greater than the shard allocation.
Still, that is what I would expect the report to show, given that the GPUs are not idle and are in fact allocated, if only “partially.”
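
To make the gap concrete, here is a rough Python sketch of the arithmetic I have in mind. The shard count per GPU, the per-GPU usage list, and the function name are all made up for illustration; this is not how Slurm or sreport computes anything, just the comparison I would expect to see reflected.

```python
# Hypothetical numbers, not taken from my cluster: 4 GPUs with 8 shards each.
SHARDS_PER_GPU = 8


def shard_vs_gpu_allocation(shards_in_use):
    """Compare shard usage with 'touched' GPU usage at one instant.

    shards_in_use: number of shards allocated on each GPU.
    Returns (percent of shards allocated, percent of GPUs with any shard in use).
    """
    total_shards = SHARDS_PER_GPU * len(shards_in_use)
    shard_pct = 100.0 * sum(shards_in_use) / total_shards
    # Assumption: a GPU counts as (partially) allocated once any shard is in use.
    busy_gpus = sum(1 for n in shards_in_use if n > 0)
    gpu_pct = 100.0 * busy_gpus / len(shards_in_use)
    return shard_pct, gpu_pct


shard_pct, gpu_pct = shard_vs_gpu_allocation([3, 1, 0, 8])
print(f"shards allocated: {shard_pct:.2f}%")
print(f"GPUs with at least one shard in use: {gpu_pct:.2f}%")
```

With these made-up numbers the shard figure is lower than the GPU figure, which is the relationship I would expect between gres/shard and gres/gpu:t4, rather than the GPUs showing as 100% idle.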

I know shards are a new concept and likely will evolve over time, but wanted to see if anyone had run into or thought similarly about this concept.

Reed