Tracking costs - one single pool of credits, variable costs per partition - slurm-users

18 Oct 2024


We are trying to design the charging and accounting system for our new institutional HPC facility, and I'm having difficulty understanding exactly how we can use sacctmgr to achieve what we need.
Until now, our HPC facilities have all been free at the point of use, so we have not needed to track costs by user, group or project. Account codes have been purely optional.
However, our new facility will be split into various resource types, with free partitions and paid/priority/reserved partitions across those resource types.
All jobs will need to be submitted with an account code.
For users submitting to 'free' partitions we don't need to track resource units against a balance, but the submitted account code would still be used for reporting purposes (i.e. "free resources accounted for % of all use by this project in August-September").
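To make the reporting requirement concrete, here is a minimal Python sketch (not Slurm functionality) that computes, per account code, what percentage of CPU-hours ran on free partitions. The job-record format and the partition names `cpu_free` and `cpu_paid` are hypothetical; in practice the records would come from accounting data.

```python
from collections import defaultdict

# Hypothetical partition names; only usage on these counts as "free".
FREE_PARTITIONS = {"cpu_free"}


def free_share_by_account(jobs):
    """Return {account: % of CPU-hours that ran on free partitions}.

    `jobs` is an iterable of hypothetical (account, partition, cpu_hours) tuples,
    where cpu_hours = allocated CPUs * wall-clock hours.
    """
    total = defaultdict(float)
    free = defaultdict(float)
    for account, partition, cpu_hours in jobs:
        total[account] += cpu_hours
        if partition in FREE_PARTITIONS:
            free[account] += cpu_hours
    # Skip accounts with no recorded usage to avoid dividing by zero.
    return {acct: 100.0 * free[acct] / total[acct] for acct in total if total[acct] > 0}


jobs = [
    ("proj_a", "cpu_free", 30.0),
    ("proj_a", "cpu_paid", 10.0),
    ("proj_b", "cpu_free", 5.0),
    ("proj_b", "cpu_paid", 15.0),
]
print(free_share_by_account(jobs))  # {'proj_a': 75.0, 'proj_b': 25.0}
```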
When submitting to a 'paid' partition, the account code needs to be checked to ensure it has a positive balance (or a balance that will not go past some negative threshold).
Each of the 'paid' partitions may (will) have a different resource unit cost. A simple example:
- Submit to a generic CPU paid partition
-- 1 resource unit/token/credit/£/$ per allocated CPU, per hour of compute
- Submit to a high-speed, non-blocking CPU paid partition
-- 2 resource units/tokens/credits/£/$ per allocated CPU, per hour of compute
- Submit to a GPU paid partition
-- 4 resource units/tokens/credits/£/$ per allocated GPU card, per hour of compute
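The rate table above can be modelled as a per-partition multiplier applied to (allocated CPUs or GPU cards) × (hours). This is a minimal Python sketch of that arithmetic only, not of how Slurm would enforce it; the partition names are hypothetical.

```python
# Hypothetical partition names; rates are credits per allocated unit
# (CPU or GPU card) per hour of compute, taken from the example above.
RATES = {
    "cpu_paid": 1.0,       # generic CPU partition, per CPU-hour
    "cpu_fast_paid": 2.0,  # high-speed, non-blocking CPU partition, per CPU-hour
    "gpu_paid": 4.0,       # GPU partition, per GPU-hour
}


def job_cost(partition: str, units: int, hours: float) -> float:
    """Credits for holding `units` CPUs (or GPU cards) for `hours` on `partition`."""
    return RATES[partition] * units * hours


print(job_cost("cpu_fast_paid", units=16, hours=3))  # 96.0
```

Keeping the rate in a single lookup table means the cost of a job depends only on which partition it ran in, which is the property the single-pool model below relies on.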
We need to have *one* pool of resource units/tokens/credits per account - let's say 1000 credits, and a group of users may well decide to spend all of their credits on the generic CPU partition, all on the GPU partition, or some mixture of the two.
So, in the example above, if one user (or a group of users sharing the same account code) submits a 2-hour job using one CPU (or one GPU card) to each of the three partitions, their single account code should be charged:
- 2 units for the job on the generic CPU partition
- 4 units for the job on the high-speed, non-blocking CPU partition
- 8 units for the job on the GPU partition
- A total of 14 credits removed from their single account code
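As a minimal sketch of the intended behaviour (plain Python, not a Slurm configuration), a single credit pool per account can be debited by jobs on any paid partition, with a check that no charge pushes the balance below a configurable floor (the "negative threshold" mentioned earlier). The class, its parameters and the partition names are hypothetical.

```python
# Hypothetical partition names and per-unit-hour rates from the example above.
RATES = {"cpu_paid": 1.0, "cpu_fast_paid": 2.0, "gpu_paid": 4.0}


class CreditPool:
    """One pool of credits per account, shared by all paid partitions."""

    def __init__(self, balance: float, floor: float = 0.0):
        self.balance = balance
        self.floor = floor  # lowest permitted balance, e.g. -50.0 for a small overdraft

    def can_afford(self, cost: float) -> bool:
        # A job is allowed only if charging it keeps the balance at or above the floor.
        return self.balance - cost >= self.floor

    def charge(self, cost: float) -> None:
        if not self.can_afford(cost):
            raise ValueError("insufficient credits")
        self.balance -= cost


pool = CreditPool(balance=1000.0)
# The three 2-hour, single-CPU/single-GPU jobs: (partition, units, hours).
jobs = [("cpu_paid", 1, 2.0), ("cpu_fast_paid", 1, 2.0), ("gpu_paid", 1, 2.0)]
for partition, units, hours in jobs:
    pool.charge(RATES[partition] * units * hours)
print(pool.balance)  # 986.0
```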
Is this feasible to achieve without having to allocate credits to each of the partitions for an account, or creating a QOS variant for each and every combination of account and partition?
John Snowdon
Senior Research Infrastructure Engineer (HPC)
Research Software Engineering
Catalyst Building, Room 2.01
Newcastle University
3 Science Square
Newcastle Helix
Newcastle upon Tyne
NE4 5TG
https://rse.ncldata.dev/