[slurm-users] Account Usage Discrepancies
roberts.johneric at gmail.com
Mon Nov 27 15:06:28 MST 2017
Hoping someone will get eyes on this one. I ended up changing the partition
in question to only use 1 thread per core to keep things simple, but it
would still be nice to know why slurm is looking at TRES hours instead of
On Wed, Nov 15, 2017 at 10:55 AM, John Roberts <roberts.johneric at gmail.com>
> I'm having an issue with accounts in slurm and not sure if I'm missing
> something. Here's a quick breakdown of the issue:
> We have many accounts in Slurm (v16.05.10) / SlurmDBD. We recently set one
> partition's billing weight to 0.25. This partition has 64 cores per node
> with 4 threads per core. We set this weight to 0.25 so we don't bill for
> threads, just core hours. This part seems to be working ok.
> When querying the account balance via RawUsage (and we use sbank to
> present this to the user in readable hours), these numbers look right. They
> come out to a quarter of a full node.
> However, when querying, say, "UserUtilizationByAccount", this number is
> about 4 times as much. This also makes sense because the jobs are technically
> being allocated all cores and threads, but we only expect to bill for a
> quarter of the time.
> The problem arose when a user of this account tried to submit a job and it
> sat in the queue with the error "AssocGrpCPUMinutesLimit".
> Turning up the debug logs showed this:
> "debug2: Job 161868 being held, the job is at or exceeds assoc
> 2159(<foo>/(null)/(null)) group max tres(cpu) minutes of 150000000 of which
> 27718972 are still available but request is for 94371840 (plus 0 already in
> use) tres minutes (request tres count 65536)"
> The available number above, "27718972", matches what the balance would have
> been from the max CPU minutes minus the usage from
> "UserUtilizationByAccount", instead of reporting the real balance of 4x that.
> Why would Slurm be trying to schedule jobs based on this number instead of
> RawUsage? If we're billing it lower, RawUsage should be the true balance,
> but that doesn't seem to be the case.
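
For anyone following along, here is a rough sketch of the arithmetic behind the debug2 line quoted above. The figures are copied from that log message and the 0.25 weight from the post; the variable names are made up for illustration, and the "billed" figures assume that all past usage on this account ran on the weighted partition, which the post does not confirm. It only reproduces the numbers and is not a description of how Slurm computes the limit internally.

```python
# Figures copied from the quoted debug2 log line.
LIMIT_CPU_MINUTES = 150_000_000     # "group max tres(cpu) minutes"
AVAILABLE_CPU_MINUTES = 27_718_972  # "still available"
REQUEST_CPU_MINUTES = 94_371_840    # "request is for"
REQUEST_CPUS = 65_536               # "request tres count"

# Billing weight from the post (1 core billed per 4 threads).
BILLING_WEIGHT = 0.25

# Usage implied by the log line: every allocated thread counted in full.
used_unweighted = LIMIT_CPU_MINUTES - AVAILABLE_CPU_MINUTES

# Wall time implied by the request (CPU minutes divided by CPU count).
walltime_minutes = REQUEST_CPU_MINUTES // REQUEST_CPUS

# ASSUMPTION: all past usage was on the 0.25-weighted partition, so the
# billed usage is a quarter of the unweighted figure.
used_billed = used_unweighted * BILLING_WEIGHT
available_if_billed = LIMIT_CPU_MINUTES - used_billed

# Does the request fit under each way of counting?
fits_unweighted = REQUEST_CPU_MINUTES <= AVAILABLE_CPU_MINUTES
fits_billed = REQUEST_CPU_MINUTES <= available_if_billed

print("used (unweighted cpu minutes):", used_unweighted)
print("requested wall time (minutes):", walltime_minutes)
print("used (billed at 0.25):        ", used_billed)
print("available if billed at 0.25:  ", available_if_billed)
print("fits against unweighted usage:", fits_unweighted)
print("fits against billed usage:    ", fits_billed)
```

Under these assumptions the request is for 1440 minutes (24 hours) on 65536 CPUs. It does not fit in the 27718972 minutes left when usage is counted unweighted, but it would fit if the limit were checked against usage billed at 0.25.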