[slurm-users] New Billing TRES Issue

Roberts, John E. jeroberts at anl.gov
Fri Apr 27 12:16:59 MDT 2018


Thank you! That finally tells me where the real billing minutes (and also the used CPU minutes that I was using to bill in my older version of Slurm) are stored. I'm not sure how I was even supposed to know GrpTRESRaw exists... I see no mention of it in the docs unless I specifically query for it. That's a great start.

# sshare -A test -l -o GrpTRESRaw%70

Now I have something to query to present to the users in the future.
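If you end up scripting around that sshare output, pulling the billing minutes out of the GrpTRESRaw string is just a matter of splitting on commas and equals signs. A small sketch (the helper name and the sample values here are mine, not from sshare):

```python
def parse_tres(tres_str):
    """Parse a TRES string like 'cpu=60,mem=0,billing=30' into a dict of floats."""
    out = {}
    for field in tres_str.split(","):
        name, _, value = field.partition("=")
        out[name] = float(value)
    return out

# Example GrpTRESRaw-style string (values made up for illustration)
raw = "cpu=60,mem=0,energy=0,node=60,billing=30"
print(parse_tres(raw)["billing"])  # 30.0
```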
So now the remaining issue is why I can't use decimal weights to bill for time…

From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of "Thomas M. Payerle" <payerle at umd.edu>
Reply-To: Slurm User Community List <slurm-users at lists.schedmd.com>
Date: Friday, April 27, 2018 at 11:39 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] New Billing TRES Issue

I have not had a chance to play with the newest Slurm, but I would suggest looking at GrpTRESRaw, which is supposed to gather the usage by TRES (in TRES-minutes).
So if there is a billing TRES in GrpTRESRaw, that might be what you want.

On Fri, Apr 27, 2018 at 11:21 AM, Roberts, John E. <jeroberts at anl.gov<mailto:jeroberts at anl.gov>> wrote:

I'm testing the newest version of Slurm and I'm seeing an issue when using the newer billing TRES to charge for cpu time on a partition. I've seen that billing should be used now instead of cpu in order to properly use the "TRESBillingWeights" option on a partition.

In my test case, I gave an account 2 hours of billing time. I used 1 hour of this while setting the partition to TRESBillingWeights="CPU=1.0". It seemed to have billed properly.
Next, I set on the same partition TRESBillingWeights="CPU=0.5". I ran several jobs, but the billing never seemed to increase. RawUsage, however, did increment correctly.
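For reference, my understanding of how the billing TRES is derived from TRESBillingWeights (per the slurm.conf docs: by default a weighted sum over the job's allocated TRES; with PriorityFlags=MAX_TRES it's a max instead). This is only a sketch of the arithmetic, not Slurm's actual code, and the function name is mine:

```python
def billing_tres(tres_counts, weights):
    """Default billing calculation: sum of (weight * allocated TRES count).
    Unweighted TRES contribute nothing."""
    return sum(weights.get(name, 0.0) * count
               for name, count in tres_counts.items())

# A 1-CPU job under TRESBillingWeights="CPU=0.5"
print(billing_tres({"cpu": 1}, {"cpu": 0.5}))  # 0.5

# The same job under CPU=1.0
print(billing_tres({"cpu": 1}, {"cpu": 1.0}))  # 1.0
```

Which is why I'd expect a one-hour, one-CPU job at CPU=0.5 to accrue 30 billing minutes, not zero.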

Here's an example of sshare reporting no billing run minutes when CPU=0.5 and I start a job with a walltime of 1 hour. Even though the RawUsage is well past 2 hours, a job can still run when it shouldn't.

# sshare -A test -l -o RawUsage,GrpTRESMins,TRESRunMins%60
   RawUsage                    GrpTRESMins                                                  TRESRunMins
----------- ------------------------------ ------------------------------------------------------------
      11068                    billing=120                      cpu=60,mem=0,energy=0,node=60,billing=0

If I set CPU=1.0 and start, say, a 2-hour job, I get this in the logs:
debug2: Job 32 being held, the job is at or exceeds assoc 239(test/(null)/(null)) group max tres(billing) minutes of 120 of which 60 are still available but request is for 120 (plus 0 already in use) tres minutes (request tres count 1)

This makes sense because I previously ran a job at the weight of 1.0 for an hour so it "billed" for 1 hour at that time. How can I query the "available" billing hours if it's not RawUsage?
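Restating the check that log line describes (limit 120, 60 already charged, request for 120 more gets held), the admission test seems to be roughly: charged + running + requested TRES-minutes must fit within the group limit. A sketch with names of my own choosing:

```python
def job_allowed(limit_mins, used_mins, running_mins, request_mins):
    """Rough model of the group TRES-minutes admission check from the
    debug2 log: hold the job if the request would exceed what remains
    of the limit after already-charged and currently-running usage."""
    return used_mins + running_mins + request_mins <= limit_mins

# From the log: limit 120, 60 used, 0 running, request 120 -> held
print(job_allowed(120, 60, 0, 120))  # False

# A 60-minute request would still fit
print(job_allowed(120, 60, 0, 60))   # True
```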

Going back to setting billing CPU weight to 0.5, the logs seem to be inconsistent too. In this first line, it shows the right thing:
debug:  TRES Weight: cpu = 1.000000 * 0.500000 = 0.500000

but not a few lines down:
debug2: acct_policy_job_begin: after adding job 45, assoc 239(test/(null)/(null)) grp_used_tres_run_secs(billing) is 0

Again, RawUsage increases correctly, but Slurm is using some other field for billing to determine if a job can run.

My questions are: How can I set CPU billing to be less than 1 and how can I make sure jobs don't run if they are out of time in this case? What is Slurm using for billing, because it's clearly not RawUsage? Am I simply misunderstanding the billing and/or weights fields?

Thanks for any help...

Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        payerle at umd.edu<mailto:payerle at umd.edu>
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831