[slurm-users] New Billing TRES Issue

Thomas M. Payerle payerle at umd.edu
Fri Apr 27 10:31:17 MDT 2018


I have not had a chance to try the newest Slurm, but I would suggest
looking at the GrpTRESRaw field in sshare, which is supposed to report
the usage per TRES (in TRES-minutes).
So if there is a billing TRES in GrpTRESRaw, that might be what you want.
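
If GrpTRESRaw does carry a billing entry, the remaining allowance would
presumably be the GrpTRESMins limit minus that entry. Here is a rough sketch
of that arithmetic, untested against a real cluster. The helper names
parse_tres and remaining_minutes are made up for illustration, and the
GrpTRESRaw string is an assumed value chosen to match the "60 are still
available" log line in your message, not real sshare output. To see the real
value, add GrpTRESRaw to the -o list of your sshare command.

```python
def parse_tres(field):
    """Turn a TRES string such as 'cpu=60,billing=0' into a dict of ints."""
    pairs = (item.split("=", 1) for item in field.strip().split(",") if item)
    return {name: int(value) for name, value in pairs}


def remaining_minutes(grp_tres_mins, grp_tres_raw, tres="billing"):
    """Limit minus recorded usage for one TRES, or None if no limit is set."""
    limits = parse_tres(grp_tres_mins)
    used = parse_tres(grp_tres_raw)
    if tres not in limits:
        return None
    return limits[tres] - used.get(tres, 0)


# "billing=120" is the GrpTRESMins value from the quoted sshare output.
# "cpu=60,billing=60" is an ASSUMED GrpTRESRaw value, not real output.
print(remaining_minutes("billing=120", "cpu=60,billing=60"))
```

If the number this gives disagrees with what the debug2 log reports as
available, that would at least show that GrpTRESRaw is not the field the
limit check uses.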

On Fri, Apr 27, 2018 at 11:21 AM, Roberts, John E. <jeroberts at anl.gov>
wrote:

> Hi,
>
> I'm testing the newest version of Slurm and I'm seeing an issue when using
> the newer billing TRES to charge for cpu time on a partition. I've seen
> that billing should be used now instead of cpu in order to properly use the
> "TRESBillingWeights" option on a partition.
>
> In my test case, I gave an account 2 hours of billing time. I used 1 hour
> of this while setting the partition to TRESBillingWeights="CPU=1.0". It
> seemed to have billed properly.
> Next, I set on the same partition TRESBillingWeights="CPU=0.5". I ran
> several jobs, but the billing never seemed to increase. RawUsage, however,
> did increment correctly.
>
> Here's an example of sshare reporting no billing run minutes, when
> CPU=0.5 and I start a job with a walltime of 1 hour. Even though the
> RawUsage is well past 2 hours, a job can still run when it shouldn't.
>
> # sshare -A test -l -o RawUsage,GrpTRESMins,TRESRunMins%60
>    RawUsage                    GrpTRESMins                                           TRESRunMins
> ----------- ------------------------------ -----------------------------------------------------
>       11068                    billing=120               cpu=60,mem=0,energy=0,node=60,billing=0
>
> If I set CPU=1.0 and start, say, a 2-hour job, I get this in the logs:
> debug2: Job 32 being held, the job is at or exceeds assoc
> 239(test/(null)/(null)) group max tres(billing) minutes of 120 of which 60
> are still available but request is for 120 (plus 0 already in use) tres
> minutes (request tres count 1)
>
> This makes sense because I previously ran a job at the weight of 1.0 for
> an hour so it "billed" for 1 hour at that time. How can I query the
> "available" billing hours if it's not RawUsage?
>
> Going back to setting billing CPU weight to 0.5, the logs seem to be
> inconsistent too. In this first line, it shows the right thing:
> debug:  TRES Weight: cpu = 1.000000 * 0.500000 = 0.500000
>
> but not a few lines down:
> debug2: acct_policy_job_begin: after adding job 45, assoc
> 239(test/(null)/(null)) grp_used_tres_run_secs(billing) is 0
>
> Again, RawUsage increases correctly, but Slurm is using some other field
> for billing to determine if a job can run.
>
> My questions are: How can I set CPU billing to be less than 1 and how can
> I make sure jobs don't run if they are out of time in this case? What is
> Slurm using for billing, because it's clearly not RawUsage? Am I simply
> misunderstanding the billing and/or weights fields?
>
> Thanks for any help...
>
>


-- 
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        payerle at umd.edu
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831