[slurm-users] slurm bank and sreport tres minute usage problem

Paul Raines raines at nmr.mgh.harvard.edu
Fri Mar 12 19:25:09 UTC 2021


I am very new to SLURM and have not used sreport before, so I decided to
try your queries myself to see what they do.

I am running 20.11.3, and sreport seems to match the accounting data for a
very simple case I tested that I could "eyeball".

Looking just at the day 2021-03-09 for user mu40 on account lcn:

# sreport -t minutes -T CPU -nP cluster \
   AccountUtilizationByUser start='2021-03-09' end='2021-03-10' \
   account=lcn format=login,used
|40333
cx88|33835
mu40|6498

# sreport -t minutes -T gres/gpu -nP cluster \
   AccountUtilizationByUser start='2021-03-09' end='2021-03-10' \
   account=lcn format=login,used
|13070
cx88|9646
mu40|3425

# sacct --user=mu40 --starttime=2021-03-09 --endtime=2021-03-10 \
   --account=lcn -o jobid,start,end,elapsed,alloctres%80

       JobID               Start                 End    Elapsed                                  AllocTRES
------------ ------------------- ------------------- ---------- -----------------------------------------------------
190682       2021-03-05T16:25:55 2021-03-12T09:20:52 6-16:54:57 billing=10,cpu=3,gres/gpu=2,mem=24G,node=1
190682.batch 2021-03-05T16:25:55 2021-03-12T09:20:53 6-16:54:58 cpu=3,gres/gpu=2,mem=24G,node=1
190682.exte+ 2021-03-05T16:25:55 2021-03-12T09:20:52 6-16:54:57 billing=10,cpu=3,gres/gpu=2,mem=24G,node=1
201123       2021-03-09T14:55:20 2021-03-09T14:55:23   00:00:03 billing=9,cpu=4,gres/gpu=1,mem=96G,node=1
201123.exte+ 2021-03-09T14:55:20 2021-03-09T14:55:23   00:00:03 billing=9,cpu=4,gres/gpu=1,mem=96G,node=1
201123.0     2021-03-09T14:55:20 2021-03-09T14:55:23   00:00:03 cpu=4,gres/gpu=1,mem=96G,node=1
201124       2021-03-09T14:55:29 2021-03-10T08:13:07   17:17:38 billing=18,cpu=4,gres/gpu=1,mem=512G,node=1
201124.exte+ 2021-03-09T14:55:29 2021-03-10T08:13:07   17:17:38 billing=18,cpu=4,gres/gpu=1,mem=512G,node=1
201124.0     2021-03-09T14:55:29 2021-03-10T08:13:07   17:17:38 cpu=4,gres/gpu=1,mem=512G,node=1

So the first job ran through all 24 hours of that day, the second for just
3 seconds (so ignore it), and the third for about 9 hours and 5 minutes of
it (from 14:55 until midnight). Multiplying each job's minutes within the
day by its allocated TRES count gives

CPU = 24*60*3 + (9*60+5)*4 = 6500

GPU = 24*60*2 + (9*60+5)*1 = 3425

which matches sreport's 3425 GPU minutes exactly and is within a couple of
minutes of its 6498 CPU minutes (the third job actually ran about 544.5
minutes of the day rather than a full 545).
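
If you want to script the same sanity check, here is a rough sketch (just my
own illustration, nothing Slurm ships) that clips each job's run time to the
reporting window and sums TRES-minutes the same way; the job data is copied
by hand from the sacct output above:

from datetime import datetime

# Reporting window used in the sreport commands above
start = datetime(2021, 3, 9)
end   = datetime(2021, 3, 10)

# (job start, job end, allocated CPUs, allocated GPUs) taken from sacct
jobs = [
    (datetime(2021, 3, 5, 16, 25, 55), datetime(2021, 3, 12, 9, 20, 52), 3, 2),  # 190682
    (datetime(2021, 3, 9, 14, 55, 20), datetime(2021, 3, 9, 14, 55, 23), 4, 1),  # 201123
    (datetime(2021, 3, 9, 14, 55, 29), datetime(2021, 3, 10, 8, 13, 7),  4, 1),  # 201124
]

cpu_min = gpu_min = 0.0
for s, e, cpus, gpus in jobs:
    # Count only the part of the job that falls inside the window
    overlap = (min(e, end) - max(s, start)).total_seconds() / 60.0
    if overlap > 0:
        cpu_min += overlap * cpus
        gpu_min += overlap * gpus

print(round(cpu_min), round(gpu_min))   # about 6498 and 3425, matching sreport

sreport may round or truncate partial minutes slightly differently, so I
would only expect agreement to within a minute or two per job.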



-- Paul Raines (http://help.nmr.mgh.harvard.edu)



On Thu, 11 Mar 2021 11:03pm, Miguel Oliveira wrote:

> Dear all,
>
> Hope you can help me!
> In our facility we support users via projects that have time allocations. For this we use a simple bank facility we developed ourselves, along the lines of the old slurm-bank code (https://jcftang.github.io/slurm-bank/).
> Our implementation differs in that we have a QOS per project with the NoDecay flag. The basic commands used are:
> - scontrol show assoc_mgr to read the limits,
> - sacctmgr modify qos to modify the limits, and
> - sreport to read individual usage.
> We have been using this in production for a while without a single issue for CPU time allocations.
>
> Now we need to implement GPU time allocation as well for our new GPU partition.
> While the first two commands work fine to set or change the limits with gres/gpu, we seem to get values from sreport that do not add up.
> In this case we use:
>
> - command='sreport -t minutes -T gres/gpu -nP cluster AccountUtilizationByUser start='+date_start+' end='+date_end+' account='+account+' format=login,used'
>
> We have confirmed from the accounting records that the total reported by scontrol show assoc_mgr is correct, while the value given by sreport is completely off.
> Have I misunderstood the sreport man page, so that the command above reports something else, or is this a bug?
> We do something similar with "-T cpu" for the CPU part of the code, and the numbers match up. We are using slurm 20.02.0.
>
> Best Regards,
>
> MAO
>
> ---
> Miguel Afonso Oliveira
> Laboratório de Computação Avançada | Laboratory for Advanced Computing
> Universidade de Coimbra | University of Coimbra
> T: +351239410681
> E: miguel.oliveira at uc.pt
> W: www.uc.pt/lca
>
>
>
>

