Hello,

I'm hoping someone can offer some suggestions.

I went ahead and started the database from scratch, reinitializing it to see whether that would help and to try to understand how RawUsage is calculated. I then ran two copies of this job:

sbatch --account=luchko_group --wrap="sleep 60" -p cpu -n 100

with the following settings in slurm.conf:

PriorityFlags=MAX_TRES
PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"

I expected each job to contribute 6000 (100 CPUs × 60 s) to RawUsage; however, one job contributed 3100 and the other 2800. TRESRunMins also stayed at 0 for all categories.
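The 6000 figure is simply billing × elapsed time. The minimal Python sketch below shows that arithmetic under a few assumptions: RawUsage accrues billing-seconds, usage decay over one minute is negligible, and the CPU term (100 × 1.0) is the largest weighted TRES under MAX_TRES. The function names are hypothetical and do not come from Slurm.

```python
# Sketch of the arithmetic behind the expected RawUsage of 6000.
# Assumptions (not confirmed by Slurm output): RawUsage grows by
# billing * elapsed seconds, decay is ignored, and the CPU term
# (100 CPUs * CPU=1.0) dominates the billing under MAX_TRES.

BILLING = 100 * 1.0   # 100 CPUs at weight CPU=1.0
ELAPSED_S = 60        # sleep 60


def expected_raw_usage(billing: float, elapsed_s: float) -> float:
    """Billing-seconds one job should add to RawUsage (decay ignored)."""
    return billing * elapsed_s


def implied_seconds(observed: float, billing: float) -> float:
    """Run time an observed RawUsage increment would correspond to."""
    return observed / billing


print(expected_raw_usage(BILLING, ELAPSED_S))  # 6000.0
for observed in (3100, 2800):
    print(observed, implied_seconds(observed, BILLING))
```

Under these assumptions, the observed increments of 3100 and 2800 correspond to only about 31 s and 28 s of a 100-unit job, roughly half of what was expected.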

I'm at a loss as to what is going on.

Thank you,

Tyler

Sent with Proton Mail secure email.

On Tuesday, September 10th, 2024 at 9:03 PM, tluchko <tluchko@protonmail.com> wrote:
Hello,

We have a new cluster and I'm trying to set up fairshare accounting. I'm trying to track CPU, memory, and GPU usage. Billing for individual jobs seems to be correct, but it isn't being accumulated (TRESRunMins is always 0).

In my slurm.conf, I think the relevant lines are:

AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu

PriorityFlags=MAX_TRES

PartitionName=gpu Nodes=node[1-7] MaxCPUsPerNode=384 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
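To cross-check the billing values reported by sacct below, here is a minimal Python sketch of how these weights combine. It assumes the documented MAX_TRES behaviour, where billing is the largest weighted TRES on a node plus any global TRES such as licenses (none are used here), and it reads MEM=0.125G as 0.125 units per GB. The truncation to an integer is inferred from the sacct output, not taken from Slurm's source. The helper names are hypothetical.

```python
import math

# Weights copied from the TRESBillingWeights string above.
# MEM=0.125G is read as 0.125 billing units per GB of memory.
CPU_W, MEM_W_PER_GB, GPU_W = 1.0, 0.125, 9.6


def billing_max_tres(cpus: int, mem_gb: float, gpus: int) -> float:
    """PriorityFlags=MAX_TRES: billing is the largest single weighted TRES
    on the node (global TRES such as licenses would be added; none here)."""
    return max(cpus * CPU_W, mem_gb * MEM_W_PER_GB, gpus * GPU_W)


def billing_sum(cpus: int, mem_gb: float, gpus: int) -> float:
    """Default behaviour without MAX_TRES: sum of the weighted TRES."""
    return cpus * CPU_W + mem_gb * MEM_W_PER_GB + gpus * GPU_W


# AllocTRES of jobs 154/155: cpu=2, mem=2G, gres/gpu=1
alloc = billing_max_tres(cpus=2, mem_gb=2, gpus=1)
print(alloc, math.floor(alloc))          # 9.6 9
print(math.floor(billing_sum(2, 2, 1)))  # 11
```

With MAX_TRES the GPU weight (9.6) dominates, which matches billing=9 if the fractional part is dropped. Without the flag, the same allocation would bill about 11.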

I currently have one recently finished job and one running job. sacct shows:

$ sacct --format=JobID,JobName,ReqTRES%50,AllocTRES%50,TRESUsageInAve%50,TRESUsageInMax%50
JobID           JobName                                            ReqTRES                                          AllocTRES                                     TRESUsageInAve                                     TRESUsageInMax
------------ ---------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------
154          interacti+           billing=9,cpu=1,gres/gpu=1,mem=1G,node=1           billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
154.interac+ interacti+                                                                        cpu=2,gres/gpu=1,mem=2G,node=1 cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+ cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+
155          interacti+           billing=9,cpu=1,gres/gpu=1,mem=1G,node=1           billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
155.interac+ interacti+                                                                        cpu=2,gres/gpu=1,mem=2G,node=1

billing=9 seems correct to me, since I have 1 GPU allocated and it has the largest weight (9.6). However, sshare shows only zeros in TRESRunMins:

sshare --format=Account,User,RawShares,FairShare,RawUsage,EffectvUsage,TRESRunMins%110
Account                    User  RawShares  FairShare    RawUsage  EffectvUsage                                                                                                    TRESRunMins
-------------------- ---------- ---------- ---------- ----------- ------------- --------------------------------------------------------------------------------------------------------------
root                                                     21589714      1.000000         cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
 abrol_group                          2000                      0      0.000000         cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
 luchko_group                         2000               21589714      1.000000         cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
  luchko_group          tluchko          1   0.333333    21589714      1.000000         cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0

Why are all the TRESRunMins values 0 for tluchko while RawUsage is not? I have checked that slurmdbd is running.

Thank you,

Tyler