On Tuesday, September 10th, 2024 at 9:03 PM, tluchko <tluchko@protonmail.com> wrote:
Hello,
We have a new cluster and I'm trying to set up fairshare accounting. I'm trying to track CPU, MEM and GPU. Billing for individual jobs appears to be correct, but it isn't being accumulated (TRESRunMins is always 0).
In my slurm.conf, I think the relevant lines are:
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu
PriorityFlags=MAX_TRES
PartitionName=gpu Nodes=node[1-7] MaxCPUsPerNode=384 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
I currently have one recently finished job and one running job. sacct gives:
$ sacct --format=JobID,JobName,ReqTRES%50,AllocTRES%50,TRESUsageInAve%50,TRESUsageInMax%50
JobID           JobName                                            ReqTRES                                          AllocTRES                                     TRESUsageInAve                                     TRESUsageInMax
------------ ---------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------
154          interacti+           billing=9,cpu=1,gres/gpu=1,mem=1G,node=1           billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
154.interac+ interacti+                                                                        cpu=2,gres/gpu=1,mem=2G,node=1 cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+ cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+
155          interacti+           billing=9,cpu=1,gres/gpu=1,mem=1G,node=1           billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
155.interac+ interacti+                                                                        cpu=2,gres/gpu=1,mem=2G,node=1
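As a sanity check on the billing figure in the sacct output, here is a minimal sketch of the arithmetic. It assumes that with PriorityFlags=MAX_TRES the billable value for a single-node job with no licenses is the largest weighted TRES, and that the result is truncated to an integer (inferred from 9.6 being reported as 9, not confirmed). The helper name billing_max_tres and the mem_gb key are hypothetical, not part of Slurm.

```python
import math

# Weights copied from TRESBillingWeights in the partition lines.
# The MEM weight carries a "G" suffix, so it is applied per GB here.
WEIGHTS = {"cpu": 1.0, "mem_gb": 0.125, "gres/gpu": 9.6}


def billing_max_tres(alloc, weights=WEIGHTS):
    """Hypothetical helper: largest weighted TRES, truncated to an integer.

    Truncation is an assumption inferred from the sacct output (9.6 -> 9).
    """
    weighted = {name: alloc.get(name, 0) * w for name, w in weights.items()}
    return math.floor(max(weighted.values())), weighted


# AllocTRES reported for job 154: cpu=2, gres/gpu=1, mem=2G
billing, weighted = billing_max_tres({"cpu": 2, "mem_gb": 2, "gres/gpu": 1})
print(weighted)  # per-TRES weighted values
print(billing)   # 9
```

Under these assumptions the GPU term (9.6) dominates the CPU term (2.0) and the memory term (0.25), which agrees with the reported billing=9.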
billing=9 seems correct to me, since I have 1 GPU allocated, which has the largest weighted value (9.6). However, sshare doesn't show anything in TRESRunMins:
$ sshare --format=Account,User,RawShares,FairShare,RawUsage,EffectvUsage,TRESRunMins%110
Account                    User  RawShares  FairShare    RawUsage  EffectvUsage TRESRunMins
-------------------- ---------- ---------- ---------- ----------- ------------- --------------------------------------------------------------------------------------------------------------
root                                                     21589714      1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
abrol_group                           2000                      0      0.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
luchko_group                          2000               21589714      1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
luchko_group            tluchko          1   0.333333    21589714      1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
Why is TRESRunMins all 0 for tluchko while RawUsage is not? I have checked that slurmdbd is running.
Thank you,
Tyler