Hello,
I'm hoping someone can offer some suggestions.
I went ahead and started the database from scratch, reinitializing it to see if that would help and to try to understand how RawUsage is calculated. I then ran two jobs with
sbatch --account=luchko_group --wrap="sleep 60" -p cpu -n 100
With the partition defined as
PriorityFlags=MAX_TRES
PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
I expected each job to contribute 6000 to RawUsage (100 CPUs x 60 s); however, one job contributed 3100 and the other 2800, and TRESRunMins stayed at 0 for all categories.
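The expectation of 6000 can be written out as a small Python sketch. It assumes that raw usage accrues as billing units multiplied by elapsed seconds, before any decay, and that CPU is the largest weighted TRES for these jobs (no memory request is given above). The function names are hypothetical and this is not Slurm's own code.

```python
# Sketch only: assumes RawUsage = billing units x elapsed seconds, with no decay.
# Function names are hypothetical, not part of Slurm.

def expected_raw_usage(billing: float, elapsed_seconds: float) -> float:
    """Usage a job would add under the simple billing x seconds assumption."""
    return billing * elapsed_seconds


def implied_seconds(raw_usage: float, billing: float) -> float:
    """Elapsed seconds that would explain an observed usage value."""
    return raw_usage / billing


cpus = 100          # from "-n 100"
cpu_weight = 1.0    # from TRESBillingWeights "CPU=1.0"
billing = cpus * cpu_weight  # assumes CPU dominates under MAX_TRES

print("expected:", expected_raw_usage(billing, 60))
for observed in (3100, 2800):
    print("observed:", observed, "implied seconds:", implied_seconds(observed, billing))
```

Under these assumptions the observed values correspond to roughly half of the 60 seconds each job slept, which may help narrow down where the difference comes from.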
I'm at a loss as to what is going on.
Thank you,
Tyler
On Tuesday, September 10th, 2024 at 9:03 PM, tluchko <tluchko@protonmail.com> wrote:
Hello,
We have a new cluster and I'm trying to set up fairshare accounting. I'm trying to track CPU, MEM and GPU. It seems that billing for individual jobs is correct, but billing isn't being accumulated (TRESRunMins is always 0).
In my slurm.conf, I think the relevant lines are
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu
PriorityFlags=MAX_TRES
PartitionName=gpu Nodes=node[1-7] MaxCPUsPerNode=384 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
I currently have one recently finished job and one running job. sacct gives
$ sacct --format=JobID,JobName,ReqTRES%50,AllocTRES%50,TRESUsageInAve%50,TRESUsageInMax%50
JobID        JobName    ReqTRES                                  AllocTRES                                TRESUsageInAve                                     TRESUsageInMax
154          interacti+ billing=9,cpu=1,gres/gpu=1,mem=1G,node=1 billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
154.interac+ interacti+                                          cpu=2,gres/gpu=1,mem=2G,node=1           cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+ cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+
155          interacti+ billing=9,cpu=1,gres/gpu=1,mem=1G,node=1 billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
155.interac+ interacti+                                          cpu=2,gres/gpu=1,mem=2G,node=1
billing=9 seems correct to me, since I have 1 GPU allocated, which has the largest score of 9.6. However, sshare doesn't show anything in TRESRunMins
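The billing=9 figure can be reproduced with a short Python sketch. It assumes that, with PriorityFlags=MAX_TRES, the billable value is the largest weighted TRES on the node, and that the reported value is truncated to an integer (inferred only from the billing=9 shown by sacct above). The helper name and the dictionary layout are hypothetical.

```python
import math

# Weights from TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
WEIGHTS = {"cpu": 1.0, "mem_gb": 0.125, "gres/gpu": 9.6}


def max_tres_billing(alloc: dict) -> float:
    """Largest weighted TRES; a sketch of the MAX_TRES rule, not Slurm's code."""
    return max(WEIGHTS[name] * amount for name, amount in alloc.items())


# AllocTRES for job 154: cpu=2, gres/gpu=1, mem=2G
alloc = {"cpu": 2, "mem_gb": 2, "gres/gpu": 1}

raw = max_tres_billing(alloc)
# Assumption: the fractional part is dropped when billing is reported.
print(raw, math.floor(raw))
```

Here the weighted values are 2.0 for CPU, 0.25 for memory and 9.6 for the GPU, so the GPU term dominates.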
$ sshare --format=Account,User,RawShares,FairShare,RawUsage,EffectvUsage,TRESRunMins%110
Account      User    RawShares FairShare RawUsage EffectvUsage TRESRunMins
root                                     21589714 1.000000     cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
abrol_group          2000                0        0.000000     cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
luchko_group         2000                21589714 1.000000     cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
luchko_group tluchko 1         0.333333  21589714 1.000000     cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
Why is TRESRunMins all 0 for tluchko while RawUsage is not? I have checked that slurmdbd is running.
Thank you,
Tyler