Hi guys,

We've just set up our new cluster and are facing some issues regarding the fairshare calculation.
Our slurm.conf directives for priority calculation are defined as follows:

PriorityType=priority/multifactor
PriorityFlags=MAX_TRES
PriorityDecayHalfLife=14-0
PriorityFavorSmall=NO
PriorityMaxAge=14-0
PriorityWeightAge=1000
PriorityWeightJobSize=1000
PriorityWeightPartition=10000000
PriorityWeightQOS=10000000
PriorityWeightTRES=CPU=2000,Mem=4000
PriorityWeightFairshare=100000
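
A quick sanity check on what these weights imply: the multifactor plugin computes a weighted sum of normalized factors, so with PriorityWeightFairshare=100000 the fairshare term should move priorities by up to 100000 points. A minimal sketch of that sum (the factor values below are made up, and the TRES term is omitted):

```python
# Rough sketch of the multifactor priority sum (TRES term omitted).
# Each *_factor is a normalized value in [0, 1]; the weights match
# our slurm.conf above.
def job_priority(age_factor, fairshare_factor, jobsize_factor,
                 partition_factor, qos_factor,
                 weight_age=1000, weight_fairshare=100000,
                 weight_jobsize=1000, weight_partition=10000000,
                 weight_qos=10000000):
    return int(weight_age * age_factor
               + weight_fairshare * fairshare_factor
               + weight_jobsize * jobsize_factor
               + weight_partition * partition_factor
               + weight_qos * qos_factor)

# e.g. a job with fairshare_factor 0.5 in a partition with factor 0.1:
# job_priority(1, 0.5, 1, 0.1, 0) -> 1052000
```

Since the fairshare factor is stuck at 0 for everyone, that whole 100000-point term currently does nothing to differentiate our users.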

The partition we are submitting our jobs to is set up as follows:

PartitionName=mypart Priority=1000 TRESBillingWeights="CPU=1.0,Mem=0.25G" Default=YES MaxTime=96:0:0 DefMemPerCPU=5333 Nodes=node[001-036] MaxNodes=20
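
As we understand TRESBillingWeights together with PriorityFlags=MAX_TRES, the billable TRES for a job is the maximum of the weighted per-node TRES rather than their sum (global TRES such as licenses would still be added, but we don't use any). A sketch of that, with purely illustrative job sizes:

```python
# Sketch of billable TRES under TRESBillingWeights="CPU=1.0,Mem=0.25G".
# With PriorityFlags=MAX_TRES, billing takes the max of the weighted
# per-node TRES instead of their sum; global TRES (licenses) omitted.
def billable_tres(cpus, mem_gb, cpu_weight=1.0, mem_weight_per_gb=0.25,
                  max_tres=True):
    cpu_bill = cpus * cpu_weight
    mem_bill = mem_gb * mem_weight_per_gb
    return max(cpu_bill, mem_bill) if max_tres else cpu_bill + mem_bill

# A 4-CPU job with our DefMemPerCPU of 5333 MB (~20.8 GB total) bills
# on memory: max(4 * 1.0, 20.83 * 0.25) ~= 5.21
```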

Whenever we look at the fairshare values using sshare -l, we see the following output:

Account User RawShares NormShares RawUsage NormUsage EffectvUsage FairShare LevelFS GrpTRESMins TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ---------- ------------------------------ ------------------------------
root 1 0.000000 268724597 0.000000 0.000000 cpu=1098201,mem=5856709132,en+
root root 1 0.100000 0 0.000000 0.000000 0.000000 cpu=0,mem=0,energy=0,node=0,b+
group1 1 0.100000 0 0.000000 0.000000 cpu=0,mem=0,energy=0,node=0,b+
group2 1 0.100000 0 0.000000 0.000000 cpu=0,mem=0,energy=0,node=0,b+
group3 1 0.100000 268724597 0.000000 0.000000 cpu=1098201,mem=5856709132,en+
group4 1 0.100000 0 0.000000 0.000000 cpu=0,mem=0,energy=0,node=0,b+
group5 1 0.100000 0 0.000000 0.000000 cpu=0,mem=0,energy=0,node=0,b+
group6 1 0.100000 0 0.000000 0.000000 cpu=0,mem=0,energy=0,node=0,b+
group7 1 0.100000 0 0.000000 0.000000 cpu=0,mem=0,energy=0,node=0,b+
group8 1 0.100000 0 0.000000 0.000000 cpu=0,mem=0,energy=0,node=0,b+
group9 1 0.100000 0 0.000000 0.000000 cpu=0,mem=0,energy=0,node=0,b+

We find it really strange that the FairShare value is 0 for the root account and blank ("NULL") for all other groups, even the one with the greatest raw usage.

Looking at the data for our users, we see the following:

Account User RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root 1 0.000000 268983721 0.000000
root root 1 0.100000 0 0.000000 0.000000
group3 1 0.100000 268983721 0.000000
group3 user1 1 0.090909 12109374 0.000000 0.000000
group3 user2 1 0.090909 0 0.000000 0.000000
group3 user3 1 0.090909 0 0.000000 0.000000
group3 user4 1 0.090909 0 0.000000 0.000000
group3 user5 1 0.090909 0 0.000000 0.000000
group3 user6 1 0.090909 0 0.000000 0.000000
group3 user7 1 0.090909 0 0.000000 0.000000
group3 user8 1 0.090909 208824597 0.000000 0.000000
group3 user9 1 0.090909 0 0.000000 0.000000
group3 user10 1 0.090909 0 0.000000 0.000000
group3 user11 1 0.090909 48049750 0.000000 0.000000
group4 1 0.100000 0 0.000000
group4 user13 1 0.000000 499452 0.000000 0.000000
group5 1 0.100000 0 0.000000
group5 user14 1 0.000000 1539603 0.000000 0.000000

This is odd behavior: user1, user8, user11, user13, and user14 are the ones with the most RawUsage, yet the FairShare value is the same for all of them, including the users that have not yet submitted any job.
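
For reference, our reading of the Fair Tree documentation is that the LevelFS column sshare prints should be NormShares divided by EffectvUsage among siblings, with zero-usage associations sorting first (effectively infinite LevelFS). So group3, as the only account with usage, should end up around 0.1 rather than blank. A rough sketch of that expectation (not Slurm's actual code):

```python
import math

# Fair Tree: LevelFS = S / U, i.e. NormShares / EffectvUsage among
# siblings. Associations with zero usage get an effectively infinite
# LevelFS and rank first. This is our expectation, not Slurm's source.
def level_fs(norm_shares, effectv_usage):
    if effectv_usage == 0.0:
        return math.inf
    return norm_shares / effectv_usage

# group3 (the only account with usage): level_fs(0.1, 1.0) -> 0.1
# idle sibling accounts:                level_fs(0.1, 0.0) -> inf
```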

We also noticed the following error messages in the slurmctld log, appearing with some regularity:

[2024-03-07T16:38:13.260] error: _append_list_to_array: unable to append NULL list to assoc list.
[2024-03-07T16:38:13.260] error: _calc_tree_fs: unable to calculate fairshare on empty tree

The errors above appear to come from: https://github.com/SchedMD/slurm/blob/b11bf689b270f1f5dfe4b0cd54c4fa84b4af315b/src/plugins/priority/multifactor/fair_tree.c#L337

Are we missing a setting in slurm.conf? This is puzzling, because we have another cluster with pretty much the same configuration, and there FairShare is calculated without any problems.
Any help would be appreciated.



-- 
Cumprimentos / Best Regards,
Zacarias Benta

LIP/INCD @ UMINHO
 ----------------------------------------------
/ Use linux, and may the source be with you.  /
----------------------------------------------
                \  __
                -=(o '.
                   '.-.\
                   /|  \\
                   '|  ||
                    _\_):,_