Setting fairshare=parent makes each user association compete, in effect, at the account level, so this is behaving as intended. It effectively ignores the users' own usage when they compete with one another inside the same account. That is not what you want. Give them all the same numeric value, not parent.

Fair Tree (the default) handles a single account just fine, but you do not want fairshare=parent there either.
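To illustrate the difference, here is a minimal sketch of how sibling users can be told apart once they have equal numeric shares. It assumes the LevelFS definition from the Fair Tree documentation (LevelFS = S / U, with S and U normalized among siblings). It takes four RawUsage values from the sshare output quoted further down this thread and replaces Fair Tree's full tree walk with a simplified "rank / number of users" step that ignores ties and the rest of the tree. `level_fs` and `rank_fairshare` are hypothetical helper names, not Slurm code.

```python
import math

# RawUsage for four users of account "mic", copied from the sshare output in this thread.
RAW_USAGE = {
    "aamedina": 2351906,
    "acataldo": 14637614,
    "achowdhury": 0,
    "ajajoo": 2053441,
}


def level_fs(raw_shares, raw_usage):
    """Hypothetical helper: LevelFS = S / U among siblings at one level.

    S is the user's fraction of its siblings' RawShares, U its fraction of
    their usage. Zero usage yields infinity (the most deserving user).
    """
    total_shares = sum(raw_shares.values())
    total_usage = sum(raw_usage.values())
    result = {}
    for user, shares in raw_shares.items():
        s = shares / total_shares
        u = raw_usage[user] / total_usage if total_usage else 0.0
        result[user] = math.inf if u == 0 else s / u
    return result


def rank_fairshare(levels):
    """Hypothetical helper: simplified rank / count step (ties not handled).

    The user with the highest LevelFS gets 1.0, the lowest gets 1/n.
    """
    ordered = sorted(levels, key=levels.get, reverse=True)
    n = len(ordered)
    return {user: (n - i) / n for i, user in enumerate(ordered)}


# Equal numeric shares: every user has RawShares=1, so usage decides the order.
equal_shares = {user: 1 for user in RAW_USAGE}
fairshare = rank_fairshare(level_fs(equal_shares, RAW_USAGE))
for user, value in sorted(fairshare.items(), key=lambda kv: -kv[1]):
    print(f"{user:12s} {value:.2f}")
```

Run as written, this prints achowdhury 1.00, ajajoo 0.75, aamedina 0.50 and acataldo 0.25, so the user with no usage comes out ahead. With fairshare=parent there is no per-user level to rank at all: in the output below, every mic user simply shows the account's EffectvUsage (1.000000) and FairShare (0.497120).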

Ryan

On 8/10/24 08:05, Drucker, Daniel via slurm-users wrote:
And now, a few hours later, with no changes made, everyone has the same FairShare?

$ sshare -l -a
Account                    User  RawShares  NormShares    RawUsage   NormUsage  EffectvUsage  FairShare                    GrpTRESMins                    TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ------------------------------ ------------------------------
root                                          0.000000    63235972                  0.000000   1.000000                                cpu=188835,mem=1546941371,ene+
 root                      root          1    0.008264           0    0.000000      0.000000   1.000000                                cpu=0,mem=0,energy=0,node=0,b+
 mic                                   120    0.991736    63235972    1.000000      1.000000   0.497120                                cpu=188835,mem=1546941371,ene+
  mic                  aamedina     parent    0.991736     2351906    0.037193      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                 aaruldass     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                  acataldo     parent    0.991736    14637614    0.231476      1.000000   0.497120                                cpu=188031,mem=1540350361,ene+
  mic                achowdhury     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                    ajajoo     parent    0.991736     2053441    0.032473      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                    ajanes     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                 amandacao     parent    0.991736         200    0.000003      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                    aromer     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                aweerasek+     parent    0.991736        1048    0.000017      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                   batwood     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                     bleng     parent    0.991736           3    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                 cdemirlek     parent    0.991736        6110    0.000097      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                      chun     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+


I am so confused.



On Aug 10, 2024, at 8:11 AM, Drucker, Daniel <DDRUCKER@MCLEAN.HARVARD.EDU> wrote:

Hmm, no. That solved the problem of everyone having the same FairShare, but even after restarting slurmd and running reconfigure, if I submit one job as a user with huge usage and another as a user with zero usage, both jobs end up with the same Priority.



On Aug 10, 2024, at 8:05 AM, Daniel M. Drucker <ddrucker@mclean.harvard.edu> wrote:

I just set 
PriorityFlags=NO_FAIR_TREE
and this seems to have solved the problem!




On Aug 10, 2024, at 7:45 AM, Drucker, Daniel <DDRUCKER@MCLEAN.HARVARD.EDU> wrote:

According to https://docs.rc.fas.harvard.edu/kb/fairshare/  and https://slurm.schedmd.com/SUG14/fair_tree.pdf :


"The Fairshare score is calculated using the following formula. f = 2^(-EffectvUsage/NormShares)"

This is clearly not happening on my system:

Account                    User  RawShares  NormShares    RawUsage   NormUsage  EffectvUsage  FairShare    LevelFS                    GrpTRESMins                    TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ---------- ------------------------------ ------------------------------
...
  mic                  acataldo     parent    0.991736    13066208    0.210193      0.210193   0.983871                                           cpu=169648,mem=1389757781,ene+
  mic                achowdhury     parent    0.991736           0    0.000000      0.000000   0.983871                                           cpu=0,mem=0,energy=0,node=0,b+
...


Every user has 0.991736 NormShares.
Acataldo has EffectvUsage = 0.210193
Achowdhury has EffectvUsage = 0

But both users have the same FairShare. The correct values according to the above formula would be 0.863 and 1.0 respectively.
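The arithmetic in the previous paragraph can be checked with a few lines of Python. This applies only the formula as quoted above, f = 2^(-EffectvUsage/NormShares), to the two sshare rows. It is not a reproduction of how the default Fair Tree algorithm fills in the FairShare column, and `classic_fairshare` is a hypothetical helper name.

```python
def classic_fairshare(effectv_usage: float, norm_shares: float) -> float:
    """f = 2^(-EffectvUsage / NormShares), the formula as quoted above."""
    return 2.0 ** (-effectv_usage / norm_shares)


NORM_SHARES = 0.991736  # identical for every user in the sshare output

for user, usage in [("acataldo", 0.210193), ("achowdhury", 0.0)]:
    print(f"{user:12s} {classic_fairshare(usage, NORM_SHARES):.3f}")
```

This prints 0.863 for acataldo and 1.000 for achowdhury, matching the values stated above.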

So what's going on?



On Aug 10, 2024, at 7:36 AM, Daniel M. Drucker <ddrucker@mclean.harvard.edu> wrote:

Here is what is confusing me, I guess. Look at the output below. You can see that some people have no usage and some people have a lot of usage, but their FairShare values are all identical.

https://lists.schedmd.com/mailman3/hyperkitty/list/slurm-users@lists.schedmd.com/thread/I53OEJSNBT2BMXYVFEFHQQKKAHIUYA53/ seems to say that fairshare=parent should work just fine, but what I am seeing is that usage is NOT changing anyone's FairShare.






