And now, a few hours later - with no changes made - everyone has the same fairshare?

$ sshare -l -a
Account                    User  RawShares  NormShares    RawUsage   NormUsage  EffectvUsage  FairShare                    GrpTRESMins                    TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ------------------------------ ------------------------------
root                                          0.000000    63235972                  0.000000   1.000000                                cpu=188835,mem=1546941371,ene+
 root                      root          1    0.008264           0    0.000000      0.000000   1.000000                                cpu=0,mem=0,energy=0,node=0,b+
 mic                                   120    0.991736    63235972    1.000000      1.000000   0.497120                                cpu=188835,mem=1546941371,ene+
  mic                  aamedina     parent    0.991736     2351906    0.037193      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                 aaruldass     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                  acataldo     parent    0.991736    14637614    0.231476      1.000000   0.497120                                cpu=188031,mem=1540350361,ene+
  mic                achowdhury     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                    ajajoo     parent    0.991736     2053441    0.032473      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                    ajanes     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                 amandacao     parent    0.991736         200    0.000003      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                    aromer     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                aweerasek+     parent    0.991736        1048    0.000017      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                   batwood     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                     bleng     parent    0.991736           3    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                 cdemirlek     parent    0.991736        6110    0.000097      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                      chun     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+


I am so confused.
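
One observation on the table above, offered as a sketch and not as an explanation of Slurm's behaviour: every user row now shows EffectvUsage = 1.000000, the same value as the parent account mic. Plugging that into the formula quoted further down in this thread, f = 2^(-EffectvUsage/NormShares), gives a number that matches the FairShare column. The function name below is hypothetical, and the inputs are copied from the sshare rows.

```python
def fairshare_from_formula(effective_usage: float, norm_shares: float) -> float:
    """Evaluate f = 2^(-EffectvUsage/NormShares), the formula quoted in this thread."""
    return 2.0 ** (-(effective_usage / norm_shares))


# Values copied from the user rows of the sshare output above.
value = fairshare_from_formula(1.000000, 0.991736)
print(f"{value:.4f}")  # compare with the 0.497120 that sshare reports
```

If that match is not a coincidence, the identical FairShare values would follow from the identical EffectvUsage values, not from the formula being ignored.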



On Aug 10, 2024, at 8:11 AM, Drucker, Daniel <DDRUCKER@MCLEAN.HARVARD.EDU> wrote:

Hmm, no. That solved the problem of everyone having the same FairShare, but even after restarting slurmd and doing a reconfigure, if I submit one job as a user with huge usage and another as a user with zero usage, both jobs end up with the same Priority.



On Aug 10, 2024, at 8:05 AM, Daniel M. Drucker <ddrucker@mclean.harvard.edu> wrote:

I just set 
PriorityFlags=NO_FAIR_TREE
and this seems to have solved the problem!




On Aug 10, 2024, at 7:45 AM, Drucker, Daniel <DDRUCKER@MCLEAN.HARVARD.EDU> wrote:

According to https://docs.rc.fas.harvard.edu/kb/fairshare/  and https://slurm.schedmd.com/SUG14/fair_tree.pdf :


"The Fairshare score is calculated using the following formula. f = 2^(-EffectvUsage/NormShares)"

This is clearly not happening on my system:

Account                    User  RawShares  NormShares    RawUsage   NormUsage  EffectvUsage  FairShare    LevelFS                    GrpTRESMins                    TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ---------- ------------------------------ ------------------------------
...
  mic                  acataldo     parent    0.991736    13066208    0.210193      0.210193   0.983871                                           cpu=169648,mem=1389757781,ene+
  mic                achowdhury     parent    0.991736           0    0.000000      0.000000   0.983871                                           cpu=0,mem=0,energy=0,node=0,b+
...


Every user has 0.991736 NormShares.
Acataldo has EffectvUsage = 0.210193
Achowdhury has EffectvUsage = 0

But both users have the same FairShare. The correct values according to the above formula would be 0.863 and 1.0 respectively.
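
For reference, the arithmetic behind those two numbers can be reproduced with a short Python sketch. It only evaluates the quoted formula on the values in the sshare rows above. It does not model how Slurm actually computes FairShare, and the names are hypothetical.

```python
# NormShares and EffectvUsage copied from the sshare rows above.
NORM_SHARES = 0.991736
effective_usage = {"acataldo": 0.210193, "achowdhury": 0.0}


def expected_fairshare(usage: float, shares: float) -> float:
    """Evaluate the quoted formula f = 2^(-EffectvUsage/NormShares)."""
    return 2.0 ** (-(usage / shares))


expected = {user: expected_fairshare(u, NORM_SHARES) for user, u in effective_usage.items()}
for user, f in expected.items():
    print(f"{user}: {f:.3f}")
# acataldo: 0.863
# achowdhury: 1.000
```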

So what's going on?



On Aug 10, 2024, at 7:36 AM, Daniel M. Drucker <ddrucker@mclean.harvard.edu> wrote:

Here is what is confusing me, I guess. Look at the output below. You can see that some people have no usage and some people have a lot of usage, but their FairShare values are all identical.

https://lists.schedmd.com/mailman3/hyperkitty/list/slurm-users@lists.schedmd.com/thread/I53OEJSNBT2BMXYVFEFHQQKKAHIUYA53/  seems to say that fairshare=parent should work just fine, but what I am seeing is that it is NOT altering people's FairShare?






