And now, a few hours later - with no changes made - everyone has the same fairshare?
$ sshare -l -a
Account User RawShares NormShares RawUsage NormUsage EffectvUsage FairShare GrpTRESMins TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ------------------------------ ------------------------------
root 0.000000 63235972 0.000000 1.000000 cpu=188835,mem=1546941371,ene+
root root 1 0.008264 0 0.000000 0.000000 1.000000 cpu=0,mem=0,energy=0,node=0,b+
mic 120 0.991736 63235972 1.000000 1.000000 0.497120 cpu=188835,mem=1546941371,ene+
mic aamedina parent 0.991736 2351906 0.037193 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
mic aaruldass parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
mic acataldo parent 0.991736 14637614 0.231476 1.000000 0.497120 cpu=188031,mem=1540350361,ene+
mic achowdhury parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
mic ajajoo parent 0.991736 2053441 0.032473 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
mic ajanes parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
mic amandacao parent 0.991736 200 0.000003 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
mic aromer parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
mic aweerasek+ parent 0.991736 1048 0.000017 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
mic batwood parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
mic bleng parent 0.991736 3 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
mic cdemirlek parent 0.991736 6110 0.000097 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
mic chun parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
I am so confused.
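One observation that may help narrow this down: in the output above, every row under mic reports EffectvUsage = 1.000000, and the shared FairShare of 0.497120 is what the formula quoted further down the thread gives for that value. The sketch below only checks that arithmetic. It assumes the quoted formula applies with NO_FAIR_TREE set; the function name and the handful of rows copied in are illustrative, not anything Slurm provides.

```python
# Assumption: the formula quoted later in this thread,
#   f = 2^(-EffectvUsage / NormShares),
# is what produces the FairShare column once NO_FAIR_TREE is set.

NORM_SHARES = 0.991736  # NormShares reported for mic and every user under it


def formula_fairshare(effective_usage: float, norm_shares: float) -> float:
    """Hypothetical helper: evaluate the quoted formula for one row."""
    return 2.0 ** (-effective_usage / norm_shares)


# (name, EffectvUsage, FairShare) copied from a few rows of the sshare output.
reported_rows = [
    ("mic (account)", 1.000000, 0.497120),
    ("aamedina", 1.000000, 0.497120),
    ("acataldo", 1.000000, 0.497120),
    ("achowdhury", 1.000000, 0.497120),
]

# Rows whose reported FairShare disagrees with the formula beyond rounding.
mismatches = [
    name
    for name, effective_usage, fair_share in reported_rows
    if abs(formula_fairshare(effective_usage, NORM_SHARES) - fair_share) > 1e-5
]

print("rows that disagree with the formula:", mismatches)
```

If that reading is right, the identical FairShare values would follow from the identical EffectvUsage values rather than from the formula being skipped, so the open question becomes why users with zero RawUsage show EffectvUsage = 1.000000.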
On Aug 10, 2024, at 8:11 AM, Drucker, Daniel DDRUCKER@MCLEAN.HARVARD.EDU wrote:
Hmm, no. That solved the problem of everyone having the same FairShare, but even after restarting slurmd and doing reconfigure, if I submit a job as someone with a huge usage and someone with zero usage, they both end up with the same Priority.
On Aug 10, 2024, at 8:05 AM, Daniel M. Drucker ddrucker@mclean.harvard.edu wrote:
I just set PriorityFlags=NO_FAIR_TREE and this seems to have solved the problem!
On Aug 10, 2024, at 7:45 AM, Drucker, Daniel DDRUCKER@MCLEAN.HARVARD.EDU wrote:
According to https://docs.rc.fas.harvard.edu/kb/fairshare/ and https://slurm.schedmd.com/SUG14/fair_tree.pdf :
"The Fairshare score is calculated using the following formula: f = 2^(-EffectvUsage/NormShares)"
This is clearly not happening on my system:
Account User RawShares NormShares RawUsage NormUsage EffectvUsage FairShare LevelFS GrpTRESMins TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ---------- ------------------------------ ------------------------------
...
mic acataldo parent 0.991736 13066208 0.210193 0.210193 0.983871 cpu=169648,mem=1389757781,ene+
mic achowdhury parent 0.991736 0 0.000000 0.000000 0.983871 cpu=0,mem=0,energy=0,node=0,b+
...
Every user has NormShares = 0.991736. acataldo has EffectvUsage = 0.210193; achowdhury has EffectvUsage = 0.
But both users have the same FairShare. The correct values according to the above formula would be 0.863 and 1.0 respectively.
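For anyone who wants to reproduce those two expected values, here is a minimal Python sketch. It assumes the quoted formula is the one that should apply; the function name is made up for illustration, and the inputs are the EffectvUsage and NormShares values from the two rows above.

```python
def classic_fairshare(effective_usage: float, norm_shares: float) -> float:
    """Hypothetical helper: f = 2^(-EffectvUsage / NormShares), as quoted above."""
    return 2.0 ** (-effective_usage / norm_shares)


# (EffectvUsage, NormShares) taken from the two sshare rows shown above.
users = {
    "acataldo": (0.210193, 0.991736),
    "achowdhury": (0.000000, 0.991736),
}

expected = {
    user: classic_fairshare(effective_usage, norm_shares)
    for user, (effective_usage, norm_shares) in users.items()
}

for user, value in expected.items():
    print(f"{user}: {value:.3f}")
# prints:
# acataldo: 0.863
# achowdhury: 1.000
```

Neither value matches the 0.983871 that sshare reports for both users.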
So what's going on?
On Aug 10, 2024, at 7:36 AM, Daniel M. Drucker ddrucker@mclean.harvard.edu wrote:
Here is what is confusing me, I guess. Look at the output below. You can see that some people have no usage and some people have a lot of usage, but their FairShare values are all identical.
https://lists.schedmd.com/mailman3/hyperkitty/list/slurm-users@lists.schedmd... seems to say that fairshare=parent should work just fine, but what I am seeing is that it is NOT altering people's FairShare?