I don't know how many times I've read the docs; I keep thinking I understand them, but something is really wrong with prioritisation on our cluster, and we're struggling to understand why.
The setup:
1. We have a group who submit two types of work: production jobs and research jobs.
2. We have two sacctmgr accounts for this; let's call them 'prod' and 'research'.
3. We also have some dedicated hardware, which they paid for, that can be used only by users associated with the prod account.
Desired behaviour:
1. Usage of their dedicated hardware by production jobs should not hugely decrease the fairshare priority of research jobs in other partitions.
2. Usage of shared hardware should decrease their fairshare priority, whether by production or research jobs.
3. Memory should make a relatively small contribution to TRES usage (it's not normally the constrained resource).
Our approach:
1. Set TRESBillingWeights for CPU, memory and gres/gpu usage on the shared partitions; typically: CPU=1.0,Mem=0.25G,GRES/gpu=1.0
2. Set TRESBillingWeights to something small on the dedicated hardware partition, such as: CPU=0.25
3. Set PriorityWeightFairshare and PriorityWeightAge to values such that fairshare dominates while jobs are young, and age takes over if they've been pending for a long time.
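For concreteness, this is roughly what that looks like in our slurm.conf (the partition names, node lists and priority weight values below are illustrative placeholders, not our real ones):

  # Shared partition: CPU, memory (per GB) and GPU all contribute to billing.
  # e.g. a 4-CPU, 16 GB job would bill 4*1.0 + 16*0.25 = 8 per unit time,
  # assuming the default behaviour of summing the weighted TRES (no MAX_TRES flag).
  PartitionName=shared    Nodes=node[001-020] TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=1.0"

  # Dedicated hardware: deliberately cheap, so prod usage barely dents fairshare.
  PartitionName=dedicated Nodes=node[021-024] TRESBillingWeights="CPU=0.25" AllowAccounts=prod

  # Fairshare dominates for young jobs; age catches up for long-pending ones.
  PriorityType=priority/multifactor
  PriorityWeightFairshare=100000
  PriorityWeightAge=10000
  PriorityMaxAge=7-0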
The observed behaviour:

1. Production association jobs have a high priority; this is working well.
2. Research jobs are still getting heavily penalised in fairshare, and we don't understand why; they seem to have enormous RawUsage, largely coming from memory.
Here's what I see from sshare (sensitive details removed, obviously):
sshare -l -A prod,research -a -o Account,RawUsage,EffectvUsage,FairShare,LevelFS,TRESRunMins%80 | grep -v cpu=0
Account RawUsage EffectvUsage FairShare LevelFS TRESRunMins
-------------------- ----------- ------------- ---------- ---------- --------------------------------------------------------------------------------
prod 1587283 0.884373 0.226149 cpu=81371,mem=669457237,energy=0,node=20610,billing=100833,fs/disk=0,vmem=0,pag+
prod 1082008 0.681681 0.963786 0.366740 cpu=81281,mem=669273429,energy=0,node=20520,billing=100833,fs/disk=0,vmem=0,pag+
prod 505090 0.318202 0.964027 0.785664 cpu=90,mem=184320,energy=0,node=90,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=+
research 1043560787 0.380577 0.121648 cpu=17181098808,mem=35196566339054,energy=0,node=4295361360,billing=25773481938+
research 146841 0.000141 0.005311 124.679238 cpu=824,mem=3375923,energy=0,node=824,billing=824,fs/disk=0,vmem=0,pages=0,gres+
research 17530141 0.016798 0.001449 1.044377 cpu=254484,mem=3379938816,energy=0,node=161907,billing=893592,fs/disk=0,vmem=0,+
research 167597 0.000161 0.005070 109.238498 cpu=7275,mem=223516160,energy=0,node=7275,billing=50931,fs/disk=0,vmem=0,pages=+
research 12712481 0.012182 0.001931 1.440166 cpu=186327,mem=95399526,energy=0,node=23290,billing=232909,fs/disk=0,vmem=0,pag+
research 11521011 0.011040 0.002173 1.589104 cpu=8167,mem=267626086,energy=0,node=8167,billing=65338,fs/disk=0,vmem=0,pages=+
research 9719735 0.009314 0.002414 1.883599 cpu=15020,mem=69214617,energy=0,node=1877,billing=3755,fs/disk=0,vmem=0,pages=0+
research 25004766 0.023961 0.001207 0.732184 cpu=590778,mem=6464600473,energy=0,node=98910,billing=2266887,fs/disk=0,vmem=0,+
research 68938740 0.066061 0.000724 0.265570 cpu=159332,mem=963064985,energy=0,node=89957,billing=192706,fs/disk=0,vmem=0,pa+
research 7359413 0.007052 0.002656 2.487710 cpu=81401,mem=583487624,energy=0,node=20350,billing=20350,fs/disk=0,vmem=0,page+
research 718714430 0.688714 0.000241 0.025473 cpu=20616,mem=337774728,energy=0,node=5154,billing=92772,fs/disk=0,vmem=0,pages+
research 1016606 0.000974 0.003863 18.009010 cpu=17179774580,mem=35184178340113,energy=0,node=4294943645,billing=25769661870+
Firstly, why are the mem TRES numbers so enormous? Secondly, what's going on with the last user, whose RawUsage is tiny but whose TRESRunMins is ridiculously big? That could be messing up the whole thing.
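In case it helps with diagnosis, this is how I've been trying to break the usage down per association (field and option names are as I read them in the sshare/sreport man pages, so apologies if I've misread something; the dates are just examples):

  # Per-TRES raw usage for each association under the research account
  sshare -l -A research -a -o Account,User,RawUsage,GrpTRESRaw%100

  # Memory and billing usage per user over a recent window
  sreport -T mem,billing -t Hours cluster AccountUtilizationByUser \
      Accounts=research Start=2024-01-01 End=2024-02-01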
Thanks in advance for any advice, whether it explains what I've misunderstood or suggests a better way to achieve what we want.
Tim
--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca