[slurm-users] Fairshare +FairTree Algorithm + TRESBillingWeights

Yap, Mike M.Yap at massey.ac.nz
Wed Apr 7 00:47:16 UTC 2021


Fixed the issue with TRESBillingWeights.
It seems it has to be set on a PartitionName line for it to work; see
https://bugs.schedmd.com/show_bug.cgi?id=3753

PartitionName=DEFAULT TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
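
To illustrate how these weights turn into a billing value, here is a minimal Python sketch. It assumes the default behaviour described in the Slurm documentation, where the billable TRES is the sum of each allocated TRES multiplied by its weight, with Mem=0.25G read as 0.25 per GB. The job sizes, the dictionary keys and the function name are hypothetical, and the max_tres branch is only a simplified stand-in for PriorityFlags=MAX_TRES, not the scheduler's actual code.

```python
# Weights copied from the TRESBillingWeights line above; memory is weighted per GB.
WEIGHTS = {"cpu": 1.0, "mem_gb": 0.25, "gres/gpu": 2.0}


def billing(alloc, weights=WEIGHTS, max_tres=False):
    """Return the billable TRES for one job allocation.

    alloc maps a TRES name to the allocated amount (hypothetical key names).
    With max_tres=True the largest weighted term is billed instead of the
    sum, a simplified approximation of PriorityFlags=MAX_TRES.
    """
    terms = [weights.get(name, 0.0) * amount for name, amount in alloc.items()]
    if not terms:
        return 0.0
    return max(terms) if max_tres else sum(terms)


# Hypothetical job: 8 CPUs, 32 GB of memory, 1 GPU.
job = {"cpu": 8, "mem_gb": 32, "gres/gpu": 1}
print(billing(job))                 # 8*1.0 + 32*0.25 + 1*2.0 = 18.0
print(billing(job, max_tres=True))  # max(8.0, 8.0, 2.0) = 8.0
```

With these weights, 4 GB of memory costs the same as one CPU, and one GPU costs the same as two CPUs.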

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Yap, Mike
Sent: Wednesday, 7 April 2021 9:57 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Fairshare +FairTree Algorithm + TRESBillingWeights

Thanks Luke. I will go through the 2 commands (and try to digest them).

Wondering if you're able to advise on TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0". I tried to include it in slurm.conf, but Slurm fails to start.

Also wondering if anyone can advise on the fairshare value. I recall reading a page explaining how the calculation works (which is quite complicated).
I am just confused about why the default weights for some parameters are set to different values.
In the following example, am I correct to believe that fairshare priority will play the main role compared to partition weight and age?
Does this mean a new job from a new user within the same group will run before a pending job from an existing user (queued for 30 days) that was submitted to a partition with a higher PriorityTier?

PriorityType=priority/multifactor
PriorityWeightFairshare=100000
PriorityWeightAge=1000
PriorityWeightPartition=10000
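
As a rough illustration of how these three weights combine, the sketch below follows the weighted-sum form described in the multifactor priority documentation, where each factor is a value between 0.0 and 1.0 multiplied by its weight. The factor values for the two jobs are invented for the example, the function name is hypothetical, and other factors (job size, QOS, TRES, nice) are left out. Note also that, as far as I understand it, PriorityTier is applied separately by the scheduler and is not part of this sum.

```python
# Weights taken from the slurm.conf excerpt above.
WEIGHTS = {"fairshare": 100000, "age": 1000, "partition": 10000}


def job_priority(factors, weights=WEIGHTS):
    """Weighted sum of priority factors, each expected in the range [0.0, 1.0]."""
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name!r} must be in [0, 1], got {value}")
    return sum(weight * factors.get(name, 0.0) for name, weight in weights.items())


# Hypothetical jobs: a long-queued job from a heavy user versus a fresh job
# from a lightly used association. The factor values are made up.
old_job = {"fairshare": 0.25, "age": 1.0, "partition": 1.0}
new_job = {"fairshare": 0.75, "age": 0.0, "partition": 0.5}

print(job_priority(old_job))  # 25000 + 1000 + 10000 = 36000.0
print(job_priority(new_job))  # 75000 + 0 + 5000 = 80000.0
```

With a fairshare weight 100 times the age weight, a saturated age factor contributes at most 1000 points, so under these assumed factor values the fairshare term decides the ordering.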
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Luke Yeager
Sent: Tuesday, 6 April 2021 3:03 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Fairshare +FairTree Algorithm + TRESBillingWeights


  *   Rawshare is only a representation of weight, where a higher value equals higher priority?
  *   The rawshare values need not total 100, since they are not percentages?

Look at the output of this command on your cluster and things will probably become more clear:
sshare -a --format=cluster,account,user,partition,rawshare,normshare,normusage,levelfs


  *   Jobs from User7 will always run before jobs from User2 (including those in the queue)

No, not "always." It depends on how much each user (actually, user-to-project association) has been utilizing the cluster recently. See "normusage" in the command above. If levelfs is >1, then the priority of the job will be boosted (because they have been "under-served" recently - that's how the manpage puts it).
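
The relationship between rawshare, normalized shares and levelfs can be sketched in a few lines of Python. This assumes the description in the sshare manpage, where shares are normalized among sibling associations and LevelFS is the normalized share divided by the usage among those siblings. The user names reuse the ones from the question, but the share and usage numbers are invented, and the function names are hypothetical.

```python
import math


def norm_shares(raw_shares):
    """Normalize raw shares among siblings so that they sum to 1.0."""
    total = sum(raw_shares.values())
    if total <= 0:
        raise ValueError("sibling raw shares must sum to a positive value")
    return {name: raw / total for name, raw in raw_shares.items()}


def level_fs(norm_share, usage):
    """Shares divided by usage: above 1 is under-served, below 1 is over-served."""
    if usage == 0:
        return math.inf  # no recorded usage at all
    return norm_share / usage


# Hypothetical siblings: raw shares do not need to add up to 100.
shares = norm_shares({"user2": 10, "user7": 30})  # same result as {1, 3}
usage = {"user2": 0.125, "user7": 0.875}          # made-up usage fractions

for name in shares:
    print(name, shares[name], level_fs(shares[name], usage[name]))
```

In this made-up case user7 holds three times the shares of user2 but has consumed seven times the usage, so user2 ends up with the higher levelfs. That is why a larger rawshare alone does not mean "always runs first".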


  *   Is there a command to confirm that the billing weights are applied as requested, instead of the default CPU-only billing?

Take a look at the output of this command. In particular, check out the value for GrpTRESMins for your "Association Records":
scontrol show assoc_mgr

Hope that helps!
Luke

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Yap, Mike
Sent: Wednesday, March 31, 2021 4:50 PM
To: slurm-users at schedmd.com
Subject: [slurm-users] Fairshare +FairTree Algorithm + TRESBillingWeights

Hi All

I need some clarification on Fairshare (multifactor priority plugin) and the FairTree algorithm.

If I read correctly, the current default for Slurm is the FairTree algorithm, in which

  1.  Priority can be set at various levels
  2.  No fairshare actual usage is considered
  3.  Submitted jobs will run according to priority instead of fairshare actual usage

Questions -

  1.  Rawshare is only a representation of weight, where a higher value equals higher priority?
  2.  The rawshare values need not total 100, since they are not percentages?
  3.  Referring to the image below, am I right to believe

     *   Jobs from User7 will always run before jobs from User2 (including those in the queue)
     *   Jobs from User3 will always run before jobs from User2 (including those in the queue)
     *   Jobs from User7 will always run before jobs from User6 (including those in the queue)
     *   Jobs from User3 will always run before jobs from User8 (including those in the queue)
     *   Jobs from User2 will always run before jobs from User1 (including those in the queue)
[cid:image001.png at 01D72BAC.21751710]


Only when the multifactor priority plugin is configured will fairshare usage be considered, in which case (referring to the above image again) it will be common for User1 (ad hoc user) to have jobs run before User4 (user with massive clock time).
Questions

  1.  If both User1 and User2 have the same fairshare value (sshare showing 0.001), will User4 still have priority over User1?

Additional questions

  1.  How do we enforce that only users defined in sacctmgr have the right to submit jobs? I restarted the services and the system, but a user not in sacctmgr is still able to submit jobs.
  2.  For a user with two accounts, how do they specify which account to use for submission?

TRESBillingWeights

If I wish to enable TRESBillingWeights (to use CPU:MEMORY:GPU) - referring to https://slurm.schedmd.com/tres.html


  1.  Do I just include the following lines to enable the option?

     *   AccountingStorageTRES=gres/gpu
     *   TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"

  2.  Is there a command to confirm that the billing weights are applied as requested, instead of the default CPU-only billing?

Many thanks
Mike
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 9105 bytes
Desc: image001.png
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20210407/8cf6bbcd/attachment-0001.png>