[slurm-users] Noob slurm question

Merritt, Todd R - (tmerritt) tmerritt at email.arizona.edu
Wed Dec 12 14:38:36 MST 2018


Thanks Thomas,
That's helpful, and a bit more tenable than what I thought was going to be required. I have a few additional questions. Based on my reading of the docs, it seems that GrpTRESmin is set on the account, and then each user needs to have the partition set there. This brings up a couple of questions for me:


  *   How can an account have multiple GrpTRESmin values for separate partitions? I'm guessing those have to be separate accounts then?
  *   All of the limits that I applied per queue in PBS are covered by QOS settings in Slurm, so I could dispense with the additional partitions. However, I also need to limit some classes of jobs to particular sets of nodes, and I didn't see any way to accomplish that besides partitions.


Thanks again!
Todd

From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of "Thomas M. Payerle" <payerle at umd.edu>
Reply-To: Slurm User Community List <slurm-users at lists.schedmd.com>
Date: Wednesday, December 12, 2018 at 1:45 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Noob slurm question

Slurm accounting is based on the notion of "associations".  An association is a set of cluster, partition, allocation account, and user.  I think most sites do the accounting so that a single limit applies to all partitions, but you can use sacctmgr to apply limits at any association level.  Normally you would set GrpTRESmin at the required level.  The GrpTRESmin values apply to the association where you set them and to all of its child associations.

So while most sites would do something like e.g.
    set GrpTRESmin=cpu=N for allocation acct Acct1
thereby allowing members of Acct1 to use (as a group) N cpu-minutes combined across all partitions, you could also do something like
    set GrpTRESmin=cpu=N for allocation Acct1 and
    set GrpTRESmin=cpu=A for allocation Acct1 and partitionA
    set GrpTRESmin=cpu=B for allocation Acct1 and partitionB
In this scenario, users of Acct1 can use at most A cpu-min on partitionA and B cpu-min on partitionB, provided that combined usage on all partitions (A, B, and anything else) does not exceed N.
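
To make the second scenario concrete, here is a minimal Python sketch of how nested limits combine. It is a toy model only: the Association class, its field names, and the numbers N=1000, A=600, B=600 are made up for illustration, and it does not reproduce how slurmctld actually stores or enforces limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Association:
    """Toy stand-in for an association (hypothetical, not Slurm's data model)."""
    name: str
    limit_cpu_min: Optional[float] = None  # None means no limit at this level
    parent: Optional["Association"] = None
    used_cpu_min: float = 0.0

    def can_run(self, cpu_min: float) -> bool:
        """True if the job fits under the limit here and at every ancestor."""
        node = self
        while node is not None:
            over = (node.limit_cpu_min is not None
                    and node.used_cpu_min + cpu_min > node.limit_cpu_min)
            if over:
                return False
            node = node.parent
        return True

    def charge(self, cpu_min: float) -> None:
        """Record usage here and on every ancestor."""
        node = self
        while node is not None:
            node.used_cpu_min += cpu_min
            node = node.parent


# Illustrative values: N=1000 for Acct1, A=600 and B=600 per partition.
acct1 = Association("Acct1", limit_cpu_min=1000)
part_a = Association("Acct1/partitionA", limit_cpu_min=600, parent=acct1)
part_b = Association("Acct1/partitionB", limit_cpu_min=600, parent=acct1)

part_a.charge(500)
part_b.charge(450)
# partitionB is still under its own limit of 600, but a further 100 cpu-min
# would push the combined Acct1 usage past 1000.
print(part_b.can_run(100))
print(part_b.can_run(50))
```

Note that A + B can be larger than N, as in this example; the per-partition values then cap each partition individually while N caps the total.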

Underneath the covers, Slurm and PBS accounting behave a bit differently --- IIRC in PBS you assign "credits" to accounts which then get debited as jobs run.  In Slurm, each association tracks usage as jobs run, and you can configure limits on the usage at various levels.
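
The difference between the two bookkeeping styles can be sketched as follows. Both classes are hypothetical illustrations of the description above, not code from PBS or Slurm, and the monthly_reset methods only model the monthly reset mentioned in the original question; how and when Slurm clears or decays recorded usage depends on site configuration and is not modeled here.

```python
class CreditAccount:
    """Debit style: start with credits and subtract as jobs run."""

    def __init__(self, credits: float) -> None:
        self.allocation = credits
        self.credits = credits

    def run_job(self, cpu_min: float) -> None:
        self.credits -= cpu_min

    def remaining(self) -> float:
        return self.credits

    def monthly_reset(self) -> None:
        # Refill the balance to the original allocation.
        self.credits = self.allocation


class UsageTracker:
    """Usage style: accumulate usage and compare it against a limit."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.usage = 0.0

    def run_job(self, cpu_min: float) -> None:
        self.usage += cpu_min

    def remaining(self) -> float:
        return self.limit - self.usage

    def monthly_reset(self) -> None:
        # Clear the recorded usage; the limit itself never changes.
        self.usage = 0.0


debit = CreditAccount(1000)
tracked = UsageTracker(1000)
for model in (debit, tracked):
    model.run_job(300)
print(debit.remaining(), tracked.remaining())
```

Seen from the user's side the remaining balance is the same in both styles; what differs is which quantity gets modified as jobs run and which one is reset.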

The tools for reporting usage of allocation accounts in Slurm leave something to be desired; sshare is the underlying tool but not very user friendly, and I find sbank leaves a lot to be desired.
I have some Perl libraries interfacing with sshare, etc. on CPAN (https://metacpan.org/pod/Slurm::Sshare) which include a basic sbalance command script.  You would likely need to modify the script for your situation (it assumes a situation more like the first example above), but that should not be too bad.
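
For anyone who would rather roll their own report, the general idea of a balance script can be sketched in Python. This is not the sbalance script and does not use the Slurm::Sshare API. It assumes pipe-delimited text with a header row and columns named Account, User, GrpTRESMins, and GrpTRESRaw (GrpTRESMins being the spelling sacctmgr documents for the limit), roughly what sshare can emit in its parsable long format; check the column names and units against the sshare man page for your version. The SAMPLE text is invented, and the helper names parse_tres and cpu_balances are hypothetical.

```python
from typing import Dict, Tuple

# Invented sample in the assumed pipe-delimited layout (not real output).
SAMPLE = """\
Account|User|GrpTRESMins|GrpTRESRaw
acct1||cpu=1000|cpu=250,mem=1024
 acct1|alice||cpu=250,mem=1024
acct2||cpu=500|
"""


def parse_tres(spec: str) -> Dict[str, float]:
    """Turn a string like 'cpu=250,mem=1024' into {'cpu': 250.0, 'mem': 1024.0}."""
    values: Dict[str, float] = {}
    for item in spec.split(","):
        if "=" in item:
            name, value = item.split("=", 1)
            values[name.strip()] = float(value)
    return values


def cpu_balances(text: str) -> Dict[Tuple[str, str], float]:
    """Return remaining cpu-minutes for every row that carries a cpu limit."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split("|")
    balances: Dict[Tuple[str, str], float] = {}
    for ln in lines[1:]:
        row = dict(zip(header, ln.split("|")))
        limit = parse_tres(row.get("GrpTRESMins", "")).get("cpu")
        if limit is None:
            continue  # no cpu limit set at this level
        used = parse_tres(row.get("GrpTRESRaw", "")).get("cpu", 0.0)
        key = (row["Account"].strip(), row["User"].strip())
        balances[key] = limit - used
    return balances


for (account, user), left in cpu_balances(SAMPLE).items():
    print(f"{account} {user or '(all users)'}: {left:.0f} cpu-min remaining")
```

Rows without a cpu limit are skipped, so in a setup like the first example above only the account-level line is reported.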



On Wed, Dec 12, 2018 at 1:58 PM Merritt, Todd R - (tmerritt) <tmerritt at email.arizona.edu> wrote:
Hi all,
I'm new to slurm. I've used PBS extensively and have set up an accounting system that gives each group/account a fixed number of hours per month on a per queue/partition basis. It decrements that time allocation with every job run and then resets it to the original value at the start of the next month. I had hoped that slurm would do this natively, but it doesn't seem like it does. I came across sbank, which sounds like it would implement this, but it also seems like it would span partitions and not allow separate limits per partition. Is this something that has already been implemented, or could it be done in an easier way than what I'm trying?

Thanks,
Todd


--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        payerle at umd.edu
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831