[slurm-users] Accounting configuration

Skouson, Gary gbs35 at psu.edu
Wed Jan 16 16:13:36 UTC 2019


That’s kind of what I’m looking for, but I’d like to modify the partition limit for an account rather than for a user.  Something like:

sacctmgr modify account name=gbstest partition=batch  grpjobs=1

Using sacctmgr to add a partition for a user works fine; unfortunately, partition isn’t one of the options when modifying an account.

Any idea for limiting at the account+partition level rather than account+user+partition?

Setting things for users seems to work as expected, unless I submit a job with multiple partitions.

I have two partitions, batch and burst.  I set a limit of grpjobs=1 for batch.  I submit jobs to partition “batch,burst” and it starts more than one job in the batch partition.  I thought the others would need to go into the “burst” partition.
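
For reference, the jobs were submitted to both partitions with something along these lines (the job script name is just a placeholder):

sbatch --partition=batch,burst --time=30:00 --ntasks=1 testjob.sh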

Here’s an example of what I’m seeing.

[gbs35 at sltest ~]$ grep PartitionName=b /etc/slurm/slurm.conf
PartitionName=batch Nodes=ALL Default=no MaxTime=INFINITE State=UP CpuBind=core OverSubscribe=no DenyAccounts=open PriorityTier=100
PartitionName=burst Nodes=ALL Default=no MaxTime=INFINITE State=UP CpuBind=core OverSubscribe=no DenyAccounts=open PriorityTier=50
[gbs35 at sltest ~]$ sacctmgr show account name=gbstest withass format=account,cluster,partition,user,grpcpus,grpjobs
   Account    Cluster  Partition       User  GrpCPUs GrpJobs
---------- ---------- ---------- ---------- -------- -------
   gbstest     sltest
   gbstest     sltest      burst      gbs35
   gbstest     sltest      batch      gbs35                1
[gbs35 at sltest ~]$ squeue -a
       JOBID       USER       PARTITION NODES  CPUS ST  TIME_LEFT START_TIME          NODELIST(R
          91      gbs35     batch,burst     1     1 PD      30:00 2019-01-16T11:19:00 (Nodes req
          90      gbs35     batch,burst     1     1 PD      30:00 2019-01-16T10:49:00 (Nodes req
          89      gbs35     batch,burst     1     1 PD      30:00 2019-01-16T10:19:22 (Nodes req
          88      gbs35           batch     1     1  R      27:37 2019-01-16T09:56:36 sltest
          83      gbs35           batch     1     1  R      20:23 2019-01-16T09:49:22 sltest
          84      gbs35           batch     1     1  R      20:23 2019-01-16T09:49:22 sltest
          85      gbs35           batch     1     1  R      20:23 2019-01-16T09:49:22 sltest
          86      gbs35           batch     1     1  R      20:23 2019-01-16T09:49:22 sltest
          87      gbs35           batch     1     1  R      20:23 2019-01-16T09:49:22 sltest
          81      gbs35           batch     1     1  R      20:20 2019-01-16T09:49:19 sltest
          82      gbs35           batch     1     1  R      20:20 2019-01-16T09:49:19 sltest



Taking a look at another job, it appears that the “limit” info is getting charged against the wrong partition’s association for this job.  From slurmctld.log I see:

[2019-01-16T10:41:22.883] debug:  sched: Running job scheduler
[2019-01-16T10:41:22.883] debug2: found 1 usable nodes from config containing sltest
[2019-01-16T10:41:22.883] debug3: _pick_best_nodes: JobId=184 idle_nodes 1 share_nodes 1
[2019-01-16T10:41:22.883] debug2: select_p_job_test for JobId=184
[2019-01-16T10:41:22.883] debug5: powercapping: checking JobId=184 : skipped, capping disabled
[2019-01-16T10:41:22.883] debug3: select/cons_res: _add_job_to_res: JobId=184 act 0
[2019-01-16T10:41:22.883] debug3: select/cons_res: adding JobId=184 to part batch row 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug3: sched: JobId=184 initiated
[2019-01-16T10:41:22.884] sched: Allocate JobId=184 NodeList=sltest #CPUs=1 Partition=batch

You can see that the job is in Partition=batch, but the “acct_policy_job_begin” entries show the association (gbstest/gbs35/burst); I would have thought (gbstest/gbs35/batch) would make more sense.  Somewhere the pointer to the correct association isn’t making it through.
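
In case it helps with comparing, the association IDs in the log should be able to be matched against the database with something like this (untested, the format fields may need adjusting for your sacctmgr version):

sacctmgr show assoc where account=gbstest format=id,cluster,account,user,partition,grpjobs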

-----
Gary Skouson


From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Thomas M. Payerle
Sent: Tuesday, January 15, 2019 12:57 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Accounting configuration

Generally, the add, modify, etc. sacctmgr commands want a "user" or "account" entity, but can modify associations through this.

E.g., if user baduser should have GrpTRESMins of cpu=1000 set on partition special, use something like
sacctmgr add user name=baduser partition=special account=testacct grptresmin=cpu=1000
if there is no association for that user, account, and partition already, or
sacctmgr modify user where user=baduser partition=special set grptresmin=cpu=1000
if the association already exists.

To place the restriction on an account instead, add/modify the account with a partition field.
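
To confirm what ended up in the database afterwards, something like this should list the resulting associations (adjust the format fields to taste):

sacctmgr show assoc where user=baduser format=cluster,account,user,partition,grptresmins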



On Tue, Jan 15, 2019 at 11:33 AM Skouson, Gary <gbs35 at psu.edu> wrote:
Slurm accounting info is stored based on user, cluster, partition and account.  I'd like to be able to enforce limits for an account based on the partition it's running in.

Sadly, I'm not seeing how to use sacctmgr to change the partition as part of the association.  The add, modify and delete commands seem to only apply to user, account and cluster entities.  How do I add a partition to a particular account association, and set GrpTRES for an association that includes a partition?

I know I can change the partition configuration in slurm.conf and use AllowAccounts, but that doesn't change the usage limits on a partition for a particular account.
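
For example (partition and account names here are hypothetical), that approach is just something like

PartitionName=special Nodes=ALL AllowAccounts=acct1,acct2 MaxTime=INFINITE State=UP

in slurm.conf, which controls who can submit to the partition but not how much of it a given account can consume.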

Maybe there's another way to work around this that I'm missing.

I'd like to be able to use GrpTRESMins to limit overall cumulative account usage. I also want to limit accounts to differing resources (GrpTRES) on some partitions (for preemption/priority, etc.).
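
For the cumulative piece, I'm picturing something along the lines of (account name and number are placeholders)

sacctmgr modify account name=myacct set GrpTRESMins=cpu=1000000

and then, ideally, a per-partition GrpTRES limit on top of that for the same account.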

Thoughts?

-----
Gary Skouson





--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        payerle at umd.edu
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831