[slurm-users] Accounting configuration
Skouson, Gary
gbs35 at psu.edu
Wed Jan 16 16:13:36 UTC 2019
That’s kind of what I’m looking for, but I’d like to modify the partition limit for an account rather than for a user. Something like:
sacctmgr modify account name=gbstest partition=batch grpjobs=1
Using sacctmgr to add a partition for a user works fine; unfortunately, partition isn’t one of the options for modifying an account.
Any ideas for limiting at the account+partition level rather than at the account+user+partition level?
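For comparison, the user-level form that does accept a partition (and which, I assume, is roughly what produced the gbstest/gbs35/batch association shown below) looks like:
sacctmgr add user name=gbs35 account=gbstest partition=batch grpjobs=1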
Setting things for users seems to work as expected, unless I submit a job to multiple partitions.
I have two partitions, batch and burst. I set a limit of grpjobs=1 for batch. When I submit jobs to partition “batch,burst”, more than one job starts in the batch partition. I thought the others would need to go into the “burst” partition.
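The submissions were essentially of this form (job.sh is just a placeholder; the exact script doesn’t matter here):
sbatch --partition=batch,burst --ntasks=1 --time=30:00 job.sh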
Here’s an example of what I’m seeing.
[gbs35@sltest ~]$ grep PartitionName=b /etc/slurm/slurm.conf
PartitionName=batch Nodes=ALL Default=no MaxTime=INFINITE State=UP CpuBind=core OverSubscribe=no DenyAccounts=open PriorityTier=100
PartitionName=burst Nodes=ALL Default=no MaxTime=INFINITE State=UP CpuBind=core OverSubscribe=no DenyAccounts=open PriorityTier=50
[gbs35@sltest ~]$ sacctmgr show account name=gbstest withass format=account,cluster,partition,user,grpcpus,grpjobs
   Account    Cluster  Partition       User  GrpCPUs GrpJobs
---------- ---------- ---------- ---------- -------- -------
   gbstest     sltest
   gbstest     sltest      burst      gbs35
   gbstest     sltest      batch      gbs35                1
[gbs35@sltest ~]$ squeue -a
JOBID USER PARTITION NODES CPUS ST TIME_LEFT START_TIME NODELIST(R
91 gbs35 batch,burst 1 1 PD 30:00 2019-01-16T11:19:00 (Nodes req
90 gbs35 batch,burst 1 1 PD 30:00 2019-01-16T10:49:00 (Nodes req
89 gbs35 batch,burst 1 1 PD 30:00 2019-01-16T10:19:22 (Nodes req
88 gbs35 batch 1 1 R 27:37 2019-01-16T09:56:36 sltest
83 gbs35 batch 1 1 R 20:23 2019-01-16T09:49:22 sltest
84 gbs35 batch 1 1 R 20:23 2019-01-16T09:49:22 sltest
85 gbs35 batch 1 1 R 20:23 2019-01-16T09:49:22 sltest
86 gbs35 batch 1 1 R 20:23 2019-01-16T09:49:22 sltest
87 gbs35 batch 1 1 R 20:23 2019-01-16T09:49:22 sltest
81 gbs35 batch 1 1 R 20:20 2019-01-16T09:49:19 sltest
82 gbs35 batch 1 1 R 20:20 2019-01-16T09:49:19 sltest
Taking a look at another job, it appears that the “limit” info is getting charged to the association for the wrong partition for this job. In slurmctld.log I see:
[2019-01-16T10:41:22.883] debug: sched: Running job scheduler
[2019-01-16T10:41:22.883] debug2: found 1 usable nodes from config containing sltest
[2019-01-16T10:41:22.883] debug3: _pick_best_nodes: JobId=184 idle_nodes 1 share_nodes 1
[2019-01-16T10:41:22.883] debug2: select_p_job_test for JobId=184
[2019-01-16T10:41:22.883] debug5: powercapping: checking JobId=184 : skipped, capping disabled
[2019-01-16T10:41:22.883] debug3: select/cons_res: _add_job_to_res: JobId=184 act 0
[2019-01-16T10:41:22.883] debug3: select/cons_res: adding JobId=184 to part batch row 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug3: sched: JobId=184 initiated
[2019-01-16T10:41:22.884] sched: Allocate JobId=184 NodeList=sltest #CPUs=1 Partition=batch
You can see that the job is in Partition=batch, but the “acct_policy_job_begin” lines show the association as (gbstest/gbs35/burst); I would have thought (gbstest/gbs35/batch) would make more sense. Somewhere the pointer to the correct association isn’t making it through.
-----
Gary Skouson
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Thomas M. Payerle
Sent: Tuesday, January 15, 2019 12:57 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Accounting configuration
Generally, the add, modify, etc. sacctmgr commands want a "user" or "account" entity, but can modify associations through this.
E.g., if user baduser should have a GrpTRESMins limit of cpu=1000 set on partition special, use something like
sacctmgr add user name=baduser partition=special account=testacct grptresmin=cpu=1000
if there is no association for that user, account and partition already, or
sacctmgr modify user where user=baduser partition=special set grptresmin=cpu=1000
To place the restriction on an account instead, add/modify the account with a partition field.
On Tue, Jan 15, 2019 at 11:33 AM Skouson, Gary <gbs35 at psu.edu> wrote:
Slurm accounting info is stored based on user, cluster, partition and account. I'd like to be able to enforce limits for an account based on the partition it's running in.
Sadly, I'm not seeing how to use sacctmgr to change the partition as part of the association. The add, modify, and delete commands seem to apply only to user, account, and cluster entities. How do I add a partition to a particular account association, and set GrpTRES for an association that includes a partition?
I know I can change the partition configuration in slurm.conf and use AllowAccounts, but that doesn't change the usage limits on a partition for a particular account.
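For reference, the workaround I mean is just something along these lines in slurm.conf (the second account name is a placeholder), which gates access to the partition but doesn't impose any usage limits:
PartitionName=burst Nodes=ALL Default=no MaxTime=INFINITE State=UP AllowAccounts=gbstest,otheracct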
Maybe there's another way to work around this that I'm missing.
I'd like to be able to use GrpTRESMins to limit overall cumulative account usage. I also want to limit accounts to differing resources (GrpTRES) on some partitions (for preemption/priority, etc.).
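The account-wide cumulative piece would presumably be something like the following (the cpu-minutes value is only a placeholder); it's the per-partition GrpTRES part of the same association that I can't find a way to set:
sacctmgr modify account where name=gbstest cluster=sltest set grptresmins=cpu=1000000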
Thoughts?
-----
Gary Skouson
--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads payerle at umd.edu
5825 University Research Park (301) 405-6135
University of Maryland
College Park, MD 20740-3831