[slurm-users] scontrol show assoc_mgr showing more resources in use than squeue

Renfro, Michael Renfro at tntech.edu
Sat May 9 13:12:25 UTC 2020


Still observing, but it looks like clearing out the runaway jobs and then restarting slurmdbd got my user back up to 986 CPU-days remaining out of their allowed 1000. I'm not certain the runaways were related, but things definitely started behaving better after a late afternoon/early evening slurmdbd restart.
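
In case anyone hits the same thing later, the recovery steps were roughly the sketch below (not a polished script; the systemctl call assumes slurmdbd runs under systemd on the database host, and USER/ACCOUNT are placeholders as elsewhere in this thread):

# List runaway jobs; sacctmgr will offer to fix any it finds
sacctmgr show runawayjobs

# Restart the database daemon (this is what seemed to clear the stale
# usage here), then re-check the user's association usage
systemctl restart slurmdbd
scontrol -o show assoc_mgr users=USER account=ACCOUNT flags=assoc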

Thanks.

> On May 8, 2020, at 11:47 AM, Renfro, Michael <Renfro at tntech.edu> wrote:
> 
> Working on something like that now. From an SQL export, I see 16 jobs from my user that have a state of 7. Both states 3 and 7 show up as COMPLETED in sacct, and there may also be some duplicate job entries visible via sacct --duplicates.
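> 
> For reference, the checks were along these lines (a sketch only: it
> assumes the default slurm_acct_db schema, where each cluster has a
> <clustername>_job_table -- its_job_table here -- and id_user holds the
> numeric UID; adjust names for your site):
> 
> # count this user's job records by raw database state
> mysql -u slurm -p slurm_acct_db -e \
>   "SELECT state, COUNT(*) FROM its_job_table WHERE id_user=$(id -u USER) GROUP BY state;"
> 
> # and look for duplicate entries on the sacct side
> sacct -u USER -X --duplicates -S 2020-05-01 -o JobID,State,Start,End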
> 
>> On May 8, 2020, at 11:34 AM, Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> wrote:
>> 
>> Hi Michael,
>> 
>> You can query the database for a job summary of a particular user and
>> time period using the slurmacct command:
>> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/slurmacct
>> 
>> You can also call "sacct --user=USER" directly, as slurmacct does:
>> 
>> # Request job data
>> export FORMAT="JobID,User${ulen},Group${glen},Partition,AllocNodes,AllocCPUS,Submit,Eligible,Start,End,CPUTimeRAW,State"
>> # Request job states: Cancelled, Completed, Failed, Timeout, Preempted
>> export STATE="ca,cd,f,to,pr"
>> # Get Slurm individual job accounting records using the "sacct" command
>> sacct $partitionselect -n -X -a -S $start_time -E $end_time -o $FORMAT -s $STATE
>> 
>> There are numerous output fields which you can request; see "sacct -e".
>> 
>> /Ole
>> 
>> 
>>>> On 08-05-2020 16:54, Renfro, Michael wrote:
>>> Slurm 19.05.3 (packaged by Bright). For the three running jobs, the
>>> total GrpTRESRunMins requested is 564480 CPU-minutes as shown by
>>> 'showjob', and the remaining usage that the limit would actually check
>>> against is less than that.
>>> 
>>> My download of your scripts was dated August 21, 2019, and I've just
>>> now cloned your repository to see if there were any differences. None
>>> that I can see -- 'showuserlimits -u USER -A ACCOUNT -s cpu' returns
>>> "Limit = 1440000, current value = 1399895".
>>> 
>>> So I assume something is lingering in the database from jobs that have
>>> already completed but are still being counted against the user's
>>> current usage.
>>> 
>>> ------------------------------------------------------------------------
>>> *From:* Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
>>> *Sent:* Friday, May 8, 2020 9:27 AM
>>> *To:* slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
>>> *Cc:* Renfro, Michael <Renfro at tntech.edu>
>>> *Subject:* Re: [slurm-users] scontrol show assoc_mgr showing more
>>> resources in use than squeue
>>> Hi Michael,
>>> 
>>> Yes, my Slurm tools use and trust the output of Slurm commands such as
>>> sacct, and any discrepancy would have to come from the Slurm database.
>>> Which version of Slurm are you running on the database server and the
>>> node where you run sacct?
>>> 
>>> Did you add up the GrpTRESRunMins values of all the user's running jobs?
>>>  They had better add up to current value = 1402415.  The "showjob"
>>> command prints #CPUs and time limit in minutes, so you need to multiply
>>> these numbers together.  Example:
>>> 
>>> This job requests 160 CPUs and has a time limit of 2-00:00:00
>>> (days-hh:mm:ss) = 2880 min, so it counts 160 x 2880 = 460800
>>> CPU-minutes against GrpTRESRunMins.
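>>> 
>>> A quick shell sketch of that sum over all of a user's running jobs
>>> (untested; it relies on squeue's %C and %l output fields and assumes
>>> time limits in [days-]hh:mm:ss form):
>>> 
>>> squeue -u USER -t R -h -o "%C %l" | awk '
>>>   {
>>>     # split an optional "days-" prefix off the time limit
>>>     days = 0; t = $2
>>>     if (split($2, a, "-") == 2) { days = a[1]; t = a[2] }
>>>     n = split(t, h, ":")
>>>     if (n == 3)      mins = h[1]*60 + h[2] + h[3]/60   # hh:mm:ss
>>>     else if (n == 2) mins = h[1] + h[2]/60             # mm:ss
>>>     else             mins = h[1]                       # minutes only
>>>     total += $1 * (days*1440 + mins)                   # CPUs x minutes
>>>   }
>>>   END { printf "%.0f CPU-minutes in running jobs\n", total }'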
>>> 
>>> Did you download the latest versions of my Slurm tools from Github?  I
>>> make improvements of them from time to time.
>>> 
>>> /Ole
>>> 
>>> 
>>>> On 08-05-2020 16:12, Renfro, Michael wrote:
>>>> Thanks, Ole. Your showuserlimits script is actually where I got started
>>>> today, and where I found the sacct command I sent earlier.
>>>> 
>>>> Your script gives the same output for that user: the only line that's
>>>> not a "Limit = None" is for the user's GrpTRESRunMins value, which is
>>>> at "Limit = 1440000, current value = 1402415".
>>>> 
>>>> The limit value is correct, but the current value is not (due to the
>>>> incorrect sacct output).
>>>> 
>>>> I've also gone through sacctmgr show runaway to clean up any runaway
>>>> jobs. I had lots, but they were all from a different user, and had no
>>>> effect on this particular user's values.
>>>> 
>>>> ------------------------------------------------------------------------
>>>> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
>>>> Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
>>>> *Sent:* Friday, May 8, 2020 8:54 AM
>>>> *To:* slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
>>>> *Subject:* Re: [slurm-users] scontrol show assoc_mgr showing more
>>>> resources in use than squeue
>>>> 
>>>> Hi Michael,
>>>> 
>>>> Maybe you will find a couple of my Slurm tools useful for displaying
>>>> data from the Slurm database in a more user-friendly format:
>>>> 
>>>> showjob: Show status of Slurm job(s). Both queue information and
>>>> accounting information are printed.
>>>> 
>>>> showuserlimits: Print Slurm resource user limits and usage
>>>> 
>>>> The user's limits are printed in detail by showuserlimits.
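>>>> 
>>>> Typical usage is something like the following (USER and ACCOUNT are
>>>> placeholders, and the job ID is just an example from your queue):
>>>> 
>>>> showjob 747435
>>>> showuserlimits -u USER -A ACCOUNT -s cpu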
>>>> 
>>>> These tools are available from https://github.com/OleHolmNielsen/Slurm_tools
>>>> 
>>>> /Ole
>>>> 
>>>> On 08-05-2020 15:34, Renfro, Michael wrote:
>>>>> Hey, folks. I've had a 1000 CPU-day (1440000 CPU-minute) GrpTRESRunMins
>>>>> limit applied to each user for years. It generally works as intended,
>>>>> but I've noticed one user whose usage is highly inflated relative to
>>>>> reality, causing the GrpTRESRunMins limit to be enforced much earlier
>>>>> than necessary:
>>>>> 
>>>>> squeue output, showing roughly 340 CPU-days in running jobs, and all
>>>>> other jobs blocked:
>>>>> 
>>>>> # squeue -u USER
>>>>> JOBID  PARTI NAME USER ST       TIME CPUS NODES NODELIST(REASON) PRIORITY TRES_P START_TIME          TIME_LEFT
>>>>> 747436 batch job  USER PD       0:00 28   1     (AssocGrpCPURunM 4784     N/A    N/A                 10-00:00:00
>>>>> 747437 batch job  USER PD       0:00 28   1     (AssocGrpCPURunM 4784     N/A    N/A                 4-04:00:00
>>>>> 747438 batch job  USER PD       0:00 28   1     (AssocGrpCPURunM 4784     N/A    N/A                 10-00:00:00
>>>>> 747439 batch job  USER PD       0:00 28   1     (AssocGrpCPURunM 4784     N/A    N/A                 4-04:00:00
>>>>> 747440 batch job  USER PD       0:00 28   1     (AssocGrpCPURunM 4784     N/A    N/A                 10-00:00:00
>>>>> 747441 batch job  USER PD       0:00 28   1     (AssocGrpCPURunM 4784     N/A    N/A                 4-14:00:00
>>>>> 747442 batch job  USER PD       0:00 28   1     (AssocGrpCPURunM 4784     N/A    N/A                 10-00:00:00
>>>>> 747446 batch job  USER PD       0:00 14   1     (AssocGrpCPURunM 4778     N/A    N/A                 4-00:00:00
>>>>> 747447 batch job  USER PD       0:00 14   1     (AssocGrpCPURunM 4778     N/A    N/A                 4-00:00:00
>>>>> 747448 batch job  USER PD       0:00 14   1     (AssocGrpCPURunM 4778     N/A    N/A                 4-00:00:00
>>>>> 747445 batch job  USER  R    8:39:17 14   1     node002          4778     N/A    2020-05-07T23:02:19 3-15:20:43
>>>>> 747444 batch job  USER  R   16:03:13 14   1     node003          4515     N/A    2020-05-07T15:38:23 3-07:56:47
>>>>> 747435 batch job  USER  R 1-10:07:42 28   1     node005          3784     N/A    2020-05-06T21:33:54 8-13:52:18
>>>>> 
>>>>> scontrol output, showing roughly 980 CPU-days in use on the second line,
>>>>> and thus blocking additional jobs:
>>>>> 
>>>>> # scontrol -o show assoc_mgr users=USER account=ACCOUNT flags=assoc
>>>>> ClusterName=its Account=ACCOUNT UserName= Partition= Priority=0 ID=21
>>>>> SharesRaw/Norm/Level/Factor=1/0.03/35/0.00
>>>>> UsageRaw/Norm/Efctv=2733615872.34/0.39/0.71 ParentAccount=PARENT(9)
>>>>> Lft=1197 DefAssoc=No GrpJobs=N(4) GrpJobsAccrue=N(10)
>>>>> GrpSubmitJobs=N(14) GrpWall=N(616142.94)
>>>>> GrpTRES=cpu=N(84),mem=N(168000),energy=N(0),node=N(40),billing=N(420),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(0)
>>>>> GrpTRESMins=cpu=N(9239391),mem=N(18478778157),energy=N(0),node=N(616142),billing=N(45546470),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(0)
>>>>> GrpTRESRunMins=cpu=N(1890060),mem=N(3780121866),energy=N(0),node=N(113778),billing=N(9450304),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(0)
>>>>> MaxJobs= MaxJobsAccrue= MaxSubmitJobs= MaxWallPJ= MaxTRESPJ= MaxTRESPN=
>>>>> MaxTRESMinsPJ= MinPrioThresh=
>>>>> ClusterName=its Account=ACCOUNT UserName=USER(UID) Partition= Priority=0
>>>>> ID=56 SharesRaw/Norm/Level/Factor=1/0.08/13/0.00
>>>>> UsageRaw/Norm/Efctv=994969457.37/0.14/0.36 ParentAccount= Lft=1218
>>>>> DefAssoc=Yes GrpJobs=N(3) GrpJobsAccrue=N(10) GrpSubmitJobs=N(13)
>>>>> GrpWall=N(227625.69)
>>>>> GrpTRES=cpu=N(56),mem=N(112000),energy=N(0),node=N(35),billing=N(280),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(0)
>>>>> GrpTRESMins=cpu=N(3346095),mem=N(6692190572),energy=N(0),node=N(227625),billing=N(16580497),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(0)
>>>>> GrpTRESRunMins=cpu=1440000(1407455),mem=N(2814910466),energy=N(0),node=N(88171),billing=N(7037276),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(0)
>>>>> MaxJobs= MaxJobsAccrue= MaxSubmitJobs= MaxWallPJ= MaxTRESPJ= MaxTRESPN=
>>>>> MaxTRESMinsPJ= MinPrioThresh=
>>>>> 
>>>>> Where can I investigate to find the cause of this difference? Thanks.
>> 

