[slurm-users] Fwd: Getting information about AssocGrpCPUMinutesLimit for a job
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Sun Aug 11 14:17:31 UTC 2019
Andreas made a good suggestion of looking at the user's TRESRunMin in
sshare in order to answer Jeff's question about AssocGrpCPUMinutesLimit
for a job. However, in practice getting at this information is quite
complicated, and I don't think any ordinary user will bother to look it up.
Due to this complexity, I have added some new functionality to my
"showjob" script available from
The "showjob" tool now tries to extract the information by combining the
sshare, squeue, and sacctmgr commands. Both the AssocGrpCPUMinutesLimit
and AssocGrpCpuLimit job reasons are handled.
An example output for a job is:
$ showjob 1347368
Job 1347368 of user xxx in account yyy has a jobstate=PENDING with
Information about GrpCpuLimit:
User GrpTRES limit is: cpu=1600
Current user TRES is: cpu=1360
This job requires TRES: cpu=960
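The pending state in this output follows from simple arithmetic. A minimal sketch in Python of the comparison behind the GrpCpuLimit reason (the function name is mine; the numbers are the ones from the output above):

```python
# Sketch of the check behind the AssocGrpCpuLimit pending reason:
# a job stays pending while the CPUs it requests, added to the CPUs
# the user's running jobs already hold, exceed the association's
# GrpTRES cpu limit.

def job_fits_cpu_limit(grp_cpu_limit, cpus_in_use, cpus_requested):
    """Return True if the job can start under the GrpTRES cpu limit."""
    return cpus_in_use + cpus_requested <= grp_cpu_limit

# Numbers from the example output above:
# limit cpu=1600, current cpu=1360, job requests cpu=960.
print(job_fits_cpu_limit(1600, 1360, 960))  # 1360 + 960 = 2320 > 1600 -> False
```

So the job pends until enough of the user's 1360 running CPUs free up that 960 more fit under the 1600 limit.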
I think some end users might find this information useful.
Could I ask any interested sites to test the "showjob" tool and see
whether the logic also works in their environment? Please send me
feedback so that I can improve the tool.
On 09-08-2019 08:00, Henkel, Andreas wrote:
>> Users may call sshare -l and have a look at the TRESRunMin. There the
>> number of TRES-minutes allocated by jobs currently running against
>> the account is listed. With a little math (cpu*timelimit) about the job
>> in question the users should be able to figure this out. At least they
>> wouldn't need the debug level increased or a log file.
>> On 8/7/19 8:47 PM, Sarlo, Jeffrey S wrote:
>>> We had a job queued waiting for resources and when we changed the
>>> debug level, we were able to get the following in the slurmctld.log file.
>>> [2019-08-02T10:03:47.347] debug2: JobId=804633 being held, the job is
>>> at or exceeds assoc 50(jeff/(null)/(null)) group max tres(cpu)
>>> minutes of 30000000 of which 1436396 are still available but request
>>> is for 1440000 (plus 0 already in use) tres minutes (request tres
>>> count 80)
>>> We were then able to see that we just needed to lower the timelimit
>>> for the job a little.
>>> Is there a way a user can get this same type of information for a
>>> job, without having to change the slurm debug level and then looking
>>> in a log file?
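The "little math" in this thread can be made concrete with the numbers from the quoted log line. A hedged sketch (the function name is mine, not Slurm's): the requested TRES minutes are cpus times timelimit in minutes, and the job pends while that exceeds what is still available under the group limit.

```python
# Sketch of the arithmetic behind AssocGrpCPUMinutesLimit:
# requested cpu-minutes = cpus * timelimit (in minutes); the job is
# held while that exceeds the cpu-minutes still available to the
# association.

def max_timelimit_minutes(available_tres_minutes, cpus):
    """Largest timelimit (whole minutes) that still fits under the limit."""
    return available_tres_minutes // cpus

# From the log: 1436396 cpu-minutes still available, request tres count 80,
# and a request for 1440000 cpu-minutes, i.e. a timelimit of 18000 minutes.
assert 80 * 18000 == 1440000  # the held request
print(max_timelimit_minutes(1436396, 80))  # 17954
```

Lowering the job's timelimit from 18000 to 17954 minutes (80 * 17954 = 1436320 <= 1436396) is exactly the "lower the timelimit a little" fix Jeff describes.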