[slurm-users] Useful script: estimating how long until the next blocked job starts

Renfro, Michael Renfro at tntech.edu
Thu Jan 23 22:29:28 UTC 2020


Hey, folks.

Some of my users submit job after job without regard to our 1000 CPU-day TRES limit (1,440,000 CPU-minutes), so their later jobs get blocked with the reason AssocGrpCPURunMinutesLimit.

I’ve written up a script [1], using Ole Holm Nielsen’s showuserlimits script [2], that identifies a user’s smallest blocked job (by requested CPU-minutes) and predicts when that job might run at current resource consumption rates. Non-root users can query only their own blocked jobs; root can query anyone’s.
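The "smallest blocked job" selection can be sketched as follows. This is an illustration, not the code in the gist: it assumes pending jobs have already been collected (for example from squeue) into records with a job ID, CPU count, time limit in minutes, and pending reason. The PendingJob class, its field names, and the tie-break on job ID are all hypothetical choices for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Pending reason Slurm reports when the association's CPU-run-minutes cap is hit
BLOCK_REASON = "AssocGrpCPURunMinutesLimit"


@dataclass
class PendingJob:
    # Hypothetical record; how it gets populated is out of scope here
    job_id: int
    cpus: int
    time_limit_min: int  # requested wall time, in minutes
    reason: str

    @property
    def cpu_minutes(self) -> int:
        # CPU-minutes the job would charge against the limit: CPUs x time limit
        return self.cpus * self.time_limit_min


def smallest_blocked_job(jobs: Iterable[PendingJob]) -> Optional[PendingJob]:
    """Return the blocked job requesting the fewest CPU-minutes, or None."""
    blocked = [j for j in jobs if j.reason == BLOCK_REASON]
    if not blocked:
        return None
    # Ties broken by the lower job ID (an assumption: usually the older job)
    return min(blocked, key=lambda j: (j.cpu_minutes, j.job_id))
```

With the job from the example run below, PendingJob(551294, 14, 13440, BLOCK_REASON) has cpu_minutes == 188160, and it would be chosen over any blocked job requesting more.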

Example runs:

=====

# guessblockedjobstart someusername
Next blocked job to run should be 551294, with 188160 CPU-minute(s) requested
- Limit for running and queued jobs is 1440000 CPU-minutes
- Running and pending jobs have 1364937 CPU-minutes remaining
- Leaving 75063 CPU-minutes available currently
- Smallest blocked job, 551294, requested 188160 CPU-minutes
  (14 CPU(s) on 1 node(s) for 13440 minute(s))
- Currently-running jobs release 7560 CPU-minutes per hour of elapsed time
Estimated time for job 551294 to enter queue is Fri Jan 24 07:14 CST 2020,
if resources are available

# guessblockedjobstart anotherusername
User anotherusername has no blocked jobs

=====
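The estimate in the first example follows from simple arithmetic: 1440000 - 1364937 = 75063 CPU-minutes are free, the job needs 188160, so 113097 more must be released; at 7560 CPU-minutes per hour (126 running CPUs x 60 minutes) that takes about 14.96 hours. A minimal sketch of that calculation, assuming the release rate stays constant and no other jobs start in the meantime; the function name and signature are hypothetical, not taken from the script:

```python
from datetime import datetime, timedelta
from typing import Optional


def estimate_start(limit: int, committed: int, requested: int,
                   release_per_hour: float, now: datetime) -> Optional[datetime]:
    """Estimate when `requested` CPU-minutes will fit under `limit`.

    Assumes running jobs keep releasing CPU-minutes at a constant rate
    and that no other jobs are started before the blocked one.
    """
    available = limit - committed       # e.g. 1440000 - 1364937 = 75063
    shortfall = requested - available   # e.g. 188160 - 75063 = 113097
    if shortfall <= 0:
        return now                      # already fits under the limit
    if release_per_hour <= 0:
        return None                     # nothing running, so no estimate
    hours_needed = shortfall / release_per_hour  # e.g. 113097 / 7560, about 14.96
    return now + timedelta(hours=hours_needed)


if __name__ == "__main__":
    # Hypothetical "now"; 126 running CPUs release 126 * 60 = 7560 CPU-min/hour
    now = datetime(2020, 1, 23, 16, 17)
    eta = estimate_start(1440000, 1364937, 14 * 13440, 126 * 60, now)
    print(eta.strftime("%a %b %d %H:%M"))  # prints "Fri Jan 24 07:14"
```

Real running jobs end at different times, so the constant-rate assumption makes this a rough guide rather than a guarantee, which matches the "if resources are available" caveat in the output above.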

Let me know if there are any questions or problems. Thanks.

[1] https://gist.github.com/mikerenfro/4d21fee5cd6c82b16e30c46fb2bf3226
[2] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits

-- 
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601     / Tennessee Tech University
