[slurm-users] Over-riding array limits

Bill Barth bbarth at tacc.utexas.edu
Sat Feb 24 06:55:00 MST 2018

We don’t allow array jobs (we have our own tools for packing small jobs into bigger ones), so I can’t look at this myself, but what does ‘scontrol show job <jobid>’ show for this job? If you can find the ‘4’ in this job as some named parameter, you ought to be able to do an ‘scontrol update job <jobid> ThatParameter=100’ or whatever you like to change it.
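As a concrete sketch of the above (hedged: the job ID 1234 is made up, and the parameter name is taken from the Slurm scontrol documentation, where the "%" throttle is exposed as ArrayTaskThrottle), the check-and-update sequence would look something like:

```shell
# Inspect the array job; a submission with --array=0-15%4 should show
# ArrayTaskThrottle=4 somewhere in the scontrol output.
scontrol show job 1234 | grep -o 'ArrayTaskThrottle=[0-9]*'

# As root (or the job owner), raise the throttle so more tasks
# from the array can run simultaneously.
scontrol update JobId=1234 ArrayTaskThrottle=100
```

Setting ArrayTaskThrottle=0 should remove the limit entirely, per the same documentation; this all assumes a Slurm version recent enough to expose the throttle as an updatable job parameter.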

Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435            |   Fax:   (512) 475-9445

On 2/23/18, 11:13 PM, "slurm-users on behalf of ~Stack~" <slurm-users-bounces at lists.schedmd.com on behalf of i.am.stack at gmail.com> wrote:

    I have a user who submits many, many jobs at once in an array.
    Happily, he's a very nice user and doesn't often cause trouble.

    The documentation for job arrays
    (https://slurm.schedmd.com/job_array.html) says:

    "A maximum number of simultaneously running tasks from the job array may
    be specified using a "%" separator. For example "--array=0-15%4" will
    limit the number of simultaneously running tasks from this job array to 4."

    Awesome. That's exactly what he is doing.

    A big job just finished, and looking at my queue I notice that no one
    bothered to load it to the brim this weekend, leaving me with several
    idle compute nodes. Meanwhile, this user still has quite a few jobs
    waiting to run with a "JobArrayTaskLimit" reason. :-/

    I've been poking at it for the last 20-30 minutes, but I'm not seeing
    how I, with the power of root, can raise his "self-imposed" array
    limit. It's late, my attempts have not worked, and my google-fu isn't
    returning any helpful results. I'm not really concerned about it, but I
    would like to know in case this happens again.

    How can I increase a JobArrayTaskLimit?

    Using the documentation example, how would I "scontrol update" the
    array to be "--array=0-15%6" when tasks 0-3 are already running?

    Or maybe just say "grab X number of tasks and run them anyway"?
    Again with the documentation example: say tasks 0-3 are done and 4-7 are
    running, and I just want to manually tell 8-10 to run anyway on the
    available resources, leaving 11-15 under the current constraints.

    Thank you!
