[slurm-users] Questions about default_queue_depth

Renfro, Michael Renfro at tntech.edu
Wed Jan 12 17:43:30 UTC 2022

Not answering every question below, but for (1) we're at 200 on a cluster with a few dozen nodes and around 1k cores, as per https://lists.schedmd.com/pipermail/slurm-users/2021-June/007463.html -- there may be other settings in that email that could be beneficial. We had a lot of idle resources that could have been backfilled with short, lower-priority jobs, and this basically resolved it.
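For anyone following along, `default_queue_depth` is one of the comma-separated options in `SchedulerParameters` in slurm.conf. A minimal sketch of the kind of change described above — the companion options shown are illustrative, not a recommendation, and values should be tuned per cluster:

```
# slurm.conf -- illustrative fragment only; tune for your own cluster.
# default_queue_depth sets how many queued jobs the main scheduling
# loop examines on each cycle (Slurm's built-in default is 100).
SchedulerType=sched/backfill
SchedulerParameters=default_queue_depth=200,bf_continue,bf_max_job_test=500
```

After changing SchedulerParameters, `scontrol reconfigure` (or a slurmctld restart, depending on version) is needed for the new values to take effect.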

For (3), I think https://slurm.schedmd.com/sprio.html would be my first stop.
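As a rough sketch of how to inspect queue order from the command line (these are standard Slurm tools, but flag behavior can vary slightly by release, so check the man pages on your system):

```
# Show the per-factor breakdown of each pending job's priority
sprio -l

# List pending jobs sorted by descending priority -- roughly the order
# the scheduler considers them in. %Q prints the job's priority value.
squeue --state=PENDING --sort=-p,i -o "%.18i %.9P %.8Q %.8u %.20j"
```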

For (4), as far as I know, that's a setting for all partitions.

From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of David Henkemeyer <david.henkemeyer at gmail.com>
Date: Wednesday, January 12, 2022 at 11:27 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: [slurm-users] Questions about default_queue_depth



A few weeks ago, we tested Slurm with about 50K jobs and observed at least one instance where a node sat idle while there were jobs in the queue that could have run on it.  Our best guess at this point is that default_queue_depth was set to its default value of 100, and the eligible jobs were likely not among the first 100 jobs in the queue.  Based on this, I have a few questions:
1) What is a reasonable value for default_queue_depth?  Would 1000 be ok, in terms of performance?
2) How can we better debug why queued jobs are not being selected?
3) Is there a way to see the order of the jobs in the queue?  Perhaps squeue lists the jobs in order?
4) If we had several partitions, would the default_queue_depth apply to all partitions?

Thank you
