Thank you so much for the prompt response! This makes a lot of sense. We hadn't seen this explicitly stated in the docs, but it's what we gleaned from them.

On Tue, May 6, 2025 at 11:14 AM Paul Edmon via slurm-users <slurm-users@lists.schedmd.com> wrote:

Certainly it would help. Setting reasonable defaults for time is a good idea in general. For instance, we set 10 minutes as our default, and anything longer people have to explicitly request (up to the MaxTime for the partition).
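As a rough illustration of that kind of setup, a partition line in slurm.conf could look something like the fragment below. The partition name, node list, and the 7-day MaxTime are made-up placeholders; only the 10-minute default reflects what we actually run, so adjust everything to your own site.

```
# slurm.conf (illustrative fragment; partition name, node list, and MaxTime are hypothetical)
# DefaultTime applies to jobs submitted without --time; MaxTime caps what users may request.
PartitionName=general Nodes=node[01-10] DefaultTime=00:10:00 MaxTime=7-00:00:00 State=UP
```

With a short default like this, a job submitted without --time gets a small, known limit that the backfill scheduler can plan around, and users who need longer have to ask for it explicitly.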

More to the point, what Reason does the scheduler give for why the job is pending? If you run squeue or scontrol show job, it should list the reason why it's pending. If it's Resources, then the scheduler is waiting for sufficient resources to free up before it can schedule the job. If it is Priority, then the job is pending because other jobs are ahead of it.
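If you want a quick overview of the reasons across the whole queue, something along these lines might help. It is only a sketch: the function names are made up for illustration, and it assumes squeue is on your PATH and that the `%i|%r` output format (job ID and reason) prints one `jobid|reason` pair per line on your version.

```python
import subprocess
from collections import Counter


def parse_pending_reasons(squeue_output: str) -> Counter:
    """Count jobs per Reason from lines of the form 'jobid|reason'."""
    reasons = Counter()
    for line in squeue_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _job_id, _, reason = line.partition("|")
        reasons[reason.strip() or "Unknown"] += 1
    return reasons


def pending_reasons_from_squeue() -> Counter:
    """Run squeue for pending jobs only and tally their reasons.

    -h suppresses the header, -t PENDING filters by state,
    -o "%i|%r" prints the job ID and the reason separated by '|'.
    """
    result = subprocess.run(
        ["squeue", "-h", "-t", "PENDING", "-o", "%i|%r"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pending_reasons(result.stdout)
```

On a login node you could then call `pending_reasons_from_squeue().most_common()` to see whether your pending jobs are mostly waiting on Resources or on Priority.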

-Paul Edmon-

On 5/6/2025 11:05 AM, Mike via slurm-users wrote:

Greetings,

We are new to Slurm and we are trying to better understand why we're seeing high-memory jobs stuck in the Pending state indefinitely. Lower-memory jobs in the queue continue to pass by the high-memory jobs, even when we bump the priority on a pending high-memory job way up. We have been reading over the backfill scheduling page, and what we think we're seeing is that we need to require that users specify a --time parameter on their jobs so that backfill works properly. None of our users specify a --time parameter because we have never required it. Is that what we need to require in order to fix this situation? From the backfill page: "Backfill scheduling is difficult without reasonable time limit estimates for jobs, but some configuration parameters that can help" and it goes on to list some config parameters that we have not set (DefaultTime, MaxTime, OverTimeLimit). We also see language such as, "Since the expected start time of pending jobs depends upon the expected completion time of running jobs, reasonably accurate time limits are important for backfill scheduling to work well." So we suspect that we can achieve proper backfill scheduling by using a job submit plugin to require that all users supply a "--time" parameter. Would that be a fair statement?

Thank you in advance!

-Mike Schor

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com