[slurm-users] Scheduler does not reserve resources

Wed Jan 19 00:46:10 UTC 2022

Hi Jeremy,

If all jobs have the same time limit, backfill is impossible. The
documentation says: "Effectiveness of backfill scheduling is dependent upon
users specifying job time limits, otherwise all jobs will have the same
time limit and backfilling is impossible". I don't know how to overcome that...
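
A quick way to check whether this applies is to list the distinct time limits of the jobs currently in the queue. This is only a minimal sketch, assuming `squeue` is on the PATH; the function name `distinct_time_limits` is mine, not something Slurm provides:

```shell
#!/usr/bin/env bash
# Print each distinct time limit found among running and pending jobs.
# distinct_time_limits is an illustrative name, not a Slurm command.
distinct_time_limits() {
    # -h: no header, -t: filter by job state, -o '%l': print only the time limit
    squeue -h -t RUNNING,PENDING -o '%l' | sort -u
}
```

If `distinct_time_limits` prints a single line, every job in the queue carries the same limit, which is the situation the documentation warns about.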

However, without changing SchedulerType, you could hold all pending jobs
except the one you want to run, then release them once that job has been
allocated. Alternatively, you could restrict all other jobs to a node or list
of nodes that excludes the nodes intended for the job of interest, then remove
that restriction once the job has been allocated. I preferred the second
option because both the "heavy" job and the "light" jobs get allocated, and I
don't have to watch the queue outside office hours (again, this is easier on a
lightly used cluster).
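
For the first option (hold, then release), something along these lines might work. It is a rough sketch, not a tested procedure: the function names and the state file are hypothetical, and holding other users' jobs normally requires operator or administrator rights.

```shell
#!/usr/bin/env bash
# Sketch: hold every pending job except one, then release them later.
# All function names and the state file path are hypothetical.

# Print the IDs of all pending jobs except the one given as $1.
pending_except() {
    local keep="$1"
    # -h: no header, -t PENDING: pending jobs only, -o '%i': job ID only
    squeue -h -t PENDING -o '%i' | grep -vx -- "$keep"
}

# Hold every pending job except $1 and record the held IDs in file $2.
hold_others() {
    local keep="$1" statefile="$2" id
    pending_except "$keep" > "$statefile"
    while read -r id; do
        scontrol hold "$id"
    done < "$statefile"
}

# Release the jobs recorded in file $1.
release_held() {
    local statefile="$1" id
    while read -r id; do
        scontrol release "$id"
    done < "$statefile"
}
```

You would call `hold_others <jobid> /tmp/held_jobs` (path chosen only for illustration), wait until the job of interest is running, then call `release_held /tmp/held_jobs`. Note that this sketch also records jobs that were already held before it ran, so they would be released as well, and it does not treat job arrays specially.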

About "PLANNED": I wasn't aware of it, and it is a feature of Slurm 21.08.
Could that be why you don't see it on your cluster?
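
To compare the two clusters, you could check the installed version on each. A small sketch, assuming `scontrol --version` prints something like `slurm 21.08.5`; the function name is made up for illustration:

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if the local Slurm version is 21.08 or newer.
# slurm_at_least_21_08 is an illustrative name, not a Slurm command.
slurm_at_least_21_08() {
    local ver major minor
    # Assumes output of the form "slurm 21.08.5"; take the second field.
    ver="$(scontrol --version | awk '{print $2}')"
    major="${ver%%.*}"
    minor="${ver#*.}"
    minor="${minor%%.*}"
    # 10# forces base 10 so that "08" is not parsed as an octal number.
    (( 10#$major > 21 || (10#$major == 21 && 10#$minor >= 8) ))
}
```

For example, `slurm_at_least_21_08 && echo "PLANNED should be available"` only prints the message on a new enough installation.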

Best,

On Mon, Jan 17, 2022 at 2:02 PM Jérémy Lapierre <
jeremy.lapierre at uni-saarland.de> wrote:

> Hi Rodrigo and Rémi,
>
> >I had a similar behavior a long time ago, and I decided to set
> >SchedulerType=sched/builtin to empty X nodes of jobs and execute that
> >high-priority job requesting more than one node. It is not ideal, but the
> >cluster has low load, so a user that requests more than one node doesn't
> >delay too much the execution of other's jobs.
>
> I don't think this would be ideal in our case as we have heavy loads. Also,
> I'm not sure whether you mean that we should switch to
> SchedulerType=sched/builtin permanently or just for the time needed for the
> problematic jobs to be allocated? We also have some experience with
> another cluster, and we think Slurm should normally reserve resources.
>
> >Backfilling doesn't delay the scheduled start time of higher priority jobs,
> >but at least they must have a scheduled start time.
> >
> >Did you check the start time of your job pending with Resources reason? eg.
> >with `scontrol show job <id> | grep StartTime`.
>
> Yes, the scheduled start time has been checked as well, and this time is
> updated over time such that jobs asking for 1/4 of a node can run on a
> freshly freed quarter of a node. This is why I'm saying that the jobs asking
> for several nodes (tested with 2 nodes here) are pending forever. It is as if
> Slurm never wants to have unused resources (which also makes sense, but how
> can we satisfy "heavy" resource requests then?). On another cluster using
> Slurm, I know that Slurm reserves nodes and the node state of those
> reserved nodes becomes "PLANNED" (or plnd); this way jobs requesting
> more resources than are available at the time of submission can later be
> satisfied. This never happens on the cluster which is causing issues.
>
> >Sometimes Slurm is unable to define the start time of a pending job. One
> >typical reason is the absence of timelimit on the running jobs.
> >In this case Slurm is unable to define when the running jobs are over,
> >when the next highest priority job can start and eventually unable to define
> >if lower priority jobs actually delay higher priority jobs.
>
> Yes, we always set the time limit of our jobs to the maximum time limit
> allowed by the partition.
>
> Thanks for your help,
>
> Jeremy
>