[slurm-users] Scheduler does not reserve resources

Jérémy Lapierre jeremy.lapierre at uni-saarland.de
Mon Jan 17 17:00:06 UTC 2022



Hi Rodrigo and Rémi,

> I had a similar behavior a long time ago, and I decided to set
> SchedulerType=sched/builtin to empty X nodes of jobs and execute the
> high-priority job requesting more than one node. It is not ideal, but
> the cluster has a low load, so a user who requests more than one node
> doesn't delay the execution of others' jobs too much.

I don't think this would be ideal in our case, as we have heavy loads. 
Also, I'm not sure whether you mean that we should switch to 
SchedulerType=sched/builtin permanently, or only for the time needed 
for the problematic jobs to be allocated. Besides, from our experience 
on another cluster, we think Slurm should normally reserve resources.
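
For reference, if switching only temporarily is what you meant, I guess 
it would look something like this in slurm.conf (just a sketch, not our 
actual config):

    # slurm.conf -- sketch only
    # Default: backfill scheduling; lower-priority jobs may start early
    # if they do not delay the expected start of higher-priority jobs
    SchedulerType=sched/backfill

    # Alternative: strict priority-order scheduling, which would let
    # nodes drain so the pending multi-node job can start
    #SchedulerType=sched/builtin

followed by a restart of slurmctld, since as far as I know a change of 
SchedulerType is not picked up by `scontrol reconfigure` alone.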

> Backfilling doesn't delay the scheduled start time of higher priority 
> jobs,
> but at least they must have a scheduled start time.
> 
> Did you check the start time of your job that is pending with the
> Resources reason? E.g. with `scontrol show job <id> | grep StartTime`.

Yes, the scheduled start time has been checked as well, and that time 
keeps getting updated, such that jobs asking for 1/4 of a node can run 
as soon as 1/4 of a node frees up. This is why I'm saying that jobs 
asking for several nodes (tested with 2 nodes here) are pending forever. 
It is as if Slurm never wants to leave resources unused (which also 
makes sense, but how can we satisfy "heavy" resource requests then?). On 
another cluster using Slurm, I know that Slurm reserves nodes and the 
state of those reserved nodes becomes "PLANNED" (or plnd); this way, 
jobs requesting more resources than are available at submission time 
can be satisfied later. This never happens on the cluster that is 
causing issues.
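
For what it's worth, here is roughly how we check this on both clusters 
(a sketch; <jobid> is a placeholder):

    # Expected start time of the pending multi-node job
    squeue --start -j <jobid>

    # Node states; on the other cluster, nodes held back for a big
    # pending job show up as "planned" (plnd)
    sinfo -N -o "%N %T"

On the problematic cluster, the second command never shows any node in 
the planned state.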

> Sometimes Slurm is unable to define the start time of a pending job.
> One typical reason is the absence of a time limit on the running jobs.
> In this case Slurm is unable to define when the running jobs are over,
> when the next highest-priority job can start, and eventually unable to
> define whether lower-priority jobs actually delay higher-priority jobs.

Yes, we always set the time limit of our jobs to the maximum time limit 
allowed by the partition.
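
Concretely, our submissions look roughly like this (a sketch; the 
partition name, time limit, and binary are placeholders):

    #!/bin/bash
    #SBATCH --nodes=2               # the multi-node request that pends forever
    #SBATCH --time=48:00:00         # always set to the partition's MaxTime
    #SBATCH --partition=<partition>

    srun ./my_simulation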

Thanks for your help,

Jeremy