[slurm-users] Jobs waiting while plenty of cpu and memory available

Tue Jul 9 16:07:26 UTC 2019

> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of
> Thomas M. Payerle
> Sent: Tuesday, July 9, 2019 10:23 AM
> 
> Do you have backfill enabled?  This can help in many cases.

Yup - I checked for backfill yesterday. It's enabled.

> If the job with highest priority is quite wide, Slurm will reserve resources for
> it.  E.g., if it requests all of your nodes, then Slurm will reserve all nodes as
> they become idle for the wide job, until no other jobs are running and it can
> finally run. 

Yes! That's totally it. Thanks...

That's helpful advice. I have this per Ole's suggestion:
export SQUEUE_FORMAT="%.18i %.9P %.6q %.8j %.8u %.8a %.10T %.9Q %.10M %.10V %.9l %.6D %.6C %m %R"

When I run the command below, it displays the priority (along with a bunch of other fields) and sorts by priority, highest at the bottom:
squeue -p batch -S Q
I can see a long list of jobs: some running, some pending (Priority), and then one big job at the bottom pending (Resources).

So here's something funny. One user submitted a job that requested 60 CPUs and 400000M of memory. The largest nodes in that partition have 72 CPUs and 256G of memory. So when a user requests roughly 400G of RAM, what would be good behavior? I would like to see Slurm reject the job with something like "job is impossible to run." Instead, Slurm keeps slowly raising the priority of that job (because of fairshare), and the job effectively disables every node that has enough CPUs, because each of them is held back while Slurm tries to free up memory for it. Not just one node with enough CPUs, but all of them.

This is a combination of *multiple* bad behaviors. Holding a node for a request it can never satisfy is bad. Holding *all* nodes that have enough CPUs, even though none of them can ever satisfy the request, is extra bad.
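To make the arithmetic concrete, here is a minimal shell sketch of the check I'd have liked to see at submit time. The function names (to_mb, fits_on_largest_node) are hypothetical helpers of mine, not anything Slurm provides, and 256G is the nominal node size; the usable memory Slurm sees for a node is normally somewhat lower.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Slurm: compare a job's memory request
# with the memory of the largest node in the partition.

# Convert a memory string such as 400000M or 256G to megabytes.
# A bare number is treated as megabytes (assumption for this sketch).
to_mb() {
    local value="$1"
    local number="${value%[KkMmGgTt]}"   # strip one trailing unit letter, if any
    local suffix="${value#"$number"}"    # whatever was stripped (may be empty)
    case "$suffix" in
        K|k)    echo $(( number / 1024 )) ;;
        M|m|"") echo "$number" ;;
        G|g)    echo $(( number * 1024 )) ;;
        T|t)    echo $(( number * 1024 * 1024 )) ;;
    esac
}

# Exit status 0 when the request fits on the largest node, non-zero otherwise.
fits_on_largest_node() {
    local requested_mb largest_mb
    requested_mb=$(to_mb "$1")
    largest_mb=$(to_mb "$2")
    [ "$requested_mb" -le "$largest_mb" ]
}

# The case from this thread: 400000M requested, 256G (262144 MB) on the largest node.
# On a live cluster the second value could instead come from something like
#   sinfo -h -p batch -N -o "%m" | sort -n | tail -1
# (memory per node in MB); that pipeline is untested here.
if fits_on_largest_node 400000M 256G; then
    echo "request fits"
else
    echo "request can never be satisfied"
fi
```

Since 256G is only 262144 MB, the 400000M request fails the check no matter which node is considered.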

In any event, the problem is solved now. We run an old version of Slurm, 17.02.11, so maybe it's a bug that was fixed in a later version. Maybe I can also prevent a recurrence through configuration. I'll look into whether I can configure Slurm to reject any job that requests more memory than the largest node in the partition has. Still, I think this should definitely count as a Slurm bug. Very bad behavior.
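For the configuration route, the fragment below is the kind of thing I have in mind. It is an untested sketch: EnforcePartLimits and MaxMemPerNode are documented slurm.conf parameters, but I have not confirmed that on 17.02.11 EnforcePartLimits applies to the partition's memory limit, so that needs checking against the man page for that version. The node list is a placeholder, and 262144 is simply 256G expressed in MB; the real limit should match what the nodes actually report.

```
# slurm.conf fragment (sketch under the assumptions above)

# Reject jobs at submit time when they exceed partition limits,
# instead of leaving them pending.
EnforcePartLimits=ALL

# Cap per-node memory requests for the partition, in MB.
# <existing node list> stands for whatever the partition already uses.
PartitionName=batch Nodes=<existing node list> MaxMemPerNode=262144
```

After changing slurm.conf, the daemons need to pick up the new settings (for example via scontrol reconfigure) before the limit takes effect.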

Thanks!