Hi all.

We want to achieve a simple thing with Slurm: launch "normal" jobs, and be
able to launch "high priority" jobs that run as soon as possible. That's
all. However, we cannot achieve this reliably: our current config
sometimes works and sometimes does not, and this is driving us crazy.

When it works, this is what happens:
- we have, let's say, 10 jobs running with normal priority (--qos=normal,
having final Priority=1001) and a few thousand in PENDING state
- we submit a new job with high priority (--qos=high, having final
- at this point, Slurm waits until a normal priority job ends and frees
up the required resources, and then starts the new high priority job. That's

However, from time to time, randomly, this does not happen. Here is an

# the node has around 200GB of memory and 24 CPUs
Partition=t1 State=PD Priority=1001001 Nice=0 ID=337455 CPU=24 Memory=80G
Nice=0 Started=0:00 User=u1 Submitted=2020-07-07T07:16:47
Partition=t1 State=R Priority=1001 Nice=0 ID=337475 CPU=1 Memory=1024M
Nice=0 Started=1:22 User=u1 Submitted=2020-07-07T10:31:46
Partition=t1 State=R Priority=1001 Nice=0 ID=334355 CPU=1 Memory=1024M
Nice=0 Started=58:09 User=u1 Submitted=2020-06-23T09:57:11
Partition=t1 State=R Priority=1001 Nice=0 ID=334354 CPU=1 Memory=1024M
Nice=0 Started=6:29:59 User=u1 Submitted=2020-06-23T09:57:11
Partition=t1 State=R Priority=1001 Nice=0 ID=334353 CPU=1 Memory=1024M
Nice=0 Started=13:25:55 User=u1 Submitted=2020-06-23T09:57:11
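
To make the problem in the listing above explicit, here is a minimal Python
sketch that flags every running job whose priority is lower than that of a
pending job. The helper names parse_job and find_inversions are made up for
illustration and are not Slurm tools; the records are the first line of each
job from the listing, and the check is crude because it ignores when each
job actually started.

```python
import re

# First line of each job record from the listing above, one job per string.
JOBS = [
    "Partition=t1 State=PD Priority=1001001 Nice=0 ID=337455 CPU=24 Memory=80G",
    "Partition=t1 State=R Priority=1001 Nice=0 ID=337475 CPU=1 Memory=1024M",
    "Partition=t1 State=R Priority=1001 Nice=0 ID=334355 CPU=1 Memory=1024M",
    "Partition=t1 State=R Priority=1001 Nice=0 ID=334354 CPU=1 Memory=1024M",
    "Partition=t1 State=R Priority=1001 Nice=0 ID=334353 CPU=1 Memory=1024M",
]


def parse_job(record):
    """Turn a 'Key=Value Key=Value ...' record into a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", record))


def find_inversions(records):
    """Return (pending_id, running_id) pairs where the running job has a
    lower priority than the pending one (hypothetical helper)."""
    jobs = [parse_job(r) for r in records]
    pending = [j for j in jobs if j["State"] == "PD"]
    running = [j for j in jobs if j["State"] == "R"]
    return [
        (p["ID"], r["ID"])
        for p in pending
        for r in running
        if int(r["Priority"]) < int(p["Priority"])
    ]


if __name__ == "__main__":
    for pending_id, running_id in find_inversions(JOBS):
        print(f"job {running_id} is running while higher-priority job "
              f"{pending_id} is still pending")
```

On this listing, all four running jobs are flagged against pending job
337455, which asks for all 24 CPUs of the node.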

You see? Slurm keeps starting jobs that have a lower priority. Why is that?

Some info about our config: we run Slurm version 16.05. Here is the
priority config:

##### file /etc/slurm-llnl/slurm.conf

##### command "sacctmgr show qos"
      Name   Priority  MaxSubmitPA
    normal          0  30
      high       1000

Any idea?

