[slurm-users] Weird issues with slurm's Priority

zaxs84 sciuscianebbia at gmail.com
Tue Jul 7 09:05:03 UTC 2020


Hi all.

We want to achieve something simple with Slurm: run "normal" jobs, and also
be able to submit "high priority" jobs that start as soon as possible. That
is all. However, we cannot get this to work reliably: our current config
sometimes works and sometimes does not, and this is driving us crazy.

When it works, this is what happens:
- we have, let's say, 10 jobs running with normal priority (--qos=normal,
final Priority=1001) and a few thousand in PENDING state
- we submit a new job with high priority (--qos=high, final
Priority=1001001)
- at this point, Slurm waits until running normal-priority jobs end and free
up the required resources, and then starts the high-priority job. That's
perfect!

However, from time to time, randomly, this does not happen. Here is an
example:

# the node has around 200GB of memory and 24 CPUs
Partition=t1 State=PD Priority=1001001 Nice=0 ID=337455 CPU=24 Memory=80G Nice=0 Started=0:00 User=u1 Submitted=2020-07-07T07:16:47
Partition=t1 State=R Priority=1001 Nice=0 ID=337475 CPU=1 Memory=1024M Nice=0 Started=1:22 User=u1 Submitted=2020-07-07T10:31:46
Partition=t1 State=R Priority=1001 Nice=0 ID=334355 CPU=1 Memory=1024M Nice=0 Started=58:09 User=u1 Submitted=2020-06-23T09:57:11
Partition=t1 State=R Priority=1001 Nice=0 ID=334354 CPU=1 Memory=1024M Nice=0 Started=6:29:59 User=u1 Submitted=2020-06-23T09:57:11
Partition=t1 State=R Priority=1001 Nice=0 ID=334353 CPU=1 Memory=1024M Nice=0 Started=13:25:55 User=u1 Submitted=2020-06-23T09:57:11
[...]

You see? Slurm keeps starting jobs that have a lower priority. Why is that?
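
To make the anomaly in the listing concrete, here is a small Python sketch that parses those records and flags every running job that must have started after the pending high-priority job was submitted. It assumes the "Started" field is the elapsed run time in MM:SS or HH:MM:SS form (a days prefix is not handled), and the helper names (parse_elapsed, parse_jobs, started_after_submit) are made up for this illustration. Because the listing carries no timestamp, the sketch only derives a lower bound for "now" from the running jobs themselves.

```python
import re
from datetime import datetime, timedelta

# Records copied from the listing above, one job per line.
LISTING = """\
Partition=t1 State=PD Priority=1001001 Nice=0 ID=337455 CPU=24 Memory=80G Nice=0 Started=0:00 User=u1 Submitted=2020-07-07T07:16:47
Partition=t1 State=R Priority=1001 Nice=0 ID=337475 CPU=1 Memory=1024M Nice=0 Started=1:22 User=u1 Submitted=2020-07-07T10:31:46
Partition=t1 State=R Priority=1001 Nice=0 ID=334355 CPU=1 Memory=1024M Nice=0 Started=58:09 User=u1 Submitted=2020-06-23T09:57:11
Partition=t1 State=R Priority=1001 Nice=0 ID=334354 CPU=1 Memory=1024M Nice=0 Started=6:29:59 User=u1 Submitted=2020-06-23T09:57:11
Partition=t1 State=R Priority=1001 Nice=0 ID=334353 CPU=1 Memory=1024M Nice=0 Started=13:25:55 User=u1 Submitted=2020-06-23T09:57:11
"""


def parse_elapsed(text):
    """Parse 'MM:SS' or 'HH:MM:SS' into a timedelta (assumed format)."""
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def parse_jobs(text):
    """Turn each 'Key=Value ...' line into a dict with typed fields."""
    jobs = []
    for line in text.splitlines():
        fields = dict(re.findall(r"(\w+)=(\S+)", line))
        if not fields:
            continue
        jobs.append({
            "id": int(fields["ID"]),
            "state": fields["State"],
            "priority": int(fields["Priority"]),
            "submitted": datetime.fromisoformat(fields["Submitted"]),
            "elapsed": parse_elapsed(fields["Started"]),
        })
    return jobs


def started_after_submit(jobs):
    """Return (running_id, pending_id) pairs where a lower-priority job
    provably started after the pending job was submitted."""
    running = [j for j in jobs if j["state"] == "R"]
    pending = [j for j in jobs if j["state"] == "PD"]
    if not running:
        return []
    # The listing cannot be older than any job's submit time plus its run time.
    now_lower_bound = max(j["submitted"] + j["elapsed"] for j in running)
    flagged = []
    for p in pending:
        for r in running:
            earliest_start = max(r["submitted"], now_lower_bound - r["elapsed"])
            if r["priority"] < p["priority"] and earliest_start > p["submitted"]:
                flagged.append((r["id"], p["id"]))
    return flagged


for running_id, pending_id in started_after_submit(parse_jobs(LISTING)):
    print(f"job {running_id} started while higher-priority job {pending_id} was pending")
```

With the five records shown, this flags jobs 337475 and 334355; the two longer-running jobs cannot be proven to have started after job 337455 was submitted, so they are left out.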

Some info about our setup: we run Slurm 16.05. Here is our priority
configuration:

##### file /etc/slurm-llnl/slurm.conf
PriorityType=priority/multifactor
PriorityFavorSmall=NO
PriorityWeightQOS=1000000
PriorityWeightFairshare=1000
PriorityWeightPartition=1000
PriorityWeightJobSize=0
PriorityWeightAge=0

##### command "sacctmgr show qos"
      Name   Priority  MaxSubmitPA
    normal          0  30
      high       1000
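
For reference, the two Priority values quoted earlier are consistent with the multifactor plugin's weighted sum of factors, each factor being a number between 0.0 and 1.0. The Python sketch below is only an illustration of that arithmetic, not Slurm's code: the weights and QOS priorities come from the config above, while the fairshare and partition factor values are hypothetical, chosen solely so that the totals match the observed 1001 and 1001001.

```python
# Weights copied from the slurm.conf excerpt above.
WEIGHTS = {
    "qos": 1_000_000,
    "fairshare": 1000,
    "partition": 1000,
    "job_size": 0,
    "age": 0,
}

# QOS priorities from "sacctmgr show qos".
QOS_PRIORITY = {"normal": 0, "high": 1000}


def qos_factor(qos):
    """Normalize a QOS priority against the highest QOS priority (0.0 to 1.0)."""
    return QOS_PRIORITY[qos] / max(QOS_PRIORITY.values())


def job_priority(qos, fairshare, partition, job_size=0.0, age=0.0, nice=0):
    """Weighted sum of the priority factors, minus the nice value."""
    factors = {
        "qos": qos_factor(qos),
        "fairshare": fairshare,
        "partition": partition,
        "job_size": job_size,
        "age": age,
    }
    return round(sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)) - nice


# Hypothetical factor values (assumptions, not taken from the cluster).
ASSUMED = {"fairshare": 0.001, "partition": 1.0}

print(job_priority("normal", **ASSUMED))
print(job_priority("high", **ASSUMED))
```

Whatever the real fairshare and partition factors are, the gap between the two QOS levels is the full PriorityWeightQOS, because "high" normalizes to 1.0 and "normal" to 0.0.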


Any ideas?

Thanks