[slurm-users] Very large job getting starved out

David Baker D.J.Baker at soton.ac.uk
Thu Mar 21 15:41:55 UTC 2019


Hi Cyrus,


Thank you for the links. I've taken a good look through the first link (re the cloud cluster), and the only parameter that might be relevant is "assoc_limit_stop", though I'm not sure whether it applies in this instance. The reason given for the delay of the job in question is "priority"; however, there are quite a lot of jobs from users in the same accounting group that are delayed due to "QOSMaxCpuPerUserLimit". That thread also talks about using the "builtin" scheduler, which I guess would turn off backfill altogether.
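To make sure I'm reading the documentation correctly: as far as I can tell, those two options would be set along the following lines in slurm.conf. This is just an illustrative sketch rather than our current configuration (our actual SchedulerParameters list differs):

# Keep lower-priority jobs from being scheduled once a higher-priority
# job is blocked by an association/QOS limit (appended to whatever
# SchedulerParameters are already set):
SchedulerParameters=assoc_limit_stop

# Or, more drastically, strict priority-order scheduling with no backfill:
SchedulerType=sched/builtin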


I have attached a copy of the current slurm.conf so that you and other members can get a better feel for the whole picture. Certainly we see a large number of serial/small (1 node) jobs running through the system, and I'm concerned that my setup encourages this behaviour; however, how to stem this issue is a mystery to me.
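For what it's worth, a quick way to see that job mix is something along the lines of the following (just a rough sanity check, nothing rigorous):

# Count currently running jobs by the number of nodes they occupy
# (%D is the node count in squeue's format string)
squeue -h -t RUNNING -o "%D" | sort -n | uniq -c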


If you or anyone else has any relevant thoughts then please let me know. In particular, I am keen to understand "assoc_limit_stop" and whether it is a relevant option in this situation.


Best regards,

David

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Cyrus Proctor <cproctor at tacc.utexas.edu>
Sent: 21 March 2019 14:19
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] Very large job getting starved out


Hi David,


You might have a look at the thread "Large job starvation on cloud cluster" that started on Feb 27; there are some good tidbits in there. Off the top of my head, without more information, I would venture that the settings you have in slurm.conf end up backfilling the smaller jobs at the expense of scheduling the larger jobs.


Your partition configs plus accounting and scheduler configs from slurm.conf would be helpful.


Also, search for "job starvation" here: https://slurm.schedmd.com/sched_config.html as another potential starting point.
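To give a flavour of what that page covers, the backfill knobs in SchedulerParameters are typically where big-job starvation gets addressed. The values below are purely illustrative, not a recommendation for your site:

# Illustrative backfill tuning only -- adjust for your workload:
SchedulerType=sched/backfill
SchedulerParameters=bf_window=11520,bf_resolution=600,bf_max_job_user=20,bf_continue
# bf_window       : how far (minutes) into the future backfill plans;
#                   should cover your longest job time limits
# bf_resolution   : time resolution (seconds) of the backfill plan
# bf_max_job_user : cap on how many jobs per user backfill will try to start
# bf_continue     : let backfill keep working through its job list after
#                   periodically releasing locks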


Best,

Cyrus


On 3/21/19 8:55 AM, David Baker wrote:

Hello,


I understand that this is not a straightforward question; however, I'm wondering if anyone has any useful ideas, please. Our cluster is busy, and the QOS limits users to a maximum of 32 compute nodes on the "batch" queue. Users are making good use of the cluster -- for example, one user is running five 6-node jobs at the moment. On the other hand, a job belonging to another user has been stalled in the queue for around 7 days. He has made reasonable use of the cluster, and as a result his fairshare component is relatively low. Having said that, the priority of his job is high -- it is currently one of the highest-priority jobs in the batch partition queue. From sprio...


 JOBID  PARTITION  PRIORITY     AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS
359323  batch        180292  100000      79646      547        100    0
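(As a sanity check on my reading of sprio: those columns appear to be the already-weighted contributions, so they should account for essentially the whole priority:

  AGE + FAIRSHARE + JOBSIZE + PARTITION + QOS
  = 100000 + 79646 + 547 + 100 + 0
  = 180293, versus the reported PRIORITY of 180292 -- presumably rounding.

If I read the docs correctly, the AGE term has also hit its ceiling: with PriorityMaxAge=7-00:00:00 the age factor saturates at 1.0, i.e. the full PriorityWeightAge of 100000 shown in the settings below, so simply waiting longer will not raise this job's priority any further.)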


I did think that the PriorityDecayHalfLife was quite high at 14 days and so I reduced that to 7 days. For reference I've included the key scheduling settings from the cluster below. Does anyone have any thoughts, please?


Best regards,

David


PriorityDecayHalfLife   = 7-00:00:00
PriorityCalcPeriod      = 00:05:00
PriorityFavorSmall      = No
PriorityFlags           = ACCRUE_ALWAYS,SMALL_RELATIVE_TO_TIME,FAIR_TREE
PriorityMaxAge          = 7-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType            = priority/multifactor
PriorityWeightAge       = 100000
PriorityWeightFairShare = 1000000
PriorityWeightJobSize   = 10000000
PriorityWeightPartition = 1000
PriorityWeightQOS       = 10000
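For reference, those values were pulled from the running configuration rather than re-typed from slurm.conf; something along the lines of the following should reproduce the list (the grep pattern is just a convenient filter, nothing official):

# Show the effective priority-related settings from the running slurmctld
scontrol show config | grep -i "^Priority"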



-------------- next part --------------
A non-text attachment was scrubbed...
Name: slurm.conf
Type: application/octet-stream
Size: 5910 bytes
Desc: slurm.conf
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20190321/868e1a3e/attachment-0001.obj>

