[slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

Lech Nieroda lech.nieroda at uni-koeln.de
Wed Apr 3 11:17:40 UTC 2019


Hi Ole,

> On 03.04.2019 at 12:53, Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> wrote:
> SchedMD already decided that they won't fix the problem:
> https://bugs.schedmd.com/show_bug.cgi?id=6796

Yes, I guess it’s a bit late in the release lifecycle. Nevertheless it’s a pity, as there are certainly many users who’d rather not upgrade their distribution’s default MySQL server just for the sake of a conversion.

> Can you confirm that your patch is only relevant for an old MySQL 5.1?
> 
> On our CentOS 7 systems we run the OS's MariaDB server 5.5.  Would MySQL/MariaDB version 5.5 be affected by your patch or not?

The patch will work with any MySQL version >= 5.1, since all it does is simplify the query by changing an implicit derived table into an explicit temporary table.
This way the query complexity is reduced and its execution order no longer depends on the "intelligence" of the MySQL optimizer, while the end results are exactly the same.
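
To illustrate the general idea (this is not the actual Slurm conversion query, which isn't quoted in this thread), here is a minimal Python sketch. It assumes SQLite as a stand-in for MySQL, and the table and column names are made up for the example. It runs the same join once against an implicit derived table and once against an explicit temporary table, and checks that both give identical rows:

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema and data; NOT Slurm's real accounting tables.
cur.executescript("""
CREATE TABLE job  (job_db_inx INTEGER PRIMARY KEY, id_job INTEGER);
CREATE TABLE step (job_db_inx INTEGER, id_step INTEGER, cpu_seconds INTEGER);
""")
cur.executemany("INSERT INTO job VALUES (?, ?)",
                [(1, 100), (2, 101), (3, 102)])
cur.executemany("INSERT INTO step VALUES (?, ?, ?)",
                [(1, 0, 10), (1, 1, 20), (2, 0, 5), (3, 0, 7), (3, 1, 8)])

# Variant A: implicit derived table (subquery in the FROM clause).
# How and when the subquery is evaluated is left to the optimizer.
derived_rows = cur.execute("""
    SELECT j.id_job, t.total_cpu
    FROM job AS j
    JOIN (SELECT job_db_inx, SUM(cpu_seconds) AS total_cpu
          FROM step
          GROUP BY job_db_inx) AS t
      ON t.job_db_inx = j.job_db_inx
    ORDER BY j.id_job
""").fetchall()

# Variant B: explicit temporary table.
# The aggregation is forced to run first, as its own statement.
cur.execute("""
    CREATE TEMPORARY TABLE step_totals AS
    SELECT job_db_inx, SUM(cpu_seconds) AS total_cpu
    FROM step
    GROUP BY job_db_inx
""")
explicit_rows = cur.execute("""
    SELECT j.id_job, t.total_cpu
    FROM job AS j
    JOIN step_totals AS t ON t.job_db_inx = j.job_db_inx
    ORDER BY j.id_job
""").fetchall()
cur.execute("DROP TABLE step_totals")

# Both variants must yield the same rows.
print(derived_rows == explicit_rows)
```

The point of variant B is only that the execution order is fixed by the statements themselves; whether that is faster depends on the server and the data.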

We haven’t tested whether the MySQL 5.5 optimizer chooses the right execution plan for this query.
As I’ve said, it took roughly 17 minutes with 11 million jobs, 18 million steps and an InnoDB buffer pool size of 8G.
If the table conversion takes more than half an hour and you don’t have tens of millions of jobs, then the optimizer has a problem and the patch would help you.


Kind regards,
Lech


> 
> Best regards,
> Ole
> 
> On 4/3/19 12:30 PM, Lech Nieroda wrote:
>> Hello Chris,
>> I’ve submitted the bug report together with a patch.
>> We don’t have a support contract but I suppose they’ll at least read it ;)
>> The code is identical for 18.08.x and 19.05.x, it’s just a different offset.
>> Kind regards,
>> Lech
>>> On 02.04.2019 at 15:18, Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> wrote:
>>> 
>>> Hi Lech,
>>> 
>>> IMHO, the Slurm user community would benefit the most from your interesting work on MySQL/MariaDB performance, if your patch could be made against the current 18.08 and the coming 19.05 releases.  This would ensure that your work is carried forward.
>>> 
>>> Would you be able to make patches against 18.08 and 19.05?  If you submit the patches to SchedMD, my guess is that they'd be very interested.  A site with a SchedMD support contract (such as our site) could also submit a bug report including your patch.
>>> 
>>> /Ole
>>> 
>>> On 4/2/19 2:56 PM, Lech Nieroda wrote:
>>>> That’s probably it.
>>>> Sub-queries are known for potential performance issues, so one wonders why the devs didn’t extract it accordingly and make the code more robust, or at least compatible with RHEL/CentOS 6, rather than including that remark in the release notes.
>>>>> On 02.04.2019 at 07:20, Chris Samuel <chris at csamuel.org> wrote:
>>>>> 
>>>>> On Monday, 1 April 2019 7:55:09 AM PDT Lech Nieroda wrote:
>>>>> 
>>>>>> Further analysis of the query has shown that the mysql optimizer has chosen
>>>>>> the wrong execution plan. This may depend on the mysql version, ours was
>>>>>> 5.1.69.
>>>>> 
>>>>> I suspect this is the issue documented in the release notes for 17.11:
>>>>> 
>>>>> https://github.com/SchedMD/slurm/blob/slurm-17.11/RELEASE_NOTES
>>>>> 
>>>>> NOTE FOR THOSE UPGRADING SLURMDBD: The database conversion process from
>>>>>      SlurmDBD 16.05 or 17.02 may not work properly with MySQL 5.1 (as was the
>>>>>      default version for RHEL 6).  Upgrading to a newer version of MariaDB or
>>>>>      MySQL is strongly encouraged to prevent this problem.
> 



