[slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3
Lech Nieroda
lech.nieroda at uni-koeln.de
Thu Apr 4 11:07:01 UTC 2019
That’s correct, but let’s keep in mind that it only concerns the upgrade process, not production runtime, which has certain implications.
The affected database structures were introduced in 17.11, so the problem only affects upgrades from 17.02 or earlier. It wouldn’t be a problem for users who made a fresh install of 17.11 or newer.
Furthermore, upgrades shouldn’t skip more than one release, as that would lead to the loss of state files and other important information, so users probably won’t upgrade from 17.02 to 19.05 directly. If they did, then yes, the patch would also be applicable to 19.x; it’s just less likely to be needed.
It’s most needed in 17.11 and 18.08, but it won’t be included there because both are at a late stage in their lifecycle. An understandable decision.
To sum it up, the issue affects users who still run 17.02 or earlier, use their distribution’s default mysql/mariadb from RHEL6/CentOS6 or RHEL7/CentOS7, have millions of jobs in their database *and* would like to upgrade slurm without upgrading mysql.
Those few unfortunate souls will perhaps find this thread and use the patch ;-)
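For anyone trying to decide whether they are among them, the conditions summarised above can be written down as a rough check. This is only a sketch under assumptions: version strings are parsed as "major.minor[.patch]", the release list covers only the versions discussed in this thread, the job-count threshold is a hypothetical parameter standing in for "millions of jobs", and counting MySQL/MariaDB 5.5 as affected follows the summary above even though 5.5 has not been tested with the patch.

```python
from typing import Tuple

# Slurm major releases discussed in this thread, oldest first (assumption:
# only these are needed for the illustration).
RELEASES = [(16, 5), (17, 2), (17, 11), (18, 8), (19, 5)]

# Distribution-default database versions named in the summary:
# MySQL 5.1 (RHEL6/CentOS6) and MySQL/MariaDB 5.5 (RHEL7/CentOS7).
OLD_DISTRO_DB = {(5, 1), (5, 5)}


def parse_release(version: str) -> Tuple[int, int]:
    """Turn '17.02.11' or '5.1.69' into (17, 2) or (5, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def releases_skipped(current: str, target: str) -> int:
    """Major releases jumped over, e.g. 17.02 -> 18.08 skips one (17.11).

    Raises ValueError for a release not listed in RELEASES.
    """
    start = RELEASES.index(parse_release(current))
    end = RELEASES.index(parse_release(target))
    return end - start - 1


def is_affected(current_slurm: str, target_slurm: str, db_version: str,
                n_jobs: int, many_jobs_threshold: int = 1_000_000) -> bool:
    """True if the slow 17.11 database conversion is likely to bite."""
    # The affected structures were introduced in 17.11, so only upgrades
    # from 17.02 or earlier to 17.11 or later run the conversion.
    crosses_17_11 = (parse_release(current_slurm) <= (17, 2)
                     and parse_release(target_slurm) >= (17, 11))
    old_db = parse_release(db_version) in OLD_DISTRO_DB
    return crosses_17_11 and old_db and n_jobs >= many_jobs_threshold


if __name__ == "__main__":
    print(releases_skipped("17.02.11", "19.05.0"))  # 2: more than one release
    print(is_affected("16.05.6", "17.11.3", "5.1.69", 11_000_000))  # True
```

A site on a newer MariaDB, or one with few jobs, falls outside the check, which matches the point that a fresh install or an upgraded database server avoids the problem.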
Kind regards,
Lech
> Am 03.04.2019 um 15:33 schrieb Prentice Bisbal <pbisbal at pppl.gov>:
>
>> the dev stated that they’d rather keep that warning than fix the issue, so I’m not sure if that’ll be enough to convince them.
> Anyone else as disappointed by this as I am? I get that it's too late to add something like this to 17.11 or 18.08, but it seems like SchedMD isn't even interested in looking at this for 19.x. If a Linux distro comes with a particular version of MySQL or MariaDB, and SchedMD says they support that version of the distro, then they should support the version of the DB that comes with that distro. From what I gather from these discussions so far, SchedMD is basically saying we support Linux distro X, but not the MySQL/MariaDB version that comes with that distro. Is that a correct reading of this situation?
>
> --
> Prentice
>
> On 4/3/19 8:04 AM, Lech Nieroda wrote:
>> Hi Ole,
>>
>> since we aren’t using RHEL7/CentOS7, we haven’t tested it with mysql 5.5, and it’d probably carry more weight if someone running that OS tested it and added an appropriate comment. You are welcome to try it out.
>> That being said, the release notes explicitly mention that versions 5.1 and 5.5 should be avoided for the conversion process (probably due to this bug, as I haven’t encountered any other issues), and the dev stated that they’d rather keep that warning than fix the issue, so I’m not sure if that’ll be enough to convince them.
>>
>> Kind regards,
>> Lech
>>
>>> Am 03.04.2019 um 13:28 schrieb Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>:
>>>
>>> Hi Lech,
>>>
>>> Maybe you could add your arguments to the bug report https://bugs.schedmd.com/show_bug.cgi?id=6796 hoping that SchedMD may be convinced that this is a useful patch for future versions of Slurm, also for MySQL/MariaDB versions 5.5 and newer.
>>>
>>> Best regards,
>>> Ole
>>>
>>>
>>> On 4/3/19 1:17 PM, Lech Nieroda wrote:
>>>> Hi Ole,
>>>>> Am 03.04.2019 um 12:53 schrieb Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>:
>>>>> SchedMD already decided that they won't fix the problem:
>>>> Yes, I guess it’s a bit late in the release lifecycles. Nevertheless it’s a pity, as there are certainly a lot of users around who’d rather not upgrade their distribution default mysql-servers just for the sake of a conversion.
>>>>> Can you confirm that your patch is only relevant for an old MySQL 5.1?
>>>>>
>>>>> On our CentOS 7 systems we run the OS's MariaDB server 5.5. Would MySQL/MariaDB version 5.5 be affected by your patch or not?
>>>> The patch will work with any mysql version >= 5.1, since all it does is simplify the query by changing an implicit derived table to an explicit temporary table.
>>>> This way the query complexity is reduced and its execution order doesn’t depend on the „intelligence“ of the mysql optimizer while presenting exactly the same end results.
>>>> We haven’t tested whether the mysql 5.5 optimizer chooses the right execution plan for this query.
>>>> As I’ve said, it took roughly 17 minutes with 11 million jobs, 18 million steps and an InnoDB buffer pool size of 8 GB.
>>>> If the table conversion takes more than half an hour and you don’t have tens of millions of jobs, then the optimizer has a problem and the patch would help you.
>>>> Kind regards,
>>>> Lech
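The change described above, turning an implicit derived table into an explicit temporary table while returning exactly the same rows, can be illustrated with a small, self-contained sketch. It is not the patch submitted to SchedMD: the tables `jobs` and `steps` and the step-count query are hypothetical, and Python's built-in sqlite3 stands in for MySQL/MariaDB so the example runs anywhere. In MySQL the corresponding statement would be `CREATE TEMPORARY TABLE ... SELECT ...`.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical jobs/steps schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE jobs (job_db_inx INTEGER PRIMARY KEY, id_job INTEGER);
        CREATE TABLE steps (job_db_inx INTEGER, step_id INTEGER);
        INSERT INTO jobs VALUES (1, 100), (2, 101), (3, 102);
        INSERT INTO steps VALUES (1, 0), (1, 1), (2, 0), (3, 0), (3, 1), (3, 2);
        """
    )
    return conn


def steps_per_job_derived(conn):
    # Implicit derived table: the optimizer decides how and when the
    # subquery in the FROM clause is evaluated.
    return conn.execute(
        """
        SELECT j.id_job, s.n_steps
        FROM jobs AS j
        JOIN (SELECT job_db_inx, COUNT(*) AS n_steps
              FROM steps GROUP BY job_db_inx) AS s
          ON s.job_db_inx = j.job_db_inx
        ORDER BY j.id_job
        """
    ).fetchall()


def steps_per_job_temp_table(conn):
    # Explicit temporary table: the intermediate result is materialized
    # (and indexed) first, so the join no longer relies on the optimizer
    # picking a good plan for the subquery.
    conn.execute("DROP TABLE IF EXISTS temp.step_counts")
    conn.execute(
        "CREATE TEMPORARY TABLE step_counts AS "
        "SELECT job_db_inx, COUNT(*) AS n_steps FROM steps GROUP BY job_db_inx"
    )
    conn.execute("CREATE INDEX step_counts_inx ON step_counts (job_db_inx)")
    rows = conn.execute(
        """
        SELECT j.id_job, s.n_steps
        FROM jobs AS j
        JOIN step_counts AS s ON s.job_db_inx = j.job_db_inx
        ORDER BY j.id_job
        """
    ).fetchall()
    conn.execute("DROP TABLE step_counts")  # also drops its index
    return rows


if __name__ == "__main__":
    conn = build_demo_db()
    print(steps_per_job_derived(conn))     # [(100, 2), (101, 1), (102, 3)]
    print(steps_per_job_temp_table(conn))  # same rows
```

Both functions return identical results; the only difference is that the second one fixes the execution order explicitly. Whether this matters on a given MySQL/MariaDB version still has to be measured, as noted above for 5.5.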
>>>>> Best regards,
>>>>> Ole
>>>>>
>>>>> On 4/3/19 12:30 PM, Lech Nieroda wrote:
>>>>>> Hello Chris,
>>>>>> I’ve submitted the bug report together with a patch.
>>>>>> We don’t have a support contract but I suppose they’ll at least read it ;)
>>>>>> The code is identical for 18.08.x and 19.05.x; only the offset differs.
>>>>>> Kind regards,
>>>>>> Lech
>>>>>>> Am 02.04.2019 um 15:18 schrieb Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>:
>>>>>>>
>>>>>>> Hi Lech,
>>>>>>>
>>>>>>> IMHO, the Slurm user community would benefit the most from your interesting work on MySQL/MariaDB performance if your patch could be made against the current 18.08 and the coming 19.05 releases. This would ensure that your work is carried forward.
>>>>>>>
>>>>>>> Would you be able to make patches against 18.08 and 19.05? If you submit the patches to SchedMD, my guess is that they'd be very interested. A site with a SchedMD support contract (such as our site) could also submit a bug report including your patch.
>>>>>>>
>>>>>>> /Ole
>>>>>>>
>>>>>>> On 4/2/19 2:56 PM, Lech Nieroda wrote:
>>>>>>>> That’s probably it.
>>>>>>>> Sub-queries are known for potential performance issues, so one wonders why the devs didn’t extract it accordingly and make the code more robust, or at least compatible with RHEL/CentOS 6, rather than including that remark in the release notes.
>>>>>>>>> Am 02.04.2019 um 07:20 schrieb Chris Samuel <chris at csamuel.org>:
>>>>>>>>>
>>>>>>>>> On Monday, 1 April 2019 7:55:09 AM PDT Lech Nieroda wrote:
>>>>>>>>>
>>>>>>>>>> Further analysis of the query has shown that the mysql optimizer has chosen
>>>>>>>>>> the wrong execution plan. This may depend on the mysql version, ours was
>>>>>>>>>> 5.1.69.
>>>>>>>>> I suspect this is the issue documented in the release notes for 17.11:
>>>>>>>>>
>>>>>>>>> https://github.com/SchedMD/slurm/blob/slurm-17.11/RELEASE_NOTES
>>>>>>>>>
>>>>>>>>> NOTE FOR THOSE UPGRADING SLURMDBD: The database conversion process from
>>>>>>>>> SlurmDBD 16.05 or 17.02 may not work properly with MySQL 5.1 (as was the
>>>>>>>>> default version for RHEL 6). Upgrading to a newer version of MariaDB or
>>>>>>>>> MySQL is strongly encouraged to prevent this problem.
>>>
>>
>