[slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

Christopher Benjamin Coffey Chris.Coffey at nau.edu
Wed Feb 21 17:30:05 MST 2018


This is great to know, Kurt. We can't be the only folks running into this. I wonder if the MySQL update code gets into a deadlock or something. I'm hoping a Slurm dev will chime in ...

Kurt, out of band if need be, I'd be interested in the details of what you ended up doing.

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 

On 2/21/18, 5:08 PM, "slurm-users on behalf of Kurt H Maier" <slurm-users-bounces at lists.schedmd.com on behalf of khm at sciops.net> wrote:

    On Wed, Feb 21, 2018 at 11:56:38PM +0000, Christopher Benjamin Coffey wrote:
    > Hello,
    > 
    > We have been trying to upgrade slurm on our cluster from 16.05.6 to 17.11.3. I'm thinking this should be doable? Past upgrades have been a breeze, and I believe during the last one, the db upgrade took like 25 minutes. Well now, the db upgrade process is taking far too long. We previously attempted the upgrade during a maintenance window and the upgrade process did not complete after 24 hrs. I gave up on the upgrade and reverted the slurm version back by restoring a backup db.
    
    We hit this on our try as well: upgrading from 17.02.9 to 17.11.3.  We 
    truncated our job history for the upgrade, and then did the rest of the 
    conversion out-of-band and re-imported it after the fact.  It took us 
    almost sixteen hours to convert a 1.5 million-job store.
    
    We got hung up on precisely the same query you did, on a similarly hefty
    machine.  It caused us to roll back an upgrade and try again during our
    subsequent maintenance window with the above approach.
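
As a rough back-of-the-envelope reading of the figures above (almost sixteen hours for a 1.5 million-job store), the conversion rate works out to roughly 26 jobs per second. The sketch below only does that arithmetic. The function names are illustrative, and the linear-scaling estimate is an assumption, not something measured on either site; real conversion time depends on the schema changes, the hardware, and the database tuning.

```python
# Figures taken from the message above: ~16 hours to convert ~1.5 million jobs.
# "Almost sixteen hours" means the true rate was slightly higher than this.
JOBS_CONVERTED = 1_500_000
HOURS_TAKEN = 16


def jobs_per_second(jobs: int, hours: float) -> float:
    """Average conversion rate implied by a job count and a wall-clock time."""
    return jobs / (hours * 3600)


def estimated_hours(job_count: int, rate: float) -> float:
    """Estimated conversion time for job_count jobs at a given rate.

    ASSUMPTION: conversion time scales linearly with the number of job
    records. This is a planning guess, not a measured property of slurmdbd.
    """
    return job_count / rate / 3600


rate = jobs_per_second(JOBS_CONVERTED, HOURS_TAKEN)
print(f"Implied rate: about {rate:.1f} jobs/s")

# Hypothetical job count, chosen only to illustrate the estimate.
example_jobs = 3_000_000
print(f"{example_jobs} jobs: about {estimated_hours(example_jobs, rate):.0f} hours")
```

Under that assumption, a store twice the size would take about twice as long, which suggests why truncating the job history before the upgrade window and converting the rest out-of-band was the practical route here.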
    
    khm
    
    


