[slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

Miguel Gila miguel.gila at cscs.ch
Tue Feb 27 08:13:41 MST 2018


Microcode patches were not applied to the physical system; only the kernel was upgraded. So I'm not sure whether the performance hit comes from that or not.

<rant>Reducing the size of the DB to make the upgrade process complete in a reasonable time is like shooting a mosquito with a shotgun. Yeah, it works, but there must be better/easier/smarter ways of doing it. IMHO the DB upgrade process should be cleaner/safer.</rant>

For now we're staying with our VM and will see how it behaves over time.

M.


On 26.02.18, 19:30, "slurm-users on behalf of Christopher Benjamin Coffey" <slurm-users-bounces at lists.schedmd.com on behalf of Chris.Coffey at nau.edu> wrote:

    Good thought Chris. Yet in our case our system does not have the spectre/meltdown kernel fix.
    
    Just to update everyone, we performed the upgrade successfully after first purging more job/step data. We did the following to ensure the purge happened right away, per Hendryk's recommendation:
    
    ArchiveDir=/common/adm/slurmdb_archive
    ArchiveEvents=yes
    ArchiveJobs=yes
    ArchiveSteps=no
    ArchiveResvs=no
    ArchiveSuspend=no
    PurgeEventAfter=1month
    PurgeJobAfter=2880hours	# <- changing from 18months
    PurgeResvAfter=2month
    PurgeStepAfter=2880hours	# <- changing from 18months
    PurgeSuspendAfter=2month
    
    Specifying hours so that the purge kicked in quickly.
    
    With only 4 months of jobs/steps left in the database, the upgrade took ~1.5 hrs. This was for a 334MB db with 1.1M jobs.
    
    We've also made this change to stop tracking energy and other things we don't care about right now:
    
    AccountingStorageTRES=cpu,mem
    
    Hope that will help for the future.
    
    Thanks Hendryk, and Paul :)
    
    Best,
    Chris
    —
    Christopher Coffey
    High-Performance Computing
    Northern Arizona University
    928-523-1167
     
    On 2/23/18, 3:13 PM, "slurm-users on behalf of Chris Samuel" <slurm-users-bounces at lists.schedmd.com on behalf of chris at csamuel.org> wrote:
    
        On Friday, 23 February 2018 8:04:50 PM AEDT Miguel Gila wrote:
        
        > Interestingly enough, a poor VMware VM (2CPUs, 3GB/RAM) with MariaDB 5.5.56
        > outperformed our central MySQL 5.5.59 (128GB, 14core, SAN) by a factor of
        > at least 3 on every table conversion.
        
        Wild idea completely out of left field..
        
        Does the production system have the updates for Meltdown and Spectre applied, 
        whereas the VM setup does not?
        
        There are meant to be large impacts from those fixes for syscall heavy 
        applications and databases are one of those nightmare cases...
        
        cheers,
        Chris
        -- 
         Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
        
        
        
    
    



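For anyone wanting to sanity-check the numbers quoted in this thread, here is a rough sketch of the arithmetic. It assumes a nominal 30-day month (my assumption, not something Slurm defines), and the helper names are made up for illustration. As far as I understand the slurmdbd.conf documentation, a purge runs at the start of each purge interval, which is why a value given in hours kicks in within the hour rather than waiting for the start of the next month.

```python
# Rough arithmetic for the purge settings and upgrade time quoted in this thread.
# NOMINAL_DAYS_PER_MONTH is an assumption for illustration only.
HOURS_PER_DAY = 24
NOMINAL_DAYS_PER_MONTH = 30


def purge_window(hours):
    """Return (days, nominal months) covered by a Purge*After value given in hours."""
    days = hours / HOURS_PER_DAY
    months = days / NOMINAL_DAYS_PER_MONTH
    return days, months


def jobs_per_second(jobs, upgrade_hours):
    """Average number of job records converted per second during the upgrade."""
    return jobs / (upgrade_hours * 3600)


if __name__ == "__main__":
    days, months = purge_window(2880)  # PurgeJobAfter=2880hours
    print(f"2880 hours = {days:.0f} days, about {months:.0f} months")
    rate = jobs_per_second(1_100_000, 1.5)  # 1.1M jobs in ~1.5 hrs
    print(f"average conversion rate: about {rate:.0f} jobs/s")
```

So 2880 hours matches the "4 months of jobs/steps" mentioned above, and the ~1.5 hr upgrade works out to very roughly 200 job records per second. That is only an average over the whole run; individual table conversions will differ.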