[slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08

Christopher Benjamin Coffey Chris.Coffey at nau.edu
Thu Sep 27 09:12:58 MDT 2018


Hi David,

I'd recommend the following, which I've learned from bad experiences with previous major version upgrades.
	
1. Consider upgrading to mysql-server 5.5 or greater
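
You can check what the server is currently running with something like:

mysql -u root -e "SELECT VERSION();"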

2. Purge/archive unneeded jobs/steps before the upgrade, to make the upgrade as quick as possible:

slurmdbd.conf:

ArchiveDir=/common/adm/slurmdb_archive
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveSteps=no
ArchiveResvs=no
ArchiveSuspend=no
PurgeEventAfter=1month
PurgeJobAfter=6months
PurgeResvAfter=2month
PurgeStepAfter=6months
PurgeSuspendAfter=2month
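
After changing these settings, restart slurmdbd and watch its log to confirm the archive/purge completes before you move on. (The service name may vary with your packaging, and the log location is whatever LogFile points to in your slurmdbd.conf -- the path below is just a common default.)

systemctl restart slurmdbd
tail -f /var/log/slurmdbd.log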


3. Take a fresh mysql dump after the archives occur:

mysqldump --all-databases > slurm_db.sql
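
If your root account requires a password, add -p to be prompted for it. And if your tables are InnoDB, --single-transaction will avoid locking them during the dump (check your table engines first), e.g.:

mysqldump -u root -p --single-transaction --all-databases > slurm_db.sql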


4. Test the upgrade on another machine, or a VM that is representative of your environment (same rpms, configs, etc.). Just take your newly created dump from production and load it into the test system:

mysql -u root < slurm_db.sql


Once you take care of any connection issues in mysql (allowing a different host to connect), you can fire up slurmdbd to perform the upgrade, see how long it takes, and find out what hiccups you'll run into. Then you'll know what to expect and can plan your maintenance window accordingly.
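
For example, from the mysql shell on the test box, something like the following will let a local slurm user in (this assumes the default slurm_acct_db database name; the user and password here are placeholders):

GRANT ALL ON slurm_acct_db.* TO 'slurm'@'localhost' IDENTIFIED BY 'some_pass';
FLUSH PRIVILEGES;

Then run slurmdbd in the foreground with extra verbosity so you can watch the conversion as it happens:

slurmdbd -D -vvv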

Hope that helps! Good luck!

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 

On 9/26/18, 8:57 AM, "slurm-users on behalf of Baker D.J." <slurm-users-bounces at lists.schedmd.com on behalf of D.J.Baker at soton.ac.uk> wrote:

    Thank you for your reply. You're correct, the systemd commands aren't invoked, however upgrading the slurm rpm effectively pulls the rug out from under /usr/sbin/slurmctld. The v17.02 slurm rpm provides /usr/sbin/slurmctld,
     but from v17.11 that executable is provided by the slurm-slurmctld rpm.
    
    
    In other words, doing a minimal install of just the slurm and the slurmdbd rpms deletes the slurmctld executable. I haven't explicitly tested this, however I tested the upgrade on a compute node and experimented with
     the slurmd -- the logic should be the same. 
    
    
    I guess the question that comes to mind is: is it really a big deal if the slurmctld process is down whilst the slurmdbd is being upgraded? Bearing in mind that I will probably opt to suspend all running jobs and stop
     the partitions during the upgrade.
    
    
    Best regards,
    David
    
    ________________________________________
    From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Chris Samuel <chris at csamuel.org>
    Sent: 26 September 2018 11:26
    To: slurm-users at lists.schedmd.com
    Subject: Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08 
    
    On Tuesday, 25 September 2018 11:54:31 PM AEST Baker D. J.  wrote:
    
    > That will certainly work, however the slurmctld (or in the case of my test
    > node, the slurmd) will be killed. The logic is that at v17.02 the slurm rpm
    > provides slurmctld and slurmd. So upgrading that rpm will destroy/kill the
    > existing slurmctld or slurmd processes.
    
    If you do that with --noscripts then will it really kill the process?
    Nothing should invoke the systemd commands with that, should it?  Or do you
    mean that taking the libraries, etc., away out from underneath the running
    process will cause it to crash?
    
    Might be worth testing that on a VM to see if it will happen.
    
    Best of luck!
    Chris
    -- 
     Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC