[slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3
Miguel Gila
miguel.gila at cscs.ch
Tue Feb 27 08:13:41 MST 2018
Microcode patches were not applied to the physical system; only the kernel was upgraded, so I'm not sure whether the performance hit could come from that or not.
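(On kernels new enough to expose it -- 4.15+ or a distro backport of the reporting -- one quick way to sanity-check which Meltdown/Spectre mitigations are actually active is the sysfs vulnerabilities interface:

    # prints one line per vulnerability, e.g. "Mitigation: PTI" or "Vulnerable"
    grep . /sys/devices/system/cpu/vulnerabilities/*

Just a rough check, but it tells you whether the kernel thinks a mitigation is in effect at all.)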
<rant>Reducing the size of the DB to make the upgrade process complete in a reasonable time is like shooting a mosquito with a shotgun. Yeah, it works, but there must be better/easier/smarter ways of doing it. IMHO the DB upgrade process should be cleaner/safer.</rant>
For now we're staying with our VM and will see how it behaves over time.
M.
On 26.02.18, 19:30, "slurm-users on behalf of Christopher Benjamin Coffey" <slurm-users-bounces at lists.schedmd.com on behalf of Chris.Coffey at nau.edu> wrote:
Good thought, Chris. But in our case our system does not have the Spectre/Meltdown kernel fix.
Just to update everyone, we performed the upgrade successfully after first purging more job/step data. We did the following to ensure the purge happened right away, per Hendryk's recommendation:
ArchiveDir=/common/adm/slurmdb_archive
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveSteps=no
ArchiveResvs=no
ArchiveSuspend=no
PurgeEventAfter=1month
PurgeJobAfter=2880hours # <- changed from 18 months
PurgeResvAfter=2month
PurgeStepAfter=2880hours # <- changed from 18 months
PurgeSuspendAfter=2month
We specified hours so that the purge kicked in quickly (the purge runs at the start of each hour when the retention is given in hours, rather than once a month).
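(If you don't want to wait for the next purge cycle at all, my understanding is you can also trigger the archive/purge by hand with:

    # archives and purges according to the settings in slurmdbd.conf
    sacctmgr archive dump

rather than waiting for slurmdbd's next rollup.)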
With only 4 months of jobs/steps retained, the upgrade took ~1.5 hrs. This was for a 334MB db with 1.1M jobs.
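For anyone wanting to size up their own database before attempting this, something like the following should give a rough picture (assuming the default database name slurm_acct_db; InnoDB row counts here are estimates, not exact):

    mysql -e "
      SELECT table_name, table_rows, ROUND(data_length/1024/1024) AS size_mb
      FROM information_schema.tables
      WHERE table_schema = 'slurm_acct_db'
      ORDER BY data_length DESC;"

The per-cluster *_job_table and *_step_table entries are usually the ones that dominate the conversion time.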
We've also made this change to stop tracking energy and other things we don't care about right now:
AccountingStorageTRES=cpu,mem
Hope that will help for the future.
Thanks Hendryk, and Paul :)
Best,
Chris
—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
On 2/23/18, 3:13 PM, "slurm-users on behalf of Chris Samuel" <slurm-users-bounces at lists.schedmd.com on behalf of chris at csamuel.org> wrote:
On Friday, 23 February 2018 8:04:50 PM AEDT Miguel Gila wrote:
> Interestingly enough, a poor VMware VM (2 CPUs, 3GB RAM) with MariaDB 5.5.56
> outperformed our central MySQL 5.5.59 (128GB, 14core, SAN) by a factor of
> at least 3 on every table conversion.
Wild idea, completely out of left field...
Does the production system have the updates for Meltdown and Spectre applied,
whereas the VM setup does not?
There are meant to be large impacts from those fixes for syscall-heavy applications, and databases are one of those nightmare cases...
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC