[slurm-users] Slurmdbd purge settings

Luke Sudbery L.R.Sudbery at bham.ac.uk
Tue Feb 23 14:03:20 UTC 2021


That's great, thanks. We were thinking about staging it like that, and using days is simpler to trigger than waiting for the month.
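
For illustration only, the staged approach could be sketched as below. This is a rough sketch, not something we have run in production: the helper names (purge_stages, render_stanza), the 300-day step, and the 365-day final retention are all assumptions. Only the five Purge*After keys and the 2000days starting point come from Ole's recommendation quoted below.

```python
# Sketch: generate a decreasing series of slurmdbd.conf purge stanzas.
# purge_stages/render_stanza are hypothetical helpers; the step size and
# the final retention are assumptions, not values recommended by anyone.

PURGE_KEYS = [
    "PurgeEventAfter",
    "PurgeJobAfter",
    "PurgeResvAfter",
    "PurgeStepAfter",
    "PurgeSuspendAfter",
]


def purge_stages(start_days, final_days, step_days):
    """Return retention values in days, largest first, ending at final_days."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    if final_days > start_days:
        raise ValueError("final_days must not exceed start_days")
    # range() stops before final_days, so append the target explicitly.
    stages = list(range(start_days, final_days, -step_days))
    stages.append(final_days)
    return stages


def render_stanza(days):
    """Render the five purge settings for one stage, one setting per line."""
    return "\n".join(f"{key}={days}days" for key in PURGE_KEYS)


if __name__ == "__main__":
    # Apply one stanza at a time, checking that each purge completes
    # cleanly before moving on to the next, smaller value.
    for days in purge_stages(2000, 365, 300):
        print(render_stanza(days))
        print()
```

Each printed stanza would be put into slurmdbd.conf in turn, moving to the next only once the previous purge has finished without problems.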

We will also need to increase innodb_lock_wait_timeout first so we don't hit the problems described in https://bugs.schedmd.com/show_bug.cgi?id=4295.

Anyone know why sreport would suddenly take so much longer in the first place, though?

Many thanks,

Luke

-- 
Luke Sudbery
Architecture, Infrastructure and Systems
Advanced Research Computing, IT Services
Room 132, Computer Centre G5, Elms Road

Please note I don’t work on Monday.

> -----Original Message-----
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of
> Ole.H.Nielsen at fysik.dtu.dk
> Sent: 23 February 2021 13:24
> To: slurm-users at lists.schedmd.com
> Subject: Re: [slurm-users] Slurmdbd purge settings
> 
> On 2/23/21 1:25 PM, Luke Sudbery wrote:
> > We have suddenly got bad performance from sreport, querying a 1 hour
> > period (in the last 24 hours) for TopUsage went from taking under a
> > minute to timing out after the 15 minutes max slurmdbd query time –
> > although the SQL query on the DB server continued long after that.
> >
> > So firstly we were wondering what might have caused that.
> >
> > But while investigating we decided we should turn on purging records
> > in slurmdbd.conf, and wanted more detail about when the purge would
> > occur and would it lock the database for other Slurm processes. Docs
> > say “The purge takes place at the start of the each purge interval.”
> > But we assume it will also do so on a restart of slurmdbd so we can
> > manage exactly when that happens – is that true? And as we have many
> > years and millions of records to purge we need to know if this will
> > hang all database access, and what kind of outage that is likely to
> > cause.
> >
> > Anyone have experience of enabling purging after the fact?
> 
> I worked on progressive database purging a while back and documented it
> in my Slurm Wiki page:
> 
> https://wiki.fysik.dtu.dk/niflheim/Slurm_database#setting-database-purge-parameters
> 
> Note in particular these recommendations:
> 
> A monthly purge operation can be a huge amount of work for a database
> depending on its size, and you certainly want to cut down the amount of
> work required during the purges. If you did not use purges before, it
> is probably a good idea to try out a series of daily purges starting
> with:
> 
> PurgeEventAfter=2000days
> PurgeJobAfter=2000days
> PurgeResvAfter=2000days
> PurgeStepAfter=2000days
> PurgeSuspendAfter=2000days
> 
> If this works well over a few days, decrease the purge interval from
> 2000days little by little and try again (1800, 1500, etc.) until,
> after many iterations, you come down to the desired final purge
> intervals.
> 
> I hope this helps.
> 
> Best regards,
> Ole


