[slurm-users] SlurmDB Archive settings?

Paul Edmon pedmon at cfa.harvard.edu
Thu Jul 14 19:00:40 UTC 2022


Yeah, a word of warning about going from 21.08 to 22.05: make sure you
have enough storage on the database host you are doing the work on, and
budget enough time for the upgrade.  We just converted our 198 GB
(compressed, 534 GB raw) database this week.  The initial attempt failed
after running for 8 hours because we ran out of disk space (part of the
reason we had to compress is that the server we use for our slurm master
only has 800 GB of SSD on it).  That meant we had to reimport our DB,
which took 8 hours.  We then had to drop the job scripts and job envs,
which took another 5 hours, before attempting the upgrade again, which
took 2 hours.
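
As a rough tally of the figures above, here is a small Python sketch. The variable names are illustrative only, and the free-space number ignores everything else stored on the same disk, so treat it as an upper bound rather than a sizing rule.

```python
# Durations quoted in this message, in hours.
failed_upgrade_h = 8       # first attempt, ran out of disk space
reimport_h = 8             # reimporting the database
drop_scripts_envs_h = 5    # dropping job scripts and job envs
upgrade_h = 2              # second, successful upgrade attempt

total_h = failed_upgrade_h + reimport_h + drop_scripts_envs_h + upgrade_h

# Disk figures quoted in this message, in GB.
raw_db_gb = 534
ssd_gb = 800
free_beside_raw_gb = ssd_gb - raw_db_gb

print(f"total wall time: {total_h} h")
print(f"space left beside a raw copy: {free_beside_raw_gb} GB")
```

That comes to 23 hours end to end, with at most 266 GB of the SSD left over beside an uncompressed copy of the database.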


Moral of the story: make sure you have enough space and budget
sufficient time.  You may want to consider nulling out the job scripts
and envs for the upgrade, as they completely redid the way those are
stored in the database in 22.05 so that it is more efficient, but
getting from here to there is the trick.
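
A minimal sketch of what nulling out the stored scripts and envs could look like, written in Python so the statements can be reviewed before anything touches the database. It assumes the 21.08 schema keeps them in `batch_script` and `env_vars` columns of `<cluster>_job_table`. Those column names and the helper `null_out_statements` are assumptions, not something confirmed in this thread, so check them with `DESCRIBE` and try it on a copy of the database first.

```python
def null_out_statements(cluster: str) -> list:
    """Build SQL that clears stored job scripts and envs for one cluster.

    Assumption: the 21.08 job table is `<cluster>_job_table` and holds
    the data in `batch_script` and `env_vars` columns.
    """
    # Table names cannot be bound as query parameters, so restrict the
    # cluster name to identifier-like characters before formatting it in.
    if not cluster.replace("_", "").isalnum():
        raise ValueError(f"unexpected cluster name: {cluster!r}")
    table = f"{cluster}_job_table"
    return [
        # Inspect the schema first to confirm the assumed columns exist.
        f"DESCRIBE `{table}`;",
        # Destructive: the stored scripts and envs are gone after this.
        f"UPDATE `{table}` SET batch_script = NULL, env_vars = NULL;",
    ]


# "o2" is the cluster prefix seen in the table names quoted below.
for stmt in null_out_statements("o2"):
    print(stmt)
```

On a large job table a statement like this can run for hours; dropping the scripts and envs took 5 hours in the account above.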


For details see the bug report we filed: 
https://bugs.schedmd.com/show_bug.cgi?id=14514


-Paul Edmon-


On 7/14/2022 2:34 PM, Timony, Mick wrote:
>
>
>     What I can tell you is that we have never had a problem
>     reimporting the data back in that was dumped from older versions
>     into a current version database.  So the import using sacctmgr
>     must do the conversion from the older formats to the newer formats
>     and handle the schema changes.
>
> That's the bit of info I was missing, I didn't realise that it 
> outputs the data in a format that sacctmgr can read.
>
>     I will note that if you are storing job_scripts and envs those can
>     eat up a ton of space in 21.08.  It looks like they've solved that
>     problem in 22.05 but the archive steps on 21.08 took forever due
>     to those scripts and envs.
>
> Yes, we are storing job_scripts with:
>
> AccountingStoreFlags=job_script
>
> I think when we made that decision, we decided that also saving 
> the job_env would take up too much room as our DB is pretty big at the 
> moment, at approx. 300GB with the o2_step_table and the o2_job_table 
> taking up the most space for obvious reasons:
>
> +----------------------------+-----------+
> | Table                      | Size (GB) |
> +----------------------------+-----------+
> | o2_step_table              |    183.83 |
> | o2_job_table               |    128.18 |
>
>
> That's good advice Paul, much appreciated.
>
> >took forever and actually caused issues with the archive process
> I think that should be highlighted for other users!
>
> For those interested, to find the table sizes I did this:
>
> SELECT table_name AS "Table", ROUND(((data_length + index_length) / 
> 1024 / 1024 / 1024), 2) AS "Size (GB)" FROM information_schema.TABLES 
> WHERE table_schema = "slurmdbd" ORDER BY (data_length + index_length) 
> DESC;
>
> Replace slurmdbd with the name of your database.
>
> Cheers
> --Mick
>
>
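
For scripting the table size check quoted above, one option is to build the same query in Python with the database name passed as a bound parameter rather than edited into the SQL. This is only a sketch: `table_size_query` is a hypothetical helper, and the `%s` placeholder assumes a driver such as PyMySQL or mysqlclient that uses that parameter style.

```python
def table_size_query(database: str):
    """Return (sql, params) listing tables in `database`, largest first."""
    sql = (
        "SELECT table_name AS `Table`, "
        "ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) "
        "AS `Size (GB)` "
        "FROM information_schema.TABLES "
        "WHERE table_schema = %s "
        "ORDER BY (data_length + index_length) DESC"
    )
    return sql, (database,)


sql, params = table_size_query("slurmdbd")
print(sql)
print(params)
# With an open connection from your driver of choice, this would be run
# as cursor.execute(sql, params) followed by cursor.fetchall().
```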