[slurm-users] enabling job script archival
Davide DelVento
davide.quantum at gmail.com
Fri Sep 29 11:48:50 UTC 2023
Fantastic, this is really helpful, thanks!
On Thu, Sep 28, 2023 at 12:05 PM Paul Edmon <pedmon at cfa.harvard.edu> wrote:
> Yes, it was later than that. If you are on 23.02 you are good. We've been
> running with storing job_scripts on for years at this point and that part
> of the database only uses up 8.4G. Our entire database takes up 29G on
> disk. So it's about 1/3 of the database. We also have database compression
> which helps with the on disk size. Raw uncompressed our database is about
> 90G. We keep 6 months of data in our active database.
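For a rough sense of those proportions, this sketch only redoes the arithmetic on the figures quoted above. It assumes that the 8.4G and 29G figures are both compressed on-disk sizes and that 90G is the uncompressed total. On that reading the job scripts come to about 29% of the on-disk database (the "about 1/3"), and compression shrinks the data by roughly 3.1x.

```python
# Figures quoted in the message, in gigabytes. Assumption: the first two are
# on-disk (compressed) sizes and the last one is the uncompressed total.
JOB_SCRIPT_GB = 8.4
TOTAL_ON_DISK_GB = 29.0
TOTAL_RAW_GB = 90.0

# Fraction of the on-disk database taken up by stored job scripts.
script_share = JOB_SCRIPT_GB / TOTAL_ON_DISK_GB
# How much smaller the compressed database is than the raw one.
compression_ratio = TOTAL_RAW_GB / TOTAL_ON_DISK_GB

print(f"job_script share of on-disk DB: {script_share:.0%}")
print(f"approximate compression ratio: {compression_ratio:.1f}x")
```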
>
> -Paul Edmon-
> On 9/28/2023 1:57 PM, Ryan Novosielski wrote:
>
> Sorry for the second e-mail in such a short time: do you (or does anyone)
> know when the hashing was added? I was planning to enable this on 21.08, but
> we then had to delay our upgrade to it. I’m assuming it was later than that,
> as I believe that’s when the feature itself was added.
>
> On Sep 28, 2023, at 13:55, Ryan Novosielski <novosirj at rutgers.edu> wrote:
>
> Thank you; we’ll put in a feature request for improvements in that area,
> and also thanks for the warning. I thought of that in passing, but the
> real-world experience is really useful. I could easily see wanting that
> stuff to be retained for a shorter period than the main records, which is
> what I’d ask for.
>
> I assume that archiving, in general, would also remove this stuff, since
> old jobs themselves will be removed?
>
> --
> #BlackLivesMatter
> ____
> || \\UTGERS, |---------------------------*O*---------------------------
> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> || \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark
> `'
>
> On Sep 28, 2023, at 13:48, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>
> Slurm should take care of it when you add it.
>
> As far as horror stories go, under previous versions our database
> ballooned to be so massive that it actually prevented us from upgrading,
> and we had to drop the columns containing the job_script and job_env. This
> was back before Slurm started hashing the scripts so that it would only
> store one copy of duplicate scripts. After that point we found that the
> job_script data stayed at a fairly reasonable size, as most users submit
> functionally the same script each time. However, the job_env data continued
> to grow like crazy, as there are variables in our environment whose values
> routinely change depending on where the user is. Thus the job_envs ended up
> being too massive to keep around, and so we had to drop them. Frankly, we
> never really used them for debugging. The job_scripts, though, are super
> useful and not that much overhead.
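The effect of hashing described here can be illustrated with a toy content-addressed store: each job records only the hash of its text, and the text itself is kept once per distinct hash. This is a hypothetical model written for illustration, not Slurm's actual accounting schema or hash function. The `SESSION_ID` variable is made up and stands in for whatever per-job or per-location values make real environments differ.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical texts are kept only once.

    Hypothetical illustration; not Slurm's real database layout.
    """

    def __init__(self) -> None:
        self.blobs: dict[str, str] = {}     # hash -> text, stored once
        self.job_refs: dict[int, str] = {}  # job id -> hash of its text

    def add(self, job_id: int, text: str) -> str:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # setdefault stores the text only the first time this hash is seen.
        self.blobs.setdefault(digest, text)
        self.job_refs[job_id] = digest
        return digest

    def get(self, job_id: int) -> str:
        return self.blobs[self.job_refs[job_id]]


scripts = DedupStore()
envs = DedupStore()

script = "#!/bin/bash\n#SBATCH -n 4\nsrun ./my_app\n"
for job_id in range(1, 101):
    # The same user submits the same script every time.
    scripts.add(job_id, script)
    # Hypothetical per-job variable: every environment hashes differently.
    envs.add(job_id, f"HOME=/home/alice\nSESSION_ID={job_id}\n")

print(len(scripts.blobs), "stored script(s) for 100 jobs")
print(len(envs.blobs), "stored environment(s) for 100 jobs")
```

With one identical script across 100 jobs only a single copy is kept, while an environment that differs in even one variable per job never deduplicates. That is the same growth pattern reported above for job_env.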
>
> In summary, my recommendation is to only store job_scripts. job_envs add
> too much storage for little gain, unless your job_envs are basically the
> same for each user in each location.
>
> Also, it should be noted that there is currently no way to prune out
> job_scripts or job_envs. So the only way to get rid of them if they get
> large is to zero out the column in the table. You can ask SchedMD for the
> MySQL command to do this, as we had to do here for our job_envs.
>
> -Paul Edmon-
>
> On 9/28/2023 1:40 PM, Davide DelVento wrote:
>
> In my current Slurm installation (recently upgraded to Slurm v23.02.3), I
> only have
>
> AccountingStoreFlags=job_comment
>
> I now intend to add both
>
> AccountingStoreFlags=job_script
> AccountingStoreFlags=job_env
>
> leaving the default 4MB value for max_script_size
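The slurm.conf man page documents AccountingStoreFlags as one comma-separated list, so the flags are normally combined on a single line, e.g. `AccountingStoreFlags=job_comment,job_script,job_env` (or `job_comment,job_script` if skipping job_env as recommended in this thread). This avoids relying on how repeated keys are parsed. The max_script_size limit itself is set through SchedulerParameters. The helper below is hypothetical and only sketches that merge. It does a plain line-based rewrite, matches only the exact spelling `AccountingStoreFlags`, and does not handle inline comments or Include files.

```python
def merge_store_flags(conf_text: str) -> str:
    """Fold every AccountingStoreFlags= line into one comma-separated line.

    Hypothetical helper for illustration only (see limitations above).
    """
    flags: list[str] = []
    other_lines: list[str] = []
    insert_at = None
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "AccountingStoreFlags":
            if insert_at is None:
                # Put the merged line where the first flag line was.
                insert_at = len(other_lines)
            for flag in value.split(","):
                flag = flag.strip()
                # Preserve first-seen order and drop duplicates.
                if flag and flag not in flags:
                    flags.append(flag)
        else:
            other_lines.append(line)
    if insert_at is not None:
        other_lines.insert(insert_at, "AccountingStoreFlags=" + ",".join(flags))
    return "\n".join(other_lines)


conf = """\
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreFlags=job_comment
AccountingStoreFlags=job_script
AccountingStoreFlags=job_env"""

print(merge_store_flags(conf))
```

Keeping the merged line at the position of the first flag line, with flags in their original order, keeps a diff against the original file small.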
>
> Do I need to do anything on the DB myself, or will Slurm take care of the
> additional tables if needed?
>
> Any comments/suggestions/gotchas/pitfalls/horror_stories to share? I know
> about the additional disk space and potential extra load needed, and with
> our resources and typical workload I should be okay with that.
>
> Thanks!