[slurm-users] enabling job script archival
Paul Edmon
pedmon at cfa.harvard.edu
Thu Sep 28 17:59:49 UTC 2023
No, all the archiving does is remove the pointer. What Slurm does right
now is create a hash of the job_script/job_env and check whether that
hash matches one already on record. If it does not, Slurm adds the
script/env to the record; if it does match, it adds a pointer to the
appropriate existing record. So you can think of the job_script/job_env
as an internal database of all the various scripts and envs that Slurm
has ever seen, and what ends up in the job record is a pointer into that
database. This way Slurm can deduplicate scripts/envs that are the same.
This works great for job_scripts, since they are functionally the same
from run to run and so many jobs point to the same script, but less so
for job_envs.
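
To make the above concrete, here is a toy sketch of the idea in Python.
It is not Slurm's actual schema or code: the DedupStore class, the use
of SHA-256, the example script, and the SESSION_ID variable are all
assumptions made up for illustration.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one copy per distinct text, keyed by hash."""

    def __init__(self):
        self.blobs = {}  # hash -> stored text

    def add(self, text: str) -> str:
        # Hash the text; store it only if this hash has not been seen before.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.blobs.setdefault(digest, text)
        return digest  # the "pointer" kept in the job record


scripts = DedupStore()
envs = DedupStore()
job_records = []

# Hypothetical workload: the same script every time, but an env that
# carries one per-session value (SESSION_ID is an invented variable).
script = "#!/bin/bash\n#SBATCH -n 1\nsrun ./my_app\n"
for job_id in range(1000):
    env = f"HOME=/home/alice\nSESSION_ID={job_id}\n"
    job_records.append({
        "job_id": job_id,
        "script_hash": scripts.add(script),
        "env_hash": envs.add(env),
    })

distinct_script_hashes = {r["script_hash"] for r in job_records}
print(len(scripts.blobs), len(envs.blobs))  # 1 1000

# Dropping the job records removes only the pointers; the stored
# scripts and envs are still there afterwards.
job_records.clear()
print(len(scripts.blobs), len(envs.blobs))  # 1 1000
```

One changing variable is enough to give every env its own hash, which
is why the env side grows with every job while the script side stays
small.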
-Paul Edmon-
On 9/28/2023 1:55 PM, Ryan Novosielski wrote:
> Thank you; we’ll put in a feature request for improvements in that
> area, and also thanks for the warning. I thought of that in passing,
> but the real world experience is really useful. I could easily see
> wanting that stuff to be retained less often than the main records,
> which is what I’d ask for.
>
> I assume that archiving, in general, would also remove this stuff,
> since old jobs themselves will be removed?
>
> --
> #BlackLivesMatter
> ____
> || \\UTGERS, |---------------------------*O*---------------------------
> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> || \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark
> `'
>
>> On Sep 28, 2023, at 13:48, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>>
>> Slurm should take care of it when you add it.
>>
>> As far as horror stories go, under previous versions our database size
>> ballooned to be so massive that it actually prevented us from
>> upgrading and we had to drop the columns containing the job_script
>> and job_env. This was back before slurm started hashing the scripts
>> so that it would only store one copy of duplicate scripts. After
>> this point we found that the job_script database stayed at a fairly
>> reasonable size as most users use functionally the same script each
>> time. However the job_env continued to grow like crazy as there are
>> variables in our environment that change fairly consistently
>> depending on where the user is. Thus job_envs ended up being too
>> massive to keep around and so we had to drop them. Frankly we never
>> really used them for debugging. The job_scripts though are super
>> useful and not that much overhead.
>>
>> In summary my recommendation is to only store job_scripts. job_envs
>> add too much storage for little gain, unless your job_envs are
>> basically the same for each user in each location.
>>
>> Also it should be noted that there is no way to prune out job_scripts
>> or job_envs right now. So the only way to get rid of them if they get
>> large is to zero out the column in the table. You can ask SchedMD for
>> the MySQL command to do this, as we had to do it here for our job_envs.
>>
>> -Paul Edmon-
>>
>> On 9/28/2023 1:40 PM, Davide DelVento wrote:
>>> In my current slurm installation (recently upgraded to slurm
>>> v23.02.3), I only have
>>>
>>> AccountingStoreFlags=job_comment
>>>
>>> I now intend to add both
>>>
>>> AccountingStoreFlags=job_script
>>> AccountingStoreFlags=job_env
>>>
>>> leaving the default 4MB value for max_script_size
>>>
>>> Do I need to do anything on the DB myself, or will slurm take care
>>> of the additional tables if needed?
>>>
>>> Any comments/suggestions/gotchas/pitfalls/horror_stories to share? I
>>> know about the additional disk space and potential load needed, and
>>> with our resources and typical workload I should be okay with that.
>>>
>>> Thanks!
>>
>
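
On the configuration side of the original question: the slurm.conf man
page describes AccountingStoreFlags as a comma-separated list, so the
existing and new flags would normally be combined on one line rather
than repeated on separate lines. A trivial sketch (the helper name is
hypothetical and only there to show the resulting line):

```python
def accounting_store_flags(*flags: str) -> str:
    """Build a single slurm.conf line from the given flag names."""
    return "AccountingStoreFlags=" + ",".join(flags)


# Flags taken from the thread; drop job_env to follow the
# "only store job_scripts" recommendation.
line = accounting_store_flags("job_comment", "job_script", "job_env")
print(line)  # AccountingStoreFlags=job_comment,job_script,job_env
```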