[slurm-users] enabling job script archival

Thu Sep 28 17:59:49 UTC 2023

No, all the archiving does is remove the pointer.  What Slurm does right 
now is hash each job_script/job_env and check whether that hash matches 
one already on record. If it does not, the script or env is added to 
the record; if it does, only a pointer to the existing record is added.  
So you can think of job_script/job_env as an internal database of every 
distinct script and env that Slurm has ever seen, and what ends up in 
the job record is a pointer into that database.  This way Slurm can 
deduplicate scripts/envs that are the same. This works great for 
job_scripts, since they are functionally the same from run to run and 
many jobs end up pointing to the same script, but less so for job_envs.
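
The scheme described above can be illustrated with a small Python 
sketch. This is a toy model, not Slurm's actual code or schema: the 
DedupStore class, the choice of SHA-256, and the sample script and 
environment strings are all assumptions made up for illustration.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical, not Slurm's schema)."""

    def __init__(self):
        self.blobs = {}  # hash -> stored content, one copy per distinct blob
        self.jobs = {}   # job_id -> hash, i.e. the "pointer" in the job record

    def add(self, job_id, content):
        # SHA-256 is an assumption; the thread does not say which hash is used.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        # Store the content only if this hash has not been seen before.
        self.blobs.setdefault(digest, content)
        self.jobs[job_id] = digest
        return digest

    def purge_job(self, job_id):
        # Removing a job drops only its pointer; the stored blob remains.
        self.jobs.pop(job_id, None)


scripts = DedupStore()
envs = DedupStore()

script = "#!/bin/bash\n#SBATCH -n 1\nsrun ./a.out\n"
for job_id in range(1, 4):
    # Same script every time, but one env variable differs per submission.
    scripts.add(job_id, script)
    envs.add(job_id, f"HOME=/home/u\nSSH_CLIENT=10.0.0.{job_id} 22\n")

scripts.purge_job(1)
print(len(scripts.blobs), len(envs.blobs))  # prints: 1 3
```

Three jobs with the same script leave one stored script, while a single 
changing variable gives every job its own stored env. Purging job 1 
removes its pointer but leaves the stored script in place.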

-Paul Edmon-

On 9/28/2023 1:55 PM, Ryan Novosielski wrote:
> Thank you; we’ll put in a feature request for improvements in that 
> area, and also thanks for the warning. I thought of that in passing, 
> but the real world experience is really useful. I could easily see 
> wanting that stuff to be retained for a shorter time than the main 
> records, which is what I’d ask for.
>
> I assume that archiving, in general, would also remove this stuff, 
> since old jobs themselves will be removed?
>
> --
> #BlackLivesMatter
> ____
> || \\UTGERS, |---------------------------*O*---------------------------
> ||_// the State |         Ryan Novosielski - novosirj at rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\    of NJ | Office of Advanced Research Computing - MSB A555B, Newark
>      `'
>
>> On Sep 28, 2023, at 13:48, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>>
>> Slurm should take care of it when you add it.
>>
>> As far as horror stories go, under previous versions our database 
>> size ballooned so much that it actually prevented us from upgrading, 
>> and we had to drop the columns containing the job_script and 
>> job_env.  This was back before slurm started hashing the scripts so 
>> that it would only store one copy of duplicate scripts.  After that 
>> change we found that the job_script data stayed at a fairly 
>> reasonable size, as most users use functionally the same script each 
>> time. However, the job_env data continued to grow like crazy, as 
>> there are variables in our environment that change fairly regularly 
>> depending on where the user is. Thus the job_envs ended up being too 
>> massive to keep around and so we had to drop them. Frankly, we never 
>> really used them for debugging. The job_scripts, though, are super 
>> useful and not that much overhead.
>>
>> In summary my recommendation is to only store job_scripts. job_envs 
>> add too much storage for little gain, unless your job_envs are 
>> basically the same for each user in each location.
>>
>> Also, it should be noted that there is no way to prune job_scripts 
>> or job_envs right now. So the only way to get rid of them if they 
>> get large is to zero out the column in the table. You can ask SchedMD 
>> for the MySQL command to do this, as we had to do it here for our 
>> job_envs.
>>
>> -Paul Edmon-
>>
>> On 9/28/2023 1:40 PM, Davide DelVento wrote:
>>> In my current slurm installation (recently upgraded to slurm 
>>> v23.02.3), I only have
>>>
>>> AccountingStoreFlags=job_comment
>>>
>>> I now intend to add both job_script and job_env. Since 
>>> AccountingStoreFlags takes a comma-separated list, that would be
>>>
>>> AccountingStoreFlags=job_comment,job_script,job_env
>>>
>>> leaving the default 4MB value for max_script_size
>>>
>>> Do I need to do anything on the DB myself, or will slurm take care 
>>> of the additional tables if needed?
>>>
>>> Any comments/suggestions/gotchas/pitfalls/horror_stories to share? I 
>>> know about the additional disk space and potential load needed, and 
>>> with our resources and typical workload I should be okay with that.
>>>
>>> Thanks!
>>
>