[slurm-users] enabling job script archival
Paul Edmon
pedmon at cfa.harvard.edu
Mon Oct 2 15:07:13 UTC 2023
At least in our setup, users can see their own scripts by doing sacct -B
-j JOBID.
I would make sure that the scripts are actually being stored, and check
how you have PrivateData set.
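For reference, those checks might look like this (a sketch only; the job
ID and config paths are placeholders for your site's values):

```shell
# Show the stored batch script for one of your own jobs
# (-B is the short form of --batch-script)
sacct -B -j 12345

# Check whether PrivateData restricts what users can read from the
# accounting database (set in slurmdbd.conf, possibly also slurm.conf)
grep -i PrivateData /etc/slurm/slurmdbd.conf /etc/slurm/slurm.conf
```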
-Paul Edmon-
On 10/2/2023 10:57 AM, Davide DelVento wrote:
> I deployed the job_script archival and it is working, however it can
> be queried only by root.
>
> A regular user can run sacct -lj against any job (even jobs owned by
> other users, and that's okay in our setup) with no problem. However, if
> they run sacct -j job_id --batch-script, even against a job they own
> themselves, nothing is returned and I get a
>
> slurmdbd: error: couldn't get information for this user (null)(xxxxxx)
>
> where xxxxxx is the POSIX ID of the user who's running the query, in
> the slurmdbd logs.
>
> Neither of the config files slurmdbd.conf and slurm.conf has any
> "permission" setting. FWIW, we use LDAP.
>
> Is this the expected behavior, i.e. that by default only root can see
> the job scripts? I was assuming the users themselves should be able to
> debug their own jobs... Any hint on what could be changed to achieve this?
>
> Thanks!
>
>
>
> On Fri, Sep 29, 2023 at 5:48 AM Davide DelVento
> <davide.quantum at gmail.com> wrote:
>
> Fantastic, this is really helpful, thanks!
>
> On Thu, Sep 28, 2023 at 12:05 PM Paul Edmon
> <pedmon at cfa.harvard.edu> wrote:
>
> Yes, it was later than that; if you are on 23.02 you are good.
> We've been running with job_script storage on for years at
> this point, and that part of the database only uses up 8.4G.
> Our entire database takes up 29G on disk, so it's about 1/3 of
> the database. We also use database compression, which helps
> with the on-disk size; raw and uncompressed, our database is
> about 90G. We keep 6 months of data in our active database.
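A retention window like that is normally set via the purge options in
slurmdbd.conf; a minimal sketch, assuming the standard option names (the
values themselves are site policy):

```shell
# slurmdbd.conf -- keep roughly six months of job/step records in the
# active database; older records are purged (or archived first, if the
# corresponding Archive* options are enabled)
PurgeJobAfter=6months
PurgeStepAfter=6months
```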
>
> -Paul Edmon-
>
> On 9/28/2023 1:57 PM, Ryan Novosielski wrote:
>> Sorry for the duplicate e-mail in a short time: do you (or anyone
>> else) know when the hashing was added? I was planning to
>> enable this on 21.08, but we then had to delay our upgrade to
>> it. I'm assuming it was later than that, as I believe that's when
>> the feature was added.
>>
>>> On Sep 28, 2023, at 13:55, Ryan Novosielski
>>> <novosirj at rutgers.edu> <mailto:novosirj at rutgers.edu> wrote:
>>>
>>> Thank you; we'll put in a feature request for improvements
>>> in that area, and also thanks for the warning! I thought of
>>> that in passing, but the real-world experience is really
>>> useful. I could easily see wanting that stuff to be retained
>>> for a shorter time than the main records, which is what I'd ask for.
>>>
>>> I assume that archiving, in general, would also remove this
>>> stuff, since old jobs themselves will be removed?
>>>
>>> --
>>> #BlackLivesMatter
>>> ____
>>> || \\UTGERS,
>>> |---------------------------*O*---------------------------
>>> ||_// the State | Ryan Novosielski -
>>> novosirj at rutgers.edu
>>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922)
>>> ~*~ RBHS Campus
>>> || \\ of NJ | Office of Advanced Research Computing -
>>> MSB A555B, Newark
>>> `'
>>>
>>>> On Sep 28, 2023, at 13:48, Paul Edmon
>>>> <pedmon at cfa.harvard.edu> <mailto:pedmon at cfa.harvard.edu> wrote:
>>>>
>>>> Slurm should take care of it when you add it.
>>>>
>>>> So far as horror stories, under previous versions our
>>>> database size ballooned to be so massive that it actually
>>>> prevented us from upgrading and we had to drop the columns
>>>> containing the job_script and job_env. This was back
>>>> before slurm started hashing the scripts so that it would
>>>> only store one copy of duplicate scripts. After this point
>>>> we found that the job_script database stayed at a fairly
>>>> reasonable size as most users use functionally the same
>>>> script each time. However, the job_envs continued to grow
>>>> rapidly, as there are variables in our environment that
>>>> change fairly often depending on where the user is.
>>>> The job_envs thus ended up being too massive to keep around,
>>>> so we had to drop them. Frankly, we never really used them
>>>> for debugging. The job_scripts though are super useful and
>>>> not that much overhead.
>>>>
>>>> In summary my recommendation is to only store job_scripts.
>>>> job_envs add too much storage for little gain, unless your
>>>> job_envs are basically the same for each user in each location.
>>>>
>>>> Also, it should be noted that there is currently no way to
>>>> prune job_scripts or job_envs. So the only way to get rid of
>>>> them if they get large is to zero out the column in the
>>>> table. You can ask SchedMD for the mysql command to do this,
>>>> as we had to do that here for our job_envs.
>>>>
>>>> -Paul Edmon-
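The zeroing-out described above would look roughly like the following.
This is only a sketch: the database name, the per-cluster table name,
and the column name here are assumptions about a typical slurmdbd
schema, so verify them (and take a backup) first; the authoritative
command should come from SchedMD.

```shell
# ILLUSTRATIVE ONLY -- back up the accounting database before running.
# 'slurm_acct_db' and 'mycluster_job_env_table' are assumed names;
# confirm the real ones before doing anything.
mysql slurm_acct_db -e "SHOW TABLES LIKE '%job_env%';"
# Zero out the stored environments (column name 'env_vars' is assumed)
mysql slurm_acct_db -e "UPDATE mycluster_job_env_table SET env_vars = '';"
```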
>>>>
>>>> On 9/28/2023 1:40 PM, Davide DelVento wrote:
>>>>> In my current Slurm installation (recently upgraded to
>>>>> Slurm v23.02.3), I only have
>>>>>
>>>>> AccountingStoreFlags=job_comment
>>>>>
>>>>> I now intend to add both
>>>>>
>>>>> AccountingStoreFlags=job_script
>>>>> AccountingStoreFlags=job_env
>>>>>
>>>>> leaving the default 4MB value for max_script_size
>>>>>
>>>>> Do I need to do anything on the DB myself, or will slurm
>>>>> take care of the additional tables if needed?
>>>>>
>>>>> Any comments/suggestions/gotcha/pitfalls/horror_stories to
>>>>> share? I know about the additional diskspace and
>>>>> potentially load needed, and with our resources and
>>>>> typical workload I should be okay with that.
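For what it's worth, AccountingStoreFlags takes a single comma-separated
list, so the combined setting would be one line in slurm.conf rather
than repeated entries (a sketch based on the documented flag names):

```shell
# slurm.conf -- store job comment, batch script, and environment
# in the accounting database (one comma-separated list)
AccountingStoreFlags=job_comment,job_script,job_env
```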
>>>>>
>>>>> Thanks!
>>>>
>>>
>>