[slurm-users] enabling job script archival
Paul Edmon
pedmon at cfa.harvard.edu
Mon Oct 2 15:07:13 UTC 2023
At least in our setup, users can see their own scripts by doing sacct -B
-j JOBID.
I would make sure that the scripts are actually being stored, and check
how you have PrivateData set.
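For reference, those checks might look like this (a sketch only; the job
ID and config paths are placeholders for your site's values):

```shell
# Show the stored batch script for one of your own jobs
# (-B is the short form of --batch-script)
sacct -B -j 12345

# Check whether PrivateData restricts what users can read from the
# accounting database (set in slurmdbd.conf, possibly also slurm.conf)
grep -i PrivateData /etc/slurm/slurmdbd.conf /etc/slurm/slurm.conf
```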
-Paul Edmon-
On 10/2/2023 10:57 AM, Davide DelVento wrote:
> I deployed the job_script archival and it is working, however it can
> be queried only by root.
>
> A regular user can run sacct -lj against any job (even jobs owned by
> other users, and that's okay in our setup) with no problem. However, if
> they run sacct -j job_id --batch-script, even against a job they own
> themselves, nothing is returned and I get a
>
> slurmdbd: error: couldn't get information for this user (null)(xxxxxx)
>
> where xxxxxx is the POSIX ID of the user who's running the query, in
> the slurmdbd logs.
>
> Neither of the config files slurmdbd.conf and slurm.conf has any
> "permission" setting. FWIW, we use LDAP.
>
> Is this the expected behavior, i.e. that by default only root can see
> the job scripts? I was assuming the users themselves should be able to
> debug their own jobs... Any hint on what could be changed to achieve this?
>
> Thanks!
>
>
>
> On Fri, Sep 29, 2023 at 5:48 AM Davide DelVento
> <davide.quantum at gmail.com> wrote:
>
> Fantastic, this is really helpful, thanks!
>
> On Thu, Sep 28, 2023 at 12:05 PM Paul Edmon
> <pedmon at cfa.harvard.edu> wrote:
>
> Yes, it was later than that; if you are on 23.02 you are good.
> We've been running with job_script storage on for years at
> this point, and that part of the database only uses up 8.4G.
> Our entire database takes up 29G on disk, so it's about 1/3 of
> the database. We also use database compression, which helps
> with the on-disk size; raw and uncompressed, our database is
> about 90G. We keep 6 months of data in our active database.
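A retention window like that is normally set via the purge options in
slurmdbd.conf; a minimal sketch, assuming the standard option names (the
values themselves are site policy):

```shell
# slurmdbd.conf -- keep roughly six months of job/step records in the
# active database; older records are purged (or archived first, if the
# corresponding Archive* options are enabled)
PurgeJobAfter=6months
PurgeStepAfter=6months
```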
>
> -Paul Edmon-
>
> On 9/28/2023 1:57 PM, Ryan Novosielski wrote:
>> Sorry for the duplicate e-mail in a short time: do you (or anyone
>> else) know when the hashing was added? I was planning to
>> enable this on 21.08, but we then had to delay our upgrade to
>> it. I'm assuming it was later than that, as I believe that's when
>> the feature was added.
>>
>>> On Sep 28, 2023, at 13:55, Ryan Novosielski
>>> <novosirj at rutgers.edu> <mailto:novosirj at rutgers.edu> wrote:
>>>
>>> Thank you; we'll put in a feature request for improvements
>>> in that area, and also thanks for the warning! I thought of
>>> that in passing, but the real-world experience is really
>>> useful. I could easily see wanting that stuff to be retained
>>> for a shorter time than the main records, which is what I'd ask for.
>>>
>>> I assume that archiving, in general, would also remove this
>>> stuff, since old jobs themselves will be removed?
>>>
>>> --
>>> #BlackLivesMatter
>>> ____
>>> || \\UTGERS,
>>> |---------------------------*O*---------------------------
>>> ||_// the State | Ryan Novosielski -
>>> novosirj at rutgers.edu
>>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922)
>>> ~*~ RBHS Campus
>>> || \\ of NJ | Office of Advanced Research Computing -
>>> MSB A555B, Newark
>>> `'
>>>
>>>> On Sep 28, 2023, at 13:48, Paul Edmon
>>>> <pedmon at cfa.harvard.edu> <mailto:pedmon at cfa.harvard.edu> wrote:
>>>>
>>>> Slurm should take care of it when you add it.
>>>>
>>>> So far as horror stories, under previous versions our
>>>> database size ballooned to be so massive that it actually
>>>> prevented us from upgrading and we had to drop the columns
>>>> containing the job_script and job_env. This was back
>>>> before slurm started hashing the scripts so that it would
>>>> only store one copy of duplicate scripts. After this point
>>>> we found that the job_script database stayed at a fairly
>>>> reasonable size as most users use functionally the same
>>>> script each time. However, the job_envs continued to grow
>>>> rapidly, as there are variables in our environment that
>>>> change fairly often depending on where the user is.
>>>> The job_envs thus ended up being too massive to keep around,
>>>> so we had to drop them. Frankly, we never really used them
>>>> for debugging. The job_scripts though are super useful and
>>>> not that much overhead.
>>>>
>>>> In summary my recommendation is to only store job_scripts.
>>>> job_envs add too much storage for little gain, unless your
>>>> job_envs are basically the same for each user in each location.
>>>>
>>>> Also, it should be noted that there is currently no way to
>>>> prune job_scripts or job_envs. So the only way to get rid of
>>>> them if they get large is to zero out the column in the
>>>> table. You can ask SchedMD for the mysql command to do this,
>>>> as we had to do that here for our job_envs.
>>>>
>>>> -Paul Edmon-
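The zeroing-out described above would look roughly like the following.
This is only a sketch: the database name, the per-cluster table name,
and the column name here are assumptions about a typical slurmdbd
schema, so verify them (and take a backup) first; the authoritative
command should come from SchedMD.

```shell
# ILLUSTRATIVE ONLY -- back up the accounting database before running.
# 'slurm_acct_db' and 'mycluster_job_env_table' are assumed names;
# confirm the real ones before doing anything.
mysql slurm_acct_db -e "SHOW TABLES LIKE '%job_env%';"
# Zero out the stored environments (column name 'env_vars' is assumed)
mysql slurm_acct_db -e "UPDATE mycluster_job_env_table SET env_vars = '';"
```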
>>>>
>>>> On 9/28/2023 1:40 PM, Davide DelVento wrote:
>>>>> In my current Slurm installation (recently upgraded to
>>>>> Slurm v23.02.3), I only have
>>>>>
>>>>> AccountingStoreFlags=job_comment
>>>>>
>>>>> I now intend to add both
>>>>>
>>>>> AccountingStoreFlags=job_script
>>>>> AccountingStoreFlags=job_env
>>>>>
>>>>> leaving the default 4MB value for max_script_size
>>>>>
>>>>> Do I need to do anything on the DB myself, or will slurm
>>>>> take care of the additional tables if needed?
>>>>>
>>>>> Any comments/suggestions/gotcha/pitfalls/horror_stories to
>>>>> share? I know about the additional diskspace and
>>>>> potentially load needed, and with our resources and
>>>>> typical workload I should be okay with that.
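For what it's worth, AccountingStoreFlags takes a single comma-separated
list, so the combined setting would be one line in slurm.conf rather
than repeated entries (a sketch based on the documented flag names):

```shell
# slurm.conf -- store job comment, batch script, and environment
# in the accounting database (one comma-separated list)
AccountingStoreFlags=job_comment,job_script,job_env
```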
>>>>>
>>>>> Thanks!
>>>>
>>>
>>