<div dir="ltr">I deployed the job_script archival and it is working; however, it can be queried only by root. <div><br></div><div>A regular user can run sacct -lj against any job (even those of other users, and that's okay in our setup) with no problem. However, if they run sacct -j job_id --batch-script, even against a job they own themselves, nothing is returned and the slurmdbd logs show</div><div><div><br></div><div>slurmdbd: error: couldn't get information for this user (null)(xxxxxx)</div><div><br></div><div>where xxxxxx is the POSIX ID of the user who's running the query.</div><div><br></div><div>Neither configuration file (slurmdbd.conf or slurm.conf) has any "permission" setting. FWIW, we use LDAP.</div><div><br></div><div>Is that the expected behavior, in that by default only root can see the job scripts? I was assuming the users themselves should be able to debug their own jobs... Any hint on what could be changed to achieve this?</div><div><br></div><div>Thanks!<br><div><br></div></div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 29, 2023 at 5:48 AM Davide DelVento &lt;<a href="mailto:davide.quantum@gmail.com">davide.quantum@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Fantastic, this is really helpful, thanks!</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Sep 28, 2023 at 12:05 PM Paul Edmon &lt;<a href="mailto:pedmon@cfa.harvard.edu" target="_blank">pedmon@cfa.harvard.edu</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Yes, it was later than that. If you are on 23.02 you are good. We've
been running with job_script storage turned on for years at this point,
and that part of the database only uses 8.4G. Our entire
database takes up 29G on disk, so it's about 1/3 of the database.
We also have database compression enabled, which helps with the on-disk
size; raw and uncompressed, our database is about 90G. We keep 6
months of data in our active database.<br>
</p>
<p>-Paul Edmon-<br>
</p>
<div>On 9/28/2023 1:57 PM, Ryan Novosielski
wrote:<br>
</div>
<blockquote type="cite">
Sorry for the duplicate e-mail in a short time: does anyone know
when the hashing was added? I was planning to enable this on
21.08, but we then had to delay our upgrade to it. I’m assuming
the hashing came later than that, as I believe 21.08 is when the feature itself was added.
<div><br>
<blockquote type="cite">
<div>On Sep 28, 2023, at 13:55, Ryan Novosielski
<a href="mailto:novosirj@rutgers.edu" target="_blank">&lt;novosirj@rutgers.edu&gt;</a> wrote:</div>
<br>
<div>
<div>
Thank you; we’ll put in a feature request for improvements
in that area. Thanks also for the warning: I thought
of that in passing, but the real-world experience is
really useful. I could easily see wanting that stuff to be
retained for a shorter time than the main records, which is what
I’d ask for.
<div><br>
</div>
<div>I assume that archiving, in general, would also
remove this stuff, since old jobs themselves will be
removed?</div>
<div><br>
<div>
<div>
<div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
<div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
<div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
<div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
--<br>
#BlackLivesMatter</div>
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
____<br>
|| \\UTGERS,
|---------------------------*O*---------------------------<br>
||_// the State<span style="white-space:pre-wrap"> </span> |
Ryan Novosielski
- <a href="mailto:novosirj@rutgers.edu" target="_blank">novosirj@rutgers.edu</a><br>
|| \\ University | Sr. Technologist
- 973/972.0922 (2x0922) ~*~ RBHS Campus<br>
|| \\ of NJ<span style="white-space:pre-wrap"> </span> |
Office of Advanced Research Computing -
MSB A555B, Newark<br>
`'</div>
</div>
</div>
</div>
</div>
</div>
<div><br>
<blockquote type="cite">
<div>On Sep 28, 2023, at 13:48, Paul Edmon
<a href="mailto:pedmon@cfa.harvard.edu" target="_blank">&lt;pedmon@cfa.harvard.edu&gt;</a> wrote:</div>
<br>
<div>
<div>Slurm should take care of it when you add
it.<br>
<br>
As far as horror stories go, under previous
versions our database size ballooned to be so
massive that it actually prevented us from
upgrading, and we had to drop the columns
containing the job_script and job_env. This
was back before Slurm started hashing the
scripts so that it would only store one copy
of duplicate scripts. Once hashing was in place, we
found that the job_script data stayed at a
fairly reasonable size, as most users use
functionally the same script each time.
However, the job_env data continued to grow like
crazy, as there are variables in our
environment that change fairly consistently
depending on where the user is. Thus the job_envs
ended up being too massive to keep around, and
so we had to drop them. Frankly, we never
really used them for debugging. The
job_scripts, though, are super useful and not
that much overhead.<br>
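<div>To illustrate why hashing keeps the scripts small but does little for the environments, here is a minimal Python sketch of a content-addressed store. It is not Slurm's actual schema or code: the HashedStore class, the choice of SHA-256, and the EXAMPLE_SESSION variable are all assumptions made up for the example.</div>
<pre>
```python
import hashlib


class HashedStore:
    """Toy content-addressed store (hypothetical, not Slurm's real schema)."""

    def __init__(self):
        self.blobs = {}     # digest of the text, mapped to one stored copy
        self.job_refs = {}  # job id, mapped to the digest it points at

    def add(self, job_id, text):
        # SHA-256 is an assumption; any stable content hash behaves the same.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.blobs.setdefault(digest, text)  # keep a copy only if unseen
        self.job_refs[job_id] = digest
        return digest

    def stored_bytes(self):
        return sum(len(t.encode("utf-8")) for t in self.blobs.values())


script = "#!/bin/bash\n#SBATCH -n 1\nsrun ./my_app\n"

scripts = HashedStore()
envs = HashedStore()
for job_id in range(1000):
    # The same script resubmitted 1000 times hashes to the same digest.
    scripts.add(job_id, script)
    # EXAMPLE_SESSION is a made-up variable that differs on every submission;
    # one changing value is enough to give every environment a new digest.
    envs.add(job_id, "HOME=/home/alice\nEXAMPLE_SESSION=" + str(job_id) + "\n")

print(len(scripts.blobs), len(envs.blobs))  # prints: 1 1000
```
</pre>
<div>In this toy model the thousand identical scripts occupy the space of one, while every environment is stored in full because a single differing variable changes its hash.</div>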
<br>
In summary my recommendation is to only store
job_scripts. job_envs add too much storage for
little gain, unless your job_envs are
basically the same for each user in each
location.<br>
<br>
Also, it should be noted that there is currently no way
to prune job_scripts or job_envs.
So the only way to get rid of them if
they get large is to zero out the column in the
table. You can ask SchedMD for the MySQL
command to do this, as we had to do it here for
our job_envs.<br>
<br>
-Paul Edmon-<br>
<br>
On 9/28/2023 1:40 PM, Davide DelVento wrote:<br>
<blockquote type="cite">In my current Slurm
installation (recently upgraded to Slurm
v23.02.3), I only have<br>
<br>
AccountingStoreFlags=job_comment<br>
<br>
I now intend to add both<br>
<br>
AccountingStoreFlags=job_script<br>
AccountingStoreFlags=job_env<br>
<br>
leaving the default 4MB value
for max_script_size<br>
<br>
Do I need to do anything on the DB myself,
or will slurm take care of the additional
tables if needed?<br>
<br>
Any comments, suggestions, gotchas, pitfalls, or
horror stories
to share? I know about the additional
disk space and potentially higher load, and
with our resources and typical workload I
should be okay with that.<br>
<br>
Thanks!<br>
</blockquote>
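<div>For reference, a minimal sketch of how the settings from the quoted message might look in slurm.conf. It assumes the documented behavior that AccountingStoreFlags takes a single comma-separated list rather than one line per flag, and that max_script_size is a SchedulerParameters option given in bytes. The commented-out line only makes the 4MB default visible; check the slurm.conf man page for your version before relying on any of this.</div>
<pre>
```conf
# slurm.conf fragment (sketch, not a tested configuration)
# All flags go on one line, comma-separated.
AccountingStoreFlags=job_comment,job_script,job_env

# The reply above recommends storing scripts only; that variant would be:
# AccountingStoreFlags=job_comment,job_script

# Default batch script size limit, shown explicitly (4 MB = 4194304 bytes).
# Merge into any existing SchedulerParameters line instead of adding a second one.
# SchedulerParameters=max_script_size=4194304
```
</pre>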
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
</div>
</blockquote></div>
</blockquote></div>