<div dir="ltr"><div>By increasing the slurmdbd verbosity level, I got additional information, namely the following:</div><div><br class="gmail-Apple-interchange-newline">slurmdbd: error: couldn't get information for this user (null)(xxxxxx)<br></div>slurmdbd: debug: accounting_storage/as_mysql: as_mysql_jobacct_process_get_jobs: User
xxxxxx has no associations, and is not admin, so not returning any jobs.<br><div><br></div><div>again where xxxxxx is the POSIX ID of the user running the query (both messages are from the slurmdbd logs).<br></div><div><br></div><div>I suspect this is because our user base is small enough (we are a departmental HPC) that we don't need allocations and the like, so I have not configured any associations (nor even studied how to configure them: at a previous site that did use associations, someone else took care of Slurm administration).</div><div><br></div><div>Anyway, I read the fantastic document by a member of this list at <a href="https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_accounting/#associations">https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_accounting/#associations</a> and in fact I have not even configured Slurm users:</div><div><br></div><div><pre># sacctmgr show user
      User   Def Acct     Admin
---------- ---------- ---------
      root       root Administ+
#</pre></div><div><br></div><div>So is that the issue? Should I just add all users? Any suggestions on the minimal (but robust) way to do that?</div><div><br></div><div>Thanks!</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Oct 2, 2023 at 9:20 AM Davide DelVento <<a href="mailto:davide.quantum@gmail.com">davide.quantum@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thanks Paul, this helps.<div><br><div>I don't have any PrivateData line in either config file. According to the docs, "By default, all information is visible to all users" so this should not be an issue. 
I tried to add a line with "PrivateData=jobs" to the conf files, just in case, but that didn't change the behavior.</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Oct 2, 2023 at 9:10 AM Paul Edmon <<a href="mailto:pedmon@cfa.harvard.edu" target="_blank">pedmon@cfa.harvard.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>At least in our setup, users can see their own scripts by doing
sacct -B -j JOBID</p>
<p>I would check that the scripts are actually being stored, and
check how you have PrivateData set.</p>
<p>-Paul Edmon-<br>
</p>
<div>On 10/2/2023 10:57 AM, Davide DelVento
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">I deployed the job_script archival and it is
working, however it can be queried only by root.
<div><br>
</div>
<div>A regular user can run sacct -lj against any job (even
those of other users, and that's okay in our setup) with no
problem. However, if they run sacct -j job_id --batch-script,
even against a job they own themselves, nothing is returned
and I get the following error:</div>
<div>
<div><br>
</div>
<div>slurmdbd: error: couldn't get information for this user
(null)(xxxxxx)</div>
<div><br>
</div>
<div>where xxxxxx is the POSIX ID of the user running the
query (the error appears in the slurmdbd logs).</div>
<div><br>
</div>
<div>Neither configuration file (slurmdbd.conf or slurm.conf) has
any "permission" setting. FWIW, we use LDAP.</div>
<div><br>
</div>
<div>Is that the expected behavior, in that by default only
root can see the job scripts? I was assuming the users
themselves should be able to debug their own jobs... Any
hint on what could be changed to achieve this?</div>
<div><br>
</div>
<div>Thanks!<br>
<div><br>
</div>
</div>
<div><br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Sep 29, 2023 at
5:48 AM Davide DelVento <<a href="mailto:davide.quantum@gmail.com" target="_blank">davide.quantum@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Fantastic, this is really helpful, thanks!</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Sep 28, 2023 at
12:05 PM Paul Edmon <<a href="mailto:pedmon@cfa.harvard.edu" target="_blank">pedmon@cfa.harvard.edu</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Yes, it was later than that. If you are on 23.02 you are
good. We've been running with job_script storage turned on
for years at this point, and that part of the database
only uses 8.4G. Our entire database takes up 29G
on disk, so it's about 1/3 of the database. We also
have database compression, which helps with the on-disk
size. Raw and uncompressed, our database is about 90G. We
keep 6 months of data in our active database.<br>
</p>
<p>-Paul Edmon-<br>
</p>
<div>On 9/28/2023 1:57 PM, Ryan Novosielski wrote:<br>
</div>
<blockquote type="cite"> Sorry for the duplicate e-mail
in a short time: do you (or anyone) know when the
hashing was added? I was planning to enable this on
21.08, but we then had to delay our upgrade to it. I’m
assuming the hashing came later than that, as I believe 21.08 is when the
script-storage feature itself was added.
<div><br>
<blockquote type="cite">
<div>On Sep 28, 2023, at 13:55, Ryan Novosielski <a href="mailto:novosirj@rutgers.edu" target="_blank"><novosirj@rutgers.edu></a>
wrote:</div>
<br>
<div>
<div> Thank you; we’ll put in a feature request
for improvements in that area, and thanks also
for the warning. I had thought of that in passing,
but the real-world experience is really
useful. I could easily see wanting that data
to be retained for a shorter period than the main
records, which is what I’d ask for.
<div><br>
</div>
<div>I assume that archiving, in general,
would also remove this stuff, since old jobs
themselves will be removed?</div>
<div><br>
<div>
<div>
<div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
<div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
<div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
<div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
--<br>
#BlackLivesMatter</div>
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
____<br>
|| \\UTGERS,
|---------------------------*O*---------------------------<br>
||_// the State<span style="white-space:pre-wrap"> </span> |
Ryan Novosielski - <a href="mailto:novosirj@rutgers.edu" target="_blank">novosirj@rutgers.edu</a><br>
|| \\ University | Sr.
Technologist - 973/972.0922
(2x0922) ~*~ RBHS Campus<br>
|| \\ of NJ<span style="white-space:pre-wrap"> </span> |
Office of Advanced Research
Computing - MSB A555B, Newark<br>
`'</div>
</div>
</div>
</div>
</div>
</div>
<div><br>
<blockquote type="cite">
<div>On Sep 28, 2023, at 13:48, Paul
Edmon <a href="mailto:pedmon@cfa.harvard.edu" target="_blank"><pedmon@cfa.harvard.edu></a>
wrote:</div>
<br>
<div>
<div>Slurm should take care of it
when you add it.<br>
<br>
As for horror stories, under
previous versions our database
size ballooned to be so massive
that it actually prevented us from
upgrading and we had to drop the
columns containing the job_script
and job_env. This was back before
slurm started hashing the scripts
so that it would only store one
copy of duplicate scripts. After
this point we found that the
job_script database stayed at a
fairly reasonable size as most
users use functionally the same
script each time. However the
job_env continued to grow like
crazy as there are variables in
our environment that change fairly
consistently depending on where
the user is. Thus job_envs ended
up being too massive to keep
around and so we had to drop them.
Frankly we never really used them
for debugging. The job_scripts
though are super useful and not
that much overhead.<br>
<br>
In summary my recommendation is to
only store job_scripts. job_envs
add too much storage for little
gain, unless your job_envs are
basically the same for each user
in each location.<br>
<br>
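A minimal sketch of that recommendation as a slurm.conf fragment, assuming the job_comment flag from the original question is kept. AccountingStoreFlags takes a comma-separated list, so all flags go on a single line:<br>
<pre>
```conf
AccountingStoreFlags=job_comment,job_script
```
</pre>
<br>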
Also note that there
is currently no way to prune job_scripts
or job_envs. So the only
way to get rid of them if they get
large is to zero out the column in
the table. You can ask SchedMD for
the MySQL command to do this, as we
had to do it here for our job_envs.<br>
<br>
-Paul Edmon-<br>
<br>
On 9/28/2023 1:40 PM, Davide
DelVento wrote:<br>
<blockquote type="cite">In my
current slurm installation,
(recently upgraded to slurm
v23.02.3), I only have<br>
<br>
AccountingStoreFlags=job_comment<br>
<br>
I now intend to add both<br>
<br>
AccountingStoreFlags=job_script<br>
AccountingStoreFlags=job_env<br>
<br>
leaving the default 4MB value
for max_script_size<br>
<br>
Do I need to do anything on the
DB myself, or will slurm take
care of the additional tables if
needed?<br>
<br>
Any
comments/suggestions/gotchas/pitfalls/horror_stories
to share? I know about the
additional disk space and
potential load needed, and
with our resources and typical
workload I should be okay with
that.<br>
<br>
Thanks!<br>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div>
</blockquote></div>
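<div>On the question at the top of the thread about a minimal way to add users, here is a sketch, assuming a single catch-all account is enough for a small department and that the cluster is already registered in slurmdbd. The account name "general" and the user names are hypothetical placeholders. The function only prints the sacctmgr commands so they can be reviewed before anything is changed; it does not run them.</div>
<pre>
```shell
#!/bin/sh
# Dry run: print the sacctmgr commands that would create one shared
# account and attach each listed user to it. Nothing is executed.
# "general", "alice" and "bob" below are hypothetical example names.
gen_sacctmgr_cmds() {
    account="$1"
    shift
    # -i (--immediate) commits without the interactive confirmation prompt.
    printf 'sacctmgr -i add account %s Description="%s"\n' "$account" "default account"
    for user in "$@"; do
        printf 'sacctmgr -i add user %s Account=%s\n' "$user" "$account"
    done
}

gen_sacctmgr_cmds general alice bob
```
</pre>
<div>Once the printed commands look right, they can be run by a Slurm administrator, and "sacctmgr show assoc" should then list the new associations. The user list could come from LDAP (for example via getent), but that part is left out here because it depends on the site.</div>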