<div dir="ltr">And weirdly enough it has now stopped working again, after I did the experimentation for power save described in the other thread.<div>That is really strange. At the highest verbosity level the logs just say</div><div><br></div><div>slurmdbd: debug:  REQUEST_PERSIST_INIT: CLUSTER:cluster VERSION:9984 UID:1457 IP:192.168.2.254 CONN:13<br></div><div><br></div><div>I reconfigured and reverted stuff to no change. Does anybody have any clue?</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Oct 3, 2023 at 5:43 PM Davide DelVento <<a href="mailto:davide.quantum@gmail.com">davide.quantum@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">For others potentially seeing this on mailing list search, yes, I needed that, which of course required creating an account charge which I wasn't using. So I ran<div><br></div><div><span style="color:rgb(0,0,0);font-family:Consolas,"Bitstream Vera Sans Mono","Courier New",Courier,monospace;font-size:14px">sacctmgr add account default_account</span><br></div><div><span style="color:rgb(0,0,0);font-family:Consolas,"Bitstream Vera Sans Mono","Courier New",Courier,monospace;font-size:14px">sacctmgr add -i user $user Accounts=default_account</span><span style="color:rgb(0,0,0);font-family:Consolas,"Bitstream Vera Sans Mono","Courier New",Courier,monospace;font-size:14px"><br></span></div></div><br><div class="gmail_quote"><div class="gmail_attr">with an appropriate looping around for $user and everything is working fine now.</div><div class="gmail_attr"><br></div><div class="gmail_attr">Thanks everybody!</div><div class="gmail_attr"><br></div><div dir="ltr" class="gmail_attr">On Tue, Oct 3, 2023 at 7:44 AM Paul Edmon <<a href="mailto:pedmon@cfa.harvard.edu" target="_blank">pedmon@cfa.harvard.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 
0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>

  
    
  
  <div>
    <p>You will probably need to.</p>
    <p>The way we handle it is that we add users when they first submit a
      job, via the job_submit.lua script. This way the database
      autopopulates with active users.</p>
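    <p>As a one-off alternative to adding users on first submission, the
      loop over user names can be sketched in shell. This is only a
      sketch: it reuses the <code>sacctmgr add -i user ... Accounts=default_account</code>
      form and the <code>default_account</code> name used elsewhere in this thread
      (the account has to exist first), while the <code>add_users</code> function
      and the <code>ACCOUNT</code> and <code>RUN</code> variables are made up for
      illustration. By default it only prints the commands instead of
      running them.</p>
<pre>
```shell
#!/bin/sh
# Sketch: create one sacctmgr user association per login name read from stdin.
# ACCOUNT and RUN are illustrative variables, not Slurm settings.
ACCOUNT="${ACCOUNT:-default_account}"   # account name taken from this thread
RUN="${RUN-echo}"                       # dry run by default; export RUN="" to execute

add_users() {
    while IFS= read -r user; do
        # Skip blank lines so an empty name is never passed to sacctmgr.
        [ -n "$user" ] || continue
        # With RUN=echo this prints the command; with RUN empty it runs it.
        $RUN sacctmgr add -i user "$user" Accounts="$ACCOUNT"
    done
}
```
</pre>
    <p>Feed it one login name per line, for example from
      <code>getent passwd</code> filtered to your site's UID range, and
      review the printed commands before running them for real.</p>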
    <p>-Paul Edmon-<br>
    </p>
    <div>On 10/3/23 9:01 AM, Davide DelVento
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div>By increasing the slurmdbd verbosity level, I got
          additional information, namely the following:</div>
        <div><br>
          slurmdbd: error: couldn't get information for this user
          (null)(xxxxxx)<br>
        </div>
        slurmdbd: debug: accounting_storage/as_mysql:
        as_mysql_jobacct_process_get_jobs: User 
        xxxxxx  has no associations, and is not admin, so not returning
        any jobs.<br>
        <div><br>
        </div>
        <div>where, again, xxxxxx is the POSIX ID of the user running
          the query (the messages are from the slurmdbd logs).<br>
        </div>
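        <div>The "has no associations" message can be checked directly
          against the accounting database. A minimal sketch, assuming
          <code>sacctmgr</code> is on the PATH and prints nothing for a user
          without associations once the header is suppressed; the
          <code>has_association</code> helper name is hypothetical:</div>
<pre>
```shell
#!/bin/sh
# Sketch: succeed if the given user has at least one association.
has_association() {
    # -n drops the header and -P gives pipe-separated output, so any
    # remaining output is taken to mean an association exists.
    [ -n "$(sacctmgr -n -P show associations user="$1" format=Cluster,Account,User)" ]
}
```
</pre>
        <div>For example,
          <code>has_association "$USER" || echo "no association"</code>
          run as the affected user should show whether the account setup
          is the missing piece.</div>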
        <div><br>
        </div>
        <div>I suspect this is because our userbase is
          small enough (we are a departmental HPC) that we don't need
          allocations and the like, so I have not configured any
          associations (nor even studied how to configure them, since
          at my previous site, which did use associations,
          someone else took care of Slurm administration).</div>
        <div><br>
        </div>
        <div>Anyway, I read the fantastic document by our own member at
          <a href="https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_accounting/#associations" target="_blank">https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_accounting/#associations</a>
          and in fact I have not even configured slurm users:</div>
        <div><br>
        </div>
        <div># sacctmgr show user<br>
                User   Def Acct     Admin<br>
          ---------- ---------- ---------<br>
                root       root Administ+<br>
        </div>
        <div>#</div>
        <div><br>
        </div>
        <div>So is that the issue? Should I just add all users? Any
          suggestions on the minimal (but robust) way to do that?</div>
        <div><br>
        </div>
        <div>Thanks!</div>
        <div><br>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Mon, Oct 2, 2023 at 9:20 AM
          Davide DelVento &lt;<a href="mailto:davide.quantum@gmail.com" target="_blank">davide.quantum@gmail.com</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div dir="ltr">Thanks Paul, this helps.
            <div><br>
              <div>I don't have any PrivateData line in either config
                file. According to the docs, "By default, all
                information is visible to all users" so this should not
                be an issue. I tried to add a line with
                "PrivateData=jobs" to the conf files, just in case, but
                that didn't change the behavior.</div>
            </div>
          </div>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Mon, Oct 2, 2023 at
              9:10 AM Paul Edmon &lt;<a href="mailto:pedmon@cfa.harvard.edu" target="_blank">pedmon@cfa.harvard.edu</a>&gt;
              wrote:<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div>
                <p>At least in our setup, users can see their own
                  scripts by doing sacct -B -j JOBID</p>
                <p>I would check that the scripts are being stored
                  and how you have PrivateData set.</p>
                <p>-Paul Edmon-<br>
                </p>
                <div>On 10/2/2023 10:57 AM, Davide DelVento wrote:<br>
                </div>
                <blockquote type="cite">
                  <div dir="ltr">I deployed the job_script archival and
                    it is working; however, it can be queried only by
                    root. 
                    <div><br>
                    </div>
                    <div>A regular user can run sacct -lj against any
                      job (even those of other users, and that's okay
                      in our setup) with no problem. However, if they run
                      sacct -j job_id --batch-script, even against a job
                      they own themselves, nothing is returned and I get
                      the following in the slurmdbd logs:</div>
                    <div>
                      <div><br>
                      </div>
                      <div>slurmdbd: error: couldn't get information for
                        this user (null)(xxxxxx)</div>
                      <div><br>
                      </div>
                      <div>where xxxxxx is the POSIX ID of the user
                        running the query.</div>
                      <div><br>
                      </div>
                      <div>Neither config file (slurmdbd.conf or
                        slurm.conf) has any "permission"
                        setting. FWIW, we use LDAP.</div>
                      <div><br>
                      </div>
                      <div>Is that the expected behavior, in that by
                        default only root can see the job scripts? I was
                        assuming the users themselves should be able to
                        debug their own jobs... Any hint on what could
                        be changed to achieve this?</div>
                      <div><br>
                      </div>
                      <div>Thanks!<br>
                        <div><br>
                        </div>
                      </div>
                      <div><br>
                      </div>
                    </div>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Fri, Sep 29,
                      2023 at 5:48 AM Davide DelVento &lt;<a href="mailto:davide.quantum@gmail.com" target="_blank">davide.quantum@gmail.com</a>&gt;
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div dir="ltr">Fantastic, this is really helpful,
                        thanks!</div>
                      <br>
                      <div class="gmail_quote">
                        <div dir="ltr" class="gmail_attr">On Thu, Sep
                          28, 2023 at 12:05 PM Paul Edmon &lt;<a href="mailto:pedmon@cfa.harvard.edu" target="_blank">pedmon@cfa.harvard.edu</a>&gt;
                          wrote:<br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div>
                            <p>Yes, it was later than that. If you are on
                              23.02 you are good.  We've been running
                              with job_script storage on for years at
                              this point, and that part of the database
                              only uses 8.4G.  Our entire database
                              takes up 29G on disk, so it's about 1/3 of
                              the database.  We also have database
                              compression, which helps with the on-disk
                              size. Raw and uncompressed, our database is
                              about 90G.  We keep 6 months of data in
                              our active database.<br>
                            </p>
                            <p>-Paul Edmon-<br>
                            </p>
                            <div>On 9/28/2023 1:57 PM, Ryan Novosielski
                              wrote:<br>
                            </div>
                            <blockquote type="cite"> Sorry for the
                              duplicate e-mail in a short time: do you
                              know (or anyone) when the hashing was
                              added? Was planning to enable this on
                              21.08, but we then had to delay our
                              upgrade to it. I’m assuming later than
                              that, as I believe that’s when the feature
                              was added.
                              <div><br>
                                <blockquote type="cite">
                                  <div>On Sep 28, 2023, at 13:55, Ryan
                                    Novosielski <a href="mailto:novosirj@rutgers.edu" target="_blank">&lt;novosirj@rutgers.edu&gt;</a>
                                    wrote:</div>
                                  <br>
                                  <div>
                                    <div> Thank you; we’ll put in a
                                      feature request for improvements
                                      in that area, and also thanks for
                                      the warning. I thought of that in
                                      passing, but the real-world
                                      experience is really useful. I
                                      could easily see wanting that
                                      stuff to be retained for a shorter
                                      time than the main records, which is
                                      what I’d ask for.
                                      <div><br>
                                      </div>
                                      <div>I assume that archiving, in
                                        general, would also remove this
                                        stuff, since old jobs themselves
                                        will be removed?</div>
                                      <div><br>
                                        <div>
                                          <div>
                                            <div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
                                              <div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
                                                <div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
                                                  <div dir="auto" style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">
                                                    <div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
                                                      --<br>
                                                      #BlackLivesMatter</div>
                                                    <div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
                                                      ____<br>
                                                      || \\UTGERS,    
                                                      |---------------------------*O*---------------------------<br>
                                                      ||_// the State<span style="white-space:pre-wrap"> </span> |         Ryan Novosielski - <a href="mailto:novosirj@rutgers.edu" target="_blank">novosirj@rutgers.edu</a><br>
                                                      || \\ University |
                                                      Sr. Technologist
                                                      - 973/972.0922
                                                      (2x0922) ~*~
                                                      RBHS Campus<br>
                                                      ||  \\    of NJ<span style="white-space:pre-wrap"> </span> | Office of Advanced Research
                                                      Computing - MSB
                                                      A555B, Newark<br>
                                                           `'</div>
                                                  </div>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                          <div><br>
                                            <blockquote type="cite">
                                              <div>On Sep 28, 2023, at
                                                13:48, Paul Edmon <a href="mailto:pedmon@cfa.harvard.edu" target="_blank">&lt;pedmon@cfa.harvard.edu&gt;</a>
                                                wrote:</div>
                                              <br>
                                              <div>
                                                <div>Slurm should take
                                                  care of it when you
                                                  add it.<br>
                                                  <br>
                                                  As for horror
                                                  stories, under
                                                  previous versions our
                                                  database size
                                                  ballooned to be so
                                                  massive that it
                                                  actually prevented us
                                                  from upgrading and we
                                                  had to drop the
                                                  columns containing the
                                                  job_script and
                                                  job_env.  This was
                                                  back before slurm
                                                  started hashing the
                                                  scripts so that it
                                                  would only store one
                                                  copy of duplicate
                                                  scripts.  After this
                                                  point we found that
                                                  the job_script
                                                  database stayed at a
                                                  fairly reasonable size
                                                  as most users use
                                                  functionally the same
                                                  script each time.
                                                  However the job_env
                                                  continued to grow like
                                                  crazy as there are
                                                  variables in our
                                                  environment that
                                                  change fairly
                                                  consistently depending
                                                  on where the user is.
                                                  Thus job_envs ended up
                                                  being too massive to
                                                  keep around and so we
                                                  had to drop them.
                                                  Frankly we never
                                                  really used them for
                                                  debugging. The
                                                  job_scripts though are
                                                  super useful and not
                                                  that much overhead.<br>
                                                  <br>
                                                  In summary my
                                                  recommendation is to
                                                  only store
                                                  job_scripts. job_envs
                                                  add too much storage
                                                  for little gain,
                                                  unless your job_envs
                                                  are basically the same
                                                  for each user in each
                                                  location.<br>
                                                  <br>
                                                  Also it should be
                                                  noted that there is no
                                                  way to prune out
                                                  job_scripts or
                                                  job_envs right now. So
                                                  the only way to get
                                                  rid of them if they
                                                  get large is to zero out
                                                  the column in the
                                                  table. You can ask
                                                  SchedMD for the MySQL
                                                  command to do this, as
                                                  we had to do it here
                                                  for our job_envs.<br>
                                                  <br>
                                                  -Paul Edmon-<br>
                                                  <br>
                                                  On 9/28/2023 1:40 PM,
                                                  Davide DelVento wrote:<br>
                                                  <blockquote type="cite">In my
                                                    current Slurm
                                                    installation
                                                    (recently upgraded
                                                    to v23.02.3),
                                                    I only have<br>
                                                    <br>
AccountingStoreFlags=job_comment<br>
                                                    <br>
                                                    I now intend to add
                                                    job_script and job_env,
                                                    so that the line becomes<br>
                                                    <br>
AccountingStoreFlags=job_comment,job_script,job_env<br>
                                                    <br>
                                                    leaving the default
                                                    4MB value
                                                    for max_script_size<br>
                                                    <br>
                                                    Do I need to do
                                                    anything on the DB
                                                    myself, or will
                                                    slurm take care of
                                                    the additional
                                                    tables if needed?<br>
                                                    <br>
                                                    Any
                                                    comments/suggestions/gotchas/pitfalls/horror_stories
                                                    to share? I know
                                                    about the additional
                                                    disk space and
                                                    potential load
                                                    needed, and with our
                                                    resources and
                                                    typical workload I
                                                    should be okay with
                                                    that.<br>
                                                    <br>
                                                    Thanks!<br>
                                                  </blockquote>
                                                  <br>
                                                </div>
                                              </div>
                                            </blockquote>
                                          </div>
                                          <br>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                              </div>
                              <br>
                            </blockquote>
                          </div>
                        </blockquote>
                      </div>
                    </blockquote>
                  </div>
                </blockquote>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </div>

</blockquote></div></div>
</blockquote></div>