[slurm-users] enabling job script archival
Paul Edmon
pedmon at cfa.harvard.edu
Tue Oct 3 13:42:18 UTC 2023
You will probably need to.
The way we handle it is that we add users when they first submit a job
via the job_submit.lua script. This way the database autopopulates with
active users.
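
For illustration, a sketch of the effect (our actual hook is in Lua;
the account name "dept" and the variable holding the submitting user
are placeholders, not our real setup):

    # one-time: create a catch-all account (-i answers prompts with yes)
    sacctmgr -i add account dept Description="department users"
    # what the hook effectively does on a user's first submission;
    # adding a user who already has an association is harmless
    sacctmgr -i add user name="$submitting_user" account=dept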
-Paul Edmon-
On 10/3/23 9:01 AM, Davide DelVento wrote:
> By increasing the slurmdbd verbosity level, I got additional
> information, namely the following:
>
> slurmdbd: error: couldn't get information for this user (null)(xxxxxx)
> slurmdbd: debug: accounting_storage/as_mysql:
> as_mysql_jobacct_process_get_jobs: User xxxxxx has no associations,
> and is not admin, so not returning any jobs.
>
> where xxxxxx is, again, the POSIX UID of the user running the query,
> as reported in the slurmdbd logs.
>
> I suspect this is due to the fact that our userbase is small enough
> (we are a department HPC) that we don't need allocations and the
> like, so I have not configured any associations (and have not even
> studied their configuration: at the other site I worked at, which
> did use associations, someone else took care of slurm administration).
>
> Anyway, I read the fantastic document by our own member at
> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_accounting/#associations
> and in fact I have not even configured slurm users:
>
> # sacctmgr show user
>       User   Def Acct     Admin
> ---------- ---------- ---------
>       root       root Administ+
> #
>
> So is that the issue? Should I just add all users? Any suggestions on
> the minimal (but robust) way to do that?
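>
> (For concreteness, I imagine something like the sketch below, where
> the account name "dept" is a placeholder and I'm assuming getent can
> enumerate our LDAP users:
>
>     # create one catch-all account, then associate every regular user
>     sacctmgr -i add account dept Description="department users"
>     for u in $(getent passwd | awk -F: '$3 >= 1000 {print $1}'); do
>         sacctmgr -i add user name="$u" account=dept
>     done
>
> but I'd welcome corrections.)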
>
> Thanks!
>
>
> On Mon, Oct 2, 2023 at 9:20 AM Davide DelVento
> <davide.quantum at gmail.com> wrote:
>
> Thanks Paul, this helps.
>
> I don't have any PrivateData line in either config file. According
> to the docs, "By default, all information is visible to all users"
> so this should not be an issue. I tried to add a line with
> "PrivateData=jobs" to the conf files, just in case, but that
> didn't change the behavior.
>
> On Mon, Oct 2, 2023 at 9:10 AM Paul Edmon <pedmon at cfa.harvard.edu>
> wrote:
>
> At least in our setup, users can see their own scripts by
> doing sacct -B -j JOBID
>
> I would make sure that the scripts are actually being stored,
> and check how you have PrivateData set.
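>
> For example (standard commands, assuming the usual config paths):
>
>     # confirm the running controller has job_script storage enabled
>     scontrol show config | grep -i AccountingStoreFlags
>     # see whether PrivateData is set anywhere
>     grep -i PrivateData /etc/slurm/slurm.conf /etc/slurm/slurmdbd.conf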
>
> -Paul Edmon-
>
> On 10/2/2023 10:57 AM, Davide DelVento wrote:
>> I deployed the job_script archival and it is working, however
>> it can be queried only by root.
>>
>> A regular user can run sacct -lj against any job (even those
>> owned by other users, and that's okay in our setup) with no
>> problem. However, if they run sacct -j job_id --batch-script,
>> even against a job they own, nothing is returned
>> and I get a
>>
>> slurmdbd: error: couldn't get information for this user
>> (null)(xxxxxx)
>>
>> where xxxxxx is the POSIX UID of the user running the
>> query, in the slurmdbd logs.
>>
>> Neither configuration file (slurmdbd.conf nor slurm.conf) has
>> any "permission" setting. FWIW, we use LDAP.
>>
>> Is that the expected behavior, i.e. that by default only root
>> can see the job scripts? I was assuming the users themselves
>> should be able to debug their own jobs... Any hint on what
>> could be changed to achieve this?
>>
>> Thanks!
>>
>>
>>
>> On Fri, Sep 29, 2023 at 5:48 AM Davide DelVento
>> <davide.quantum at gmail.com> wrote:
>>
>> Fantastic, this is really helpful, thanks!
>>
>> On Thu, Sep 28, 2023 at 12:05 PM Paul Edmon
>> <pedmon at cfa.harvard.edu> wrote:
>>
>> Yes, it was later than that. If you are on 23.02 you are
>> good. We've had job_script storage turned on for years at
>> this point, and that part of the database only uses up 8.4G.
>> Our entire database takes up 29G on disk, so it's about 1/3
>> of the database. We also have database compression, which
>> helps with the on-disk size; raw and uncompressed, our
>> database is about 90G. We keep 6 months of data in our
>> active database.
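>>
>> If you want to check the same numbers on your side, here is a
>> quick sketch; the schema name slurm_acct_db is the common
>> default and an assumption here:
>>
>>     # per-table on-disk size of the accounting DB, largest first
>>     mysql -e "SELECT table_name,
>>               ROUND((data_length+index_length)/1024/1024) AS mb
>>               FROM information_schema.tables
>>               WHERE table_schema='slurm_acct_db'
>>               ORDER BY mb DESC LIMIT 10;"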
>>
>> -Paul Edmon-
>>
>> On 9/28/2023 1:57 PM, Ryan Novosielski wrote:
>>> Sorry for the duplicate e-mail in a short time: do you (or
>>> anyone) know when the hashing was added? We were planning to
>>> enable this on 21.08, but then had to delay our upgrade to
>>> it. I'm assuming it was later than that, as I believe that's
>>> when the feature was added.
>>>
>>>> On Sep 28, 2023, at 13:55, Ryan Novosielski
>>>> <novosirj at rutgers.edu> wrote:
>>>>
>>>> Thank you; we'll put in a feature request for
>>>> improvements in that area, and also thanks for the
>>>> warning! I had thought of that in passing, but the
>>>> real-world experience is really useful. I could easily
>>>> see wanting that stuff to be retained for less time
>>>> than the main records, which is what I'd ask for.
>>>>
>>>> I assume that archiving, in general, would also
>>>> remove this stuff, since old jobs themselves will
>>>> be removed?
>>>>
>>>>
>>>>> On Sep 28, 2023, at 13:48, Paul Edmon
>>>>> <pedmon at cfa.harvard.edu> wrote:
>>>>>
>>>>> Slurm should take care of it when you add it.
>>>>>
>>>>> As far as horror stories go: under previous versions,
>>>>> our database size ballooned to be so massive that it
>>>>> actually prevented us from upgrading, and we had to
>>>>> drop the columns containing the job_script and
>>>>> job_env. This was back before slurm started hashing
>>>>> the scripts so that it would only store one copy of
>>>>> duplicate scripts. After that point we found that the
>>>>> job_script data stayed at a fairly reasonable size, as
>>>>> most users use functionally the same script each time.
>>>>> However, the job_env data continued to grow like
>>>>> crazy, as there are variables in our environment that
>>>>> change fairly consistently depending on where the user
>>>>> is. Thus the job_envs ended up being too massive to
>>>>> keep around, and we had to drop them. Frankly, we
>>>>> never really used them for debugging. The job_scripts,
>>>>> though, are super useful and not that much overhead.
>>>>>
>>>>> In summary, my recommendation is to store only
>>>>> job_scripts. job_envs add too much storage for little
>>>>> gain, unless your job_envs are basically the same for
>>>>> each user in each location.
>>>>>
>>>>> Also, it should be noted that there is currently no
>>>>> way to prune out job_scripts or job_envs. So the only
>>>>> way to get rid of them if they get large is to zero
>>>>> out the column in the table. You can ask SchedMD for
>>>>> the mysql command to do this, as we had to do it here
>>>>> for our job_envs.
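>>>>>
>>>>> Purely as a hedged illustration of what "zero out the
>>>>> column" means (the table and column names below are
>>>>> assumptions based on default naming, with "mycluster" a
>>>>> placeholder; get the exact statement from SchedMD
>>>>> before running anything like it):
>>>>>
>>>>>     # back up the table first, then blank the stored envs
>>>>>     mysqldump slurm_acct_db mycluster_job_env_table \
>>>>>         > job_env_backup.sql
>>>>>     mysql slurm_acct_db \
>>>>>         -e "UPDATE mycluster_job_env_table SET env_vars='';"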
>>>>>
>>>>> -Paul Edmon-
>>>>>
>>>>> On 9/28/2023 1:40 PM, Davide DelVento wrote:
>>>>>> In my current slurm installation,
>>>>>> (recently upgraded to slurm v23.02.3), I only have
>>>>>>
>>>>>> AccountingStoreFlags=job_comment
>>>>>>
>>>>>> I now intend to also store job_script and job_env, i.e.
>>>>>>
>>>>>> AccountingStoreFlags=job_comment,job_script,job_env
>>>>>>
>>>>>> leaving the default 4MB value for max_script_size
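>>>>>>
>>>>>> (i.e., if I read the slurm.conf docs right, the knob
>>>>>> would be
>>>>>>
>>>>>>     SchedulerParameters=max_script_size=4194304
>>>>>>
>>>>>> with 4194304 bytes being the 4MB default)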
>>>>>>
>>>>>> Do I need to do anything on the DB myself, or
>>>>>> will slurm take care of the additional tables if
>>>>>> needed?
>>>>>>
>>>>>> Any
>>>>>> comments/suggestions/gotcha/pitfalls/horror_stories
>>>>>> to share? I know about the additional diskspace
>>>>>> and potentially load needed, and with our
>>>>>> resources and typical workload I should be okay
>>>>>> with that.
>>>>>>
>>>>>> Thanks!
>>>>>
>>>>
>>>