[slurm-users] License management and invoking scontrol in the prolog

Davide DelVento davide.quantum at gmail.com
Thu Sep 8 15:22:34 UTC 2022


Thanks, Ole, for this clarification; this is very good to know.
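
For the archives, here is a minimal sketch of the check-then-reload
sequence as I understand it from your message. The function name
reload_job_submit and the LUAC/SCONTROL override variables are made up
for illustration; only "luac -p" and "scontrol reconfigure" come from
this thread. Keep in mind that "luac -p" only checks syntax, so it would
not have caught the runtime logging error I hit.

```shell
# Sketch only: syntax-check job_submit.lua, then ask slurmctld to reread it.
# LUAC and SCONTROL can be overridden (hypothetical knobs) so the logic
# can be dry-run on a machine without Lua or Slurm installed.
LUAC=${LUAC:-luac}
SCONTROL=${SCONTROL:-scontrol}

reload_job_submit() {
    script=$1    # full path to job_submit.lua, e.g. /opt/slurm/job_submit.lua

    if [ ! -r "$script" ]; then
        echo "cannot read $script" >&2
        return 2
    fi

    # "luac -p" parses the file without writing any output;
    # a non-zero exit status means a syntax error.
    if ! "$LUAC" -p "$script"; then
        echo "syntax error in $script, not reconfiguring" >&2
        return 1
    fi

    # Only after a clean parse, ask slurmctld to reread its configuration.
    "$SCONTROL" reconfigure
}
```

Usage would be something like "reload_job_submit /opt/slurm/job_submit.lua",
followed by a look at the slurmctld log to confirm the script was loaded.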

However, the problem is that the very example provided by Slurm itself
is the one that has the error. I removed the unpack call on the
variable arguments in the logging helper, and that fixed the logging
error.

Unfortunately, the job_desc table always appears to be empty, which
would make the whole job_submit.lua a moot point. Or is the example so
outdated (given that it cannot even log correctly) that this is now
done in a different way?

Davide

On Thu, Sep 8, 2022 at 12:23 AM Ole Holm Nielsen
<Ole.H.Nielsen at fysik.dtu.dk> wrote:
>
> Hi Davide,
>
> In your slurmctld log you see an entry "error: job_submit/lua:
> /opt/slurm/job_submit.lua".
>
> What I think happens is that when slurmctld encounters an error in
> job_submit.lua, it will revert to the last known good script cached by
> slurmctld and ignore the file on disk from now on, even if it has been
> corrected.  An "scontrol reconfig" may make slurmctld reread the
> job_submit.lua, please try it.
>
> I believe that this slurmctld behavior is undocumented at present.  Please
> see https://bugs.schedmd.com/show_bug.cgi?id=14472#c15 for a description:
>
> > And, if after the reconfigure, the job_submit.lua is wrong formatted (or missing), it will use the previous version of the script (which we have stored backup previously):
>
> /Ole
>
>
> On 9/7/22 14:21, Davide DelVento wrote:
> > Thanks Ole, your wiki page sheds some light on this mystery.
> > Very frustrating that even the simple example provided in the release
> > fails, and it fails at the most basic logging functionality.
> >
> > Note that "my" job_submit.lua is now the unmodified, slurm-provided
> > one.... and that the luac command returns nothing in my case (this is
> > Lua 5.3.4) so syntax seems correct?
> >
> > Yet the logs report the problem I mentioned rather than the actual
> > content that the plugin is attempting to log.
> >
> > On Wed, Sep 7, 2022 at 2:13 AM Ole Holm Nielsen
> > <Ole.H.Nielsen at fysik.dtu.dk> wrote:
> >>
> >> Hi Davide,
> >>
> >> I suggest that you check your job_submit.lua script with the Lua compiler:
> >>
> >> luac -p /etc/slurm/job_submit.lua
> >>
> >> I have written some more details in my Wiki page
> >> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#job-submit-plugins
> >>
> >> Best regards,
> >> Ole
> >>
> >> On 9/7/22 01:51, Davide DelVento wrote:
> >>> Thanks again to both of you.
> >>>
> >>> I actually did not build Slurm myself, otherwise I'd keep extensive
> >>> logs of what I did. Other people did, so I don't know. However, I get
> >>> the same grep'ing results as yours.
> >>>
> >>> Looking at the logs reveals some info, but it's cryptic.
> >>>
> >>> [2022-09-06T17:33:56.513] debug3: job_submit/lua:
> >>> slurm_lua_loadscript: skipping loading Lua script:
> >>> /opt/slurm/job_submit.lua
> >>> [2022-09-06T17:33:56.513] error: job_submit/lua:
> >>> /opt/slurm/job_submit.lua: [string "slurm.user_msg
> >>> (string.format(table.unpack({...})))"]:1: bad argument #2 to 'format'
> >>> (no value)
> >>>
> >>> As you can see, there is no line number and there is nothing like
> >>> user_msg in this code. There is indeed an "unpack" which is used in
> >>> the SchedMD-defined logging helper function which has a comment
> >>> "Implicit definition of arg was removed in Lua 5.2" and that's where I
> >>> speculate the error occurs.
> >>>
> >>> I should stress, this is with their own example, not my code. I guess
> >>> I could forgo the logging and move forward, but that probably won't
> >>> lead me very far.
> >>>
> >>> I am contemplating submitting a github issue about it? I did check
> >>> that the version of the job_submit.lua I have is the same currently in
> >>> the repo at https://github.com/SchedMD/slurm/blob/master/etc/job_submit.lua.example
> >>>
> >>> On Thu, Sep 1, 2022 at 11:55 PM Ole Holm Nielsen
> >>> <Ole.H.Nielsen at fysik.dtu.dk> wrote:
> >>>>
> >>>> Did you install all prerequisite packages (including lua) on the server
> >>>> where you built the Slurm packages?
> >>>>
> >>>> On my system I get:
> >>>>
> >>>> $ strings `which slurmctld ` | grep HAVE_LUA
> >>>> HAVE_LUA 1
> >>>>
> >>>> /Ole
> >>>>
> >>>> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#install-prerequisites
> >>>>
> >>>> On 9/2/22 05:15, Davide DelVento wrote:
> >>>>> Thanks.
> >>>>>
> >>>>> I did try a lua script as soon as I got your first email, but that
> >>>>> never worked (yes, I enabled it in slurm.conf and ran "scontrol
> >>>>> reconfigure" after). Slurm simply acted as if there was no job_submit script.
> >>>>>
> >>>>> After various tests, all unsuccessful, today I found that link which I
> >>>>> mentioned saying that lua might not be compiled in, hence all my most
> >>>>> recent messages of this thread.
> >>>>>
> >>>>> That file is indeed there, so that's good news that I don't need to recompile.
> >>>>> However I'm puzzled on what might be missing...
> >>>>>
> >>>>>
> >>>>> On Thu, Sep 1, 2022 at 6:33 PM Brian Andrus <toomuchit at gmail.com> wrote:
> >>>>>>
> >>>>>> lua is the language you can use with the job_submit plugin.
> >>>>>>
> >>>>>> I was showing a quick way to see that job_submit capability is indeed in
> >>>>>> there.
> >>>>>>
> >>>>>> You can see if lua support is there by checking whether the
> >>>>>> job_submit_lua.so file exists.
> >>>>>> It would be part of the slurm rpm (not the slurm-slurmctld rpm)
> >>>>>>
> >>>>>> Usually it would be found at /usr/lib64/slurm/job_submit_lua.so
> >>>>>>
> >>>>>> If that is there, you should be good with trying out a job_submit lua
> >>>>>> script.
> >>>>>>
> >>>>>> Brian Andrus
> >>>>>>
> >>>>>> On 9/1/2022 1:24 PM, Davide DelVento wrote:
> >>>>>>> Thanks again, Brian, indeed that grep returns many hits, but none of
> >>>>>>> them includes lua, i.e.
> >>>>>>>
> >>>>>>>      strings `which slurmctld ` | grep -i job_submit | grep -i lua
> >>>>>>>
> >>>>>>> returns nothing. So I should use the C rather than the more convenient
> >>>>>>> lua interface, unless I recompile or am I missing something?
> >>>>>>>
> >>>>>>> On Thu, Sep 1, 2022 at 12:30 PM Brian Andrus <toomuchit at gmail.com> wrote:
> >>>>>>>> I would be surprised if it were compiled without the support. However,
> >>>>>>>> you could check and run something like:
> >>>>>>>>
> >>>>>>>> strings /sbin/slurmctld | grep job_submit
> >>>>>>>>
> >>>>>>>> (or wherever your slurmctld binary is). There should be quite a few
> >>>>>>>> lines with that in it.
> >>>>>>>>
> >>>>>>>> Brian Andrus
> >>>>>>>>
> >>>>>>>> On 9/1/2022 10:54 AM, Davide DelVento wrote:
> >>>>>>>>> Thanks Brian for the suggestion, which I am now exploring.
> >>>>>>>>>
> >>>>>>>>> The documentation is a bit cryptic for me, but exploring a few things
> >>>>>>>>> and checking https://funinit.wordpress.com/2018/06/07/how-to-use-job_submit_lua-with-slurm/
> >>>>>>>>> I suspect my slurm install (provided by cluster vendor) was not
> >>>>>>>>> compiled with the lua plugin installed. Do you know how to verify if
> >>>>>>>>> that is the case or if it's something else? I don't see a way to show
> >>>>>>>>> if the plugin is actually being "seen" by slurm, and I suspect it's
> >>>>>>>>> not.
> >>>>>>>>>
> >>>>>>>>> Does anyone else have other suggestions or comment on either the
> >>>>>>>>> plugin or the prolog workaround?
> >>>>>>>>>
> >>>>>>>>> Thanks!
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Aug 30, 2022 at 3:01 PM Brian Andrus <toomuchit at gmail.com> wrote:
> >>>>>>>>>> Not sure if you can do all the things you intend, but the job_submit
> >>>>>>>>>> script is precisely where you want to check submission options.
> >>>>>>>>>>
> >>>>>>>>>> https://slurm.schedmd.com/job_submit_plugins.html
> >>>>>>>>>>
> >>>>>>>>>> Brian Andrus
> >>>>>>>>>>
> >>>>>>>>>> On 8/30/2022 12:58 PM, Davide DelVento wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I would like to soft-enforce license utilization only when the -L is
> >>>>>>>>>>> set. My idea: check in the prolog if the license was requested and
> >>>>>>>>>>> only if it were, set the environmental variables needed for the
> >>>>>>>>>>> license.
> >>>>>>>>>>>
> >>>>>>>>>>> I looked at all environmental variables set by slurm and did not find
> >>>>>>>>>>> any related to the license as I was hoping.
> >>>>>>>>>>>
> >>>>>>>>>>> As a workaround, I could check
> >>>>>>>>>>>
> >>>>>>>>>>> scontrol show job $SLURM_JOB_ID | grep License
> >>>>>>>>>>>
> >>>>>>>>>>> and that would work, but (as discussed in other messages in this list)
> >>>>>>>>>>> the documentation at https://slurm.schedmd.com/prolog_epilog.html says
> >>>>>>>>>>>
> >>>>>>>>>>>> Prolog and Epilog scripts should be designed to be as short as possible
> >>>>>>>>>>>> and should not call Slurm commands (e.g. squeue, scontrol, sacctmgr,
> >>>>>>>>>>>> etc). [...] Slurm commands in these scripts can potentially lead to performance
> >>>>>>>>>>>> issues and should not be used.
> >>>>>>>>>>> This is a bit of a concern, since the prolog would be invoked for
> >>>>>>>>>>> every job on the cluster, and it's a prolog (rather than the epilogue
> >>>>>>>>>>> like discussed in earlier messages).
> >>>>>>>>>>>
> >>>>>>>>>>> So two questions:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) is there a better workaround to check in the prolog if the current
> >>>>>>>>>>> job requested a license, and/or
> >>>>>>>>>>> 2) would this kind of use of scontrol be okay, or is it indeed a concern?
>