[slurm-users] EXTERNAL-Re: Block jobs on GPU partition when GPU is not specified

Ratnasamy, Fritz fritz.ratnasamy at chicagobooth.edu
Mon Sep 27 17:31:13 UTC 2021


Hi Michael Renfro,

Thanks for your reply. Based on your answers, would this work?
1/ A job_submit.lua file with the following contents (I just need a
function that returns an error when gres=gpu is not specified in srun or
sbatch):

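function slurm_job_submit(job_desc, part_list, submit_uid)
        -- Reject jobs sent to the gpu partition without any gres request.
        if job_desc.partition == 'gpu' then
                if job_desc.gres == nil then
                        slurm.log_info("User did not specify gres=gpu")
                        slurm.user_msg("You have to specify gres=gpu:x where x is the number of GPUs.")
                        return slurm.ERROR
                end
        end
        return slurm.SUCCESS
end

-- The Lua plugin expects slurm_job_modify to be defined as well.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
end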
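
A stricter variant of the check above, in case a job that asks for some other gres but no GPU should also be rejected. This is only a minimal sketch under a few assumptions: job_desc.gres is taken to be a comma-separated string such as "gpu:2" or "gpu:v100:1"; an optional "gres:" or "gres/" prefix is tolerated in case the Slurm version in use adds one (worth confirming by logging the actual value); and the helper name requests_gpu is hypothetical, not part of Slurm. The stub slurm table exists only so the file can be loaded in a plain Lua interpreter for testing.

```lua
-- slurmctld provides the real `slurm` table; this stub is only used when
-- the file is loaded in a standalone Lua interpreter for testing.
slurm = slurm or {
   SUCCESS = 0,
   ERROR = -1,
   log_info = function(...) end,
   user_msg = function(msg) end,
}

-- Hypothetical helper: true if the gres string contains a gpu entry
-- such as "gpu", "gpu:2" or "gpu:v100:1".
function requests_gpu(gres)
   if gres == nil then
      return false
   end
   for entry in gres:gmatch("[^,]+") do
      -- Assumption: strip an optional "gres:" or "gres/" prefix.
      local name = entry:gsub("^gres[:/]", "")
      if name == "gpu" or name:match("^gpu:") then
         return true
      end
   end
   return false
end

function slurm_job_submit(job_desc, part_list, submit_uid)
   if job_desc.partition == "gpu" and not requests_gpu(job_desc.gres) then
      slurm.log_info("uid %d submitted to the gpu partition without gres=gpu", submit_uid)
      slurm.user_msg("You have to specify gres=gpu:x where x is the number of GPUs.")
      return slurm.ERROR
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```

The sketch does not handle a comma-separated partition list (for example "gpu,cpu"), a job that relies on the default partition (job_desc.partition is nil in that case), or a request such as gpu:0; those would need extra checks.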


4/ I found the file job_submit_lua.so on our controller in /lib64/slurm/,
and the Lua libraries also seem to be installed:
 sudo rpm -qa | grep lua

lua-5.3.4-11.el8.x86_64
lua-libs-5.3.4-11.el8.x86_64
lua-devel-5.3.4-11.el8.x86_64

So I guess for now I just need to create job_submit.lua and uncomment the
JobSubmitPlugins line in slurm.conf. Is there any Slurm service to restart
after that?

Thanks again

*Fritz Ratnasamy*
Data Scientist
Information Technology
The University of Chicago
Booth School of Business
5807 S. Woodlawn
Chicago, Illinois 60637
Phone: +(1) 773-834-4556


On Sat, Sep 25, 2021 at 11:08 AM Renfro, Michael <Renfro at tntech.edu> wrote:

> If you haven't already seen it there's an example Lua script from SchedMD
> at [1], and I've got a copy of our local script at [2]. Otherwise, in the
> order you asked:
>
>
>
>    1. That seems reasonable, but our script just checks if there's a gres
>    at all. I don't *think* any gres other than gres=gpu would let the job run,
>    since our GPU nodes only have Gres=gpu:2 entries. Same thing for asking for
>    more GPUs than are in the node: if someone asked for gres=gpu:3 or higher,
>    the job would get blocked.
>
>    The above might be an annoyance to your users if their job just sits
>    in the queue with no other notice, but it hasn't really been an issue here.
>    The big benefit from your side would be that you could simplify the if
>    statement down to something like 'if (job_desc.gres ~= nil)'.
>
>    2. yes, uncomment JobSubmitPlugins=lua
>
>    3. Far as I know, if you uncomment the JobSubmitPlugin line and have a
>    job_submit.lua file in the same folder as your slurm.conf, the Lua script
>    should get executed automatically.
>
>    4. Our RPM installations of Slurm contained the job_submit_lua.so,
>    both for Bright 8 and for OpenHPC.
>
>
>
> [1]
> https://github.com/SchedMD/slurm/blob/master/contribs/lua/job_submit.lua
>
> [2] https://gist.github.com/mikerenfro/df89fac5052a45cc2c1651b9a30978e0
>
>
>
> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Ratnasamy, Fritz <fritz.ratnasamy at chicagobooth.edu>
> *Date: *Saturday, September 25, 2021 at 12:23 AM
> *To: *Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject: *[slurm-users] Block jobs on GPU partition when GPU is not
> specified
>
> Hi,
>
> I would like to block jobs submitted in our GPU partition when gres=gpu:1
> (or any number between 1 and 4) is not specified when submitting a job
> through sbatch or requesting an interactive session with srun.
>
> Currently, /etc/slurm/slurm.conf has JobSubmitPlugins=lua commented out.
> liblua.so is now installed.
>
> I would like to use something similar to the example at the end of this
> page:
> https://slurm.schedmd.com/resource_limits.html
> Can I use the following code:
>
> function slurm_job_submit(job_desc, part_list, submit_uid)
>    if (job_desc.gres ~= nil)
>    then
>       for g in job_desc.gres:gmatch("[^,]+")
>       do
>          bad = string.match(g,'^gpu[:]*[0-9]*$')
>          if (bad ~= nil)
>          then
>             slurm.log_info("User specified gpu GRES without type: %s", bad)
>             slurm.user_msg("You must always specify a type when requesting gpu GRES")
>             return slurm.ERROR
>          end
>       end
>    end
> end
>
> I do not need to check if the model is specified though. In that case,
>
> 1/ Should I change the line bad = string.match(g,'^gpu[:]*[0-9]*$') to
> string.match(g,'^gpu[:]*[0-9]')?
>
> 2/ Do I need to uncomment JobSubmitPlugins=lua?
>
> 3/ Where do I call slurm_job_submit so that the check for gres=gpu:1
> actually runs?
>
> 4/ I would need job_submit_lua.so. Where can I find that library and, if
> it is not there, how can I download it?
>
> Thanks for your help. I am new to regular expressions, lua and Slurm so I
> apologize if my questions do not make sense.