[slurm-users] EXTERNAL-Re: Block jobs on GPU partition when GPU is not specified

Ratnasamy, Fritz fritz.ratnasamy at chicagobooth.edu
Mon Sep 27 18:58:37 UTC 2021


Does the script below look correct?
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.partition == 'gpu' then
        if job_desc.gres == nil then
            -- Written to the slurmctld log.
            slurm.log_info("User did not specify gres=gpu")
            -- Shown to the user at submission time.
            slurm.user_msg("You have to specify gres=gpu:x where x is the number of GPUs.")
            return slurm.ERROR
        end
    end
    -- Accept every other job.
    return slurm.SUCCESS
end
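
For comparison, here is a slightly more defensive variant, as a sketch only. It assumes the partition is literally named 'gpu' and that the GPU request shows up in job_desc.gres. Newer Slurm releases may expose it under another field such as job_desc.tres_per_node, and requests made with options like --gpus may be stored elsewhere, which this does not look at. The helper name requests_gpu is made up for illustration. The sketch also defines slurm_job_modify, which as far as I know the Lua plugin expects to find next to slurm_job_submit.

```lua
-- Hypothetical helper: true if any comma-separated GRES item mentions "gpu".
-- The match is deliberately loose because the exact string format may
-- differ between Slurm versions (assumption, not checked against a release).
function requests_gpu(gres)
    if gres == nil then
        return false
    end
    for item in gres:gmatch("[^,]+") do
        -- Plain substring search, no pattern characters involved.
        if item:find("gpu", 1, true) then
            return true
        end
    end
    return false
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- job_desc.partition may be nil if the user relies on the default partition.
    if job_desc.partition == "gpu" and not requests_gpu(job_desc.gres) then
        slurm.log_info("uid %d submitted to gpu partition without gres=gpu", submit_uid)
        slurm.user_msg("You have to specify gres=gpu:x where x is the number of GPUs.")
        return slurm.ERROR
    end
    return slurm.SUCCESS
end

-- Job modifications are accepted unchanged in this sketch.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

Unlike a plain nil check, this also rejects a job on the gpu partition that asks for some other GRES but no GPU.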

*Fritz Ratnasamy*

Data Scientist

Information Technology

The University of Chicago

Booth School of Business

5807 S. Woodlawn

Chicago, Illinois 60637

Phone: +(1) 773-834-4556


On Mon, Sep 27, 2021 at 1:40 PM Renfro, Michael <Renfro at tntech.edu> wrote:

> Might need a restart of slurmctld at most, I expect.
>
>
>
> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Ratnasamy, Fritz <fritz.ratnasamy at chicagobooth.edu>
> *Date: *Monday, September 27, 2021 at 12:32 PM
> *To: *Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject: *Re: [slurm-users] EXTERNAL-Re: Block jobs on GPU partition
> when GPU is not specified
>
>
> Hi Michael Renfro,
>
>
> Thanks for your reply. Based on your answers, would this work?
> 1/ A job_submit.lua file with the following contents (I just need a
> function that returns an error when gres=gpu is not specified in srun or sbatch):
>
> function slurm_job_submit(job_desc, part_list, submit_uid)
>     if job_desc.partition == 'gpu' then
>         if job_desc.gres == nil then
>             slurm.log_info("User did not specify gres=gpu")
>             slurm.user_msg("You have to specify gres=gpu:x where x is the number of GPUs.")
>             return slurm.ERROR
>         end
>     end
>     return slurm.SUCCESS
> end
>
>
>
>
>
> 4/ I found the file job_submit_lua.so on our controller in /lib64/slurm/,
> and the Lua libraries also seem to be installed:
>  sudo rpm -qa | grep lua
>
> lua-5.3.4-11.el8.x86_64
> lua-libs-5.3.4-11.el8.x86_64
> lua-devel-5.3.4-11.el8.x86_64
>
>
>
> So I guess for now I just need to create job_submit.lua and uncomment the
> job submit plugin line in slurm.conf. Is there any Slurm service to restart
> after that?
>
> Thanks again
>
>
> On Sat, Sep 25, 2021 at 11:08 AM Renfro, Michael <Renfro at tntech.edu>
> wrote:
>
> If you haven't already seen it there's an example Lua script from SchedMD
> at [1], and I've got a copy of our local script at [2]. Otherwise, in the
> order you asked:
>
>
>
>    1. That seems reasonable, but our script just checks if there's a gres
>    at all. I don't *think* any gres other than gres=gpu would let the job run,
>    since our GPU nodes only have Gres=gpu:2 entries. Same thing for asking for
>    more GPUs than are in the node: if someone asked for gres=gpu:3 or higher,
>    the job would get blocked.
>
>    The above might be an annoyance to your users if their job just sits
>    in the queue with no other notice, but it hasn't really been an issue here.
>    The big benefit from your side would be that you could simplify the if
>    statement down to something like 'if (job_desc.gres ~= nil)'.
>    2. Yes, uncomment JobSubmitPlugins=lua.
>    3. As far as I know, if you uncomment the JobSubmitPlugins line and have a
>    job_submit.lua file in the same folder as your slurm.conf, the Lua script
>    should get executed automatically.
>    4. Our RPM installations of Slurm contained the job_submit_lua.so,
>    both for Bright 8 and for OpenHPC.
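
To reject an impossible request at submit time rather than leave it sitting in the queue, the GPU count could be checked up front. This is a rough sketch, not taken from either script mentioned above. It assumes GRES strings of the form gpu[:type][:count] and a limit of 4 GPUs per node (the upper bound from the original question). The names gpu_count, gpu_request_ok and MAX_GPUS_PER_NODE are made up for illustration.

```lua
-- Assumption: at most 4 GPUs per node, from "any number between 1 and 4".
MAX_GPUS_PER_NODE = 4

-- Hypothetical helper: return the GPU count requested in a GRES string,
-- or nil if there is no gpu item. A request without a count is treated as 1.
function gpu_count(gres)
    if gres == nil then
        return nil
    end
    for item in gres:gmatch("[^,]+") do
        if item == "gpu" or item:match("^gpu:") then
            -- "gpu:2" or "gpu:<type>:2"; a bare type such as "gpu:v100" has no count.
            local n = item:match("^gpu:(%d+)$") or item:match("^gpu:[^:]+:(%d+)$")
            if n then
                return tonumber(n)
            end
            return 1
        end
    end
    return nil
end

-- True only if a GPU is requested and the count is within the node limit.
function gpu_request_ok(gres)
    local n = gpu_count(gres)
    return n ~= nil and n >= 1 and n <= MAX_GPUS_PER_NODE
end
```

Inside slurm_job_submit, a false result from gpu_request_ok(job_desc.gres) would then be the cue to call slurm.user_msg and return slurm.ERROR.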
>
>
>
> [1]
> https://github.com/SchedMD/slurm/blob/master/contribs/lua/job_submit.lua
>
> [2] https://gist.github.com/mikerenfro/df89fac5052a45cc2c1651b9a30978e0
>
>
>
> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Ratnasamy, Fritz <fritz.ratnasamy at chicagobooth.edu>
> *Date: *Saturday, September 25, 2021 at 12:23 AM
> *To: *Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject: *[slurm-users] Block jobs on GPU partition when GPU is not
> specified
>
>
> Hi,
>
> I would like to block jobs submitted in our GPU partition when gres=gpu:1
> (or any number between 1 and 4) is not specified when submitting a job
> through sbatch or requesting an interactive session with srun.
>
> Currently, /etc/slurm/slurm.conf has JobSubmitPlugins=lua commented out.
> The liblua.so library is now installed.
>
> I would like to use something similar to the example mentioned at the end
> of the page:
> https://slurm.schedmd.com/resource_limits.html
>
> Can I use the following code:
>
> function slurm_job_submit(job_desc, part_list, submit_uid)
>    if (job_desc.gres ~= nil) then
>       for g in job_desc.gres:gmatch("[^,]+") do
>          bad = string.match(g,'^gpu[:]*[0-9]*$')
>          if (bad ~= nil) then
>             slurm.log_info("User specified gpu GRES without type: %s", bad)
>             slurm.user_msg("You must always specify a type when requesting gpu GRES")
>             return slurm.ERROR
>          end
>       end
>    end
> end
>
> I do not need to check whether the model is specified, though. In that case:
>
> 1/ Should I change the line bad = string.match(g,'^gpu[:]*[0-9]*$') to
> string.match(g,'^gpu[:]*[0-9]')?
>
> 2/ Do I need to uncomment JobSubmitPlugins=lua?
>
> 3/ Where do I specify the call to slurm_job_submit, so that I can be sure the
> check for gres=gpu:1 is happening?
>
> 4/ I would need job_submit_lua.so. Where can I find that library and, if it
> is not there, how can I download it?
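
On question 1/, the two Lua patterns can be compared directly in a plain Lua interpreter, without Slurm. The sketch below is only an illustration. The names PATTERN_DOC, PATTERN_PROPOSED and matches are made up, and the sample strings are assumed examples of what a GRES item might look like.

```lua
-- Pattern from the documentation example: "gpu", optional colons,
-- optional digits, then end of string.
PATTERN_DOC = "^gpu[:]*[0-9]*$"
-- Proposed change: at least one digit required, no end-of-string anchor.
PATTERN_PROPOSED = "^gpu[:]*[0-9]"

-- Hypothetical helper: true if the string matches the pattern.
function matches(s, pattern)
    return string.match(s, pattern) ~= nil
end

-- Print each sample with the result for both patterns.
for _, g in ipairs({"gpu", "gpu:2", "gpu:v100:2"}) do
    print(g, matches(g, PATTERN_DOC), matches(g, PATTERN_PROPOSED))
end
```

A bare "gpu" matches only the documentation pattern, "gpu:2" matches both, and a typed request such as "gpu:v100:2" matches neither. Neither pattern detects a job that requests no GRES at all. That case is job_desc.gres being nil.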
>
> Thanks for your help. I am new to regular expressions, lua and Slurm so I
> apologize if my questions do not make sense.
>
>