[slurm-users] EXTERNAL-Re: Block jobs on GPU partition when GPU is not specified
Ratnasamy, Fritz
fritz.ratnasamy at chicagobooth.edu
Mon Sep 27 17:31:13 UTC 2021
Hi Michael Renfro,
Thanks for your reply. Based on your answers, would this work:
1/ a job_submit.lua file with the following contents (I just need a
function that returns an error when gres=gpu is not specified with srun or sbatch):
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.partition == 'gpu' then
        if (job_desc.gres == nil) then
            slurm.log_info("User did not specify gres=gpu")
            slurm.user_msg("You have to specify gres=gpu:x where x is number of GPUs.")
            return slurm.ERROR
        end
    end
    return slurm.SUCCESS
end
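
A slightly fuller version of the same idea is sketched below. It is only a sketch under assumptions, not a tested production script: the helper name requests_gpu is hypothetical, the tolerance for a leading "gres:" or "gres/" prefix is a guess at how different Slurm versions may present the request string, and the stub slurm table at the top exists only so the file can be tried with a plain lua interpreter outside slurmctld. Unlike the function above, it rejects a job that asks for some other gres but no gpu, and it defines slurm_job_modify, which the Lua job submit plugin also expects to find.

```lua
-- Stub for standalone testing only (assumption). Inside slurmctld the
-- plugin provides the global `slurm` table and this fallback is unused.
slurm = slurm or {
    SUCCESS = 0,
    ERROR = -1,
    log_info = function(fmt, ...)
        io.stderr:write(string.format(fmt, ...), "\n")
    end,
    user_msg = function(msg)
        io.stderr:write(msg, "\n")
    end,
}

-- Hypothetical helper: true when any comma-separated entry of the
-- request string names the gpu GRES, e.g. "gpu", "gpu:2", "gpu:tesla:2".
local function requests_gpu(gres)
    if gres == nil then
        return false
    end
    for item in gres:gmatch("[^,]+") do
        -- Assumption: some versions may prefix entries with "gres:" or "gres/".
        local name = item:gsub("^gres[:/]", "")
        if name == "gpu" or name:match("^gpu[:=]") then
            return true
        end
    end
    return false
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- job_desc.partition is nil when the user relies on the default
    -- partition, so this only catches jobs that name "gpu" explicitly.
    if job_desc.partition == "gpu" and not requests_gpu(job_desc.gres) then
        slurm.log_info("uid %d submitted to gpu partition without gres=gpu", submit_uid)
        slurm.user_msg("You have to specify gres=gpu:x where x is the number of GPUs.")
        return slurm.ERROR
    end
    return slurm.SUCCESS
end

-- Job modifications are allowed unchanged in this sketch.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

The SchedMD example script at [1] also ends with a top-level "return slurm.SUCCESS" as its last line; that line would be added when deploying the file.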
4/ I found the file job_submit_lua.so on our controller in
/lib64/slurm/, and the Lua libraries also seem to be installed:
sudo rpm -qa | grep lua
lua-5.3.4-11.el8.x86_64
lua-libs-5.3.4-11.el8.x86_64
lua-devel-5.3.4-11.el8.x86_64
So I guess for now I just need to create job_submit.lua and uncomment the
JobSubmitPlugins line in slurm.conf. Is there any Slurm service to restart after that?
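
For reference, the slurm.conf change discussed here amounts to the single line below. The file path is the one mentioned earlier in this thread. Whether "scontrol reconfigure" is enough to pick up the change or slurmctld has to be restarted may depend on the Slurm version, so restarting slurmctld on the controller is the conservative assumption.

```
# /etc/slurm/slurm.conf
JobSubmitPlugins=lua
```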
Thanks again
*Fritz Ratnasamy*
Data Scientist
Information Technology
The University of Chicago
Booth School of Business
5807 S. Woodlawn
Chicago, Illinois 60637
Phone: +(1) 773-834-4556
On Sat, Sep 25, 2021 at 11:08 AM Renfro, Michael <Renfro at tntech.edu> wrote:
> If you haven't already seen it there's an example Lua script from SchedMD
> at [1], and I've got a copy of our local script at [2]. Otherwise, in the
> order you asked:
>
>
>
> 1. That seems reasonable, but our script just checks if there's a gres
> at all. I don't *think* any gres other than gres=gpu would let the job run,
> since our GPU nodes only have Gres=gpu:2 entries. Same thing for asking for
> more GPUs than are in the node: if someone asked for gres=gpu:3 or higher,
> the job would get blocked.
>
> The above might be an annoyance to your users if their job just sits
> in the queue with no other notice, but it hasn't really been an issue here.
> The big benefit from your side would be that you could simplify the if
> statement down to something like 'if (job_desc.gres ~= nil)'.
>
> 2. yes, uncomment JobSubmitPlugins=lua
>
> 3. As far as I know, if you uncomment the JobSubmitPlugins line and have a
> job_submit.lua file in the same folder as your slurm.conf, the Lua script
> should get executed automatically.
>
> 4. Our RPM installations of Slurm contained the job_submit_lua.so,
> both for Bright 8 and for OpenHPC.
>
>
>
> [1] https://github.com/SchedMD/slurm/blob/master/contribs/lua/job_submit.lua
>
> [2] https://gist.github.com/mikerenfro/df89fac5052a45cc2c1651b9a30978e0
>
>
>
> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Ratnasamy, Fritz <fritz.ratnasamy at chicagobooth.edu>
> *Date: *Saturday, September 25, 2021 at 12:23 AM
> *To: *Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject: *[slurm-users] Block jobs on GPU partition when GPU is not
> specified
>
> Hi,
>
> I would like to block jobs submitted in our GPU partition when gres=gpu:1
> (or any number between 1 and 4) is not specified when submitting a job
> through sbatch or requesting an interactive session with srun.
>
> Currently, /etc/slurm/slurm.conf has JobSubmitPlugins=lua commented out.
> The liblua.so library is now installed.
>
> I would like to use something similar to the example at the end
> of this page:
> https://slurm.schedmd.com/resource_limits.html
> Can I use the following code:
>
> function slurm_job_submit(job_desc, part_list, submit_uid)
>     if (job_desc.gres ~= nil) then
>         for g in job_desc.gres:gmatch("[^,]+") do
>             bad = string.match(g,'^gpu[:]*[0-9]*$')
>             if (bad ~= nil) then
>                 slurm.log_info("User specified gpu GRES without type: %s", bad)
>                 slurm.user_msg("You must always specify a type when requesting gpu GRES")
>                 return slurm.ERROR
>             end
>         end
>     end
> end
>
> I do not need to check whether the model is specified, though. In that case:
>
> 1/ Should I change the line bad = string.match(g,'^gpu[:]*[0-9]*$') to
> string.match(g,'^gpu[:]*[0-9]')?
>
> 2/ Do I need to uncomment JobSubmitPlugins=lua?
>
> 3/ Where do I specify the call to slurm_job_submit, so that I can be sure the
> check for gres=gpu:1 actually happens?
>
> 4/ I need job_submit_lua.so. Where can I find that library, and if it
> is not there, how can I download it?
>
> Thanks for your help. I am new to regular expressions, lua and Slurm so I
> apologize if my questions do not make sense.
>
>