[slurm-users] EXTERNAL-Re: Block jobs on GPU partition when GPU is not specified
Renfro, Michael
Renfro at tntech.edu
Mon Sep 27 19:03:47 UTC 2021
On a quick read, it did look correct.
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Ratnasamy, Fritz <fritz.ratnasamy at chicagobooth.edu>
Date: Monday, September 27, 2021 at 1:59 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] EXTERNAL-Re: Block jobs on GPU partition when GPU is not specified
External Email Warning
This email originated from outside the university. Please use caution when opening attachments, clicking links, or responding to requests.
________________________________
Does the script below look correct?
function slurm_job_submit(job_desc, part_list, submit_uid)
   if job_desc.partition == 'gpu' then
      if (job_desc.gres == nil) then
         slurm.log_info("User did not specify gres=gpu")
         slurm.user_msg("You have to specify gres=gpu:x where x is number of GPUs.")
         return slurm.ERROR
      end
   end
end
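For reference, a minimal sketch of what a complete job_submit.lua around this check might look like. It rests on two assumptions about the job_submit/lua plugin that are worth confirming against the SchedMD example script: that the file must define both slurm_job_submit and slurm_job_modify, and that each should return slurm.SUCCESS when the job is accepted. The partition name 'gpu' is taken from the script above; nothing here has been tested on a live controller.

```lua
-- Sketch only: `slurm` is the table slurmctld provides to the script at run time.
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Reject jobs aimed at the gpu partition that carry no GRES request at all.
    if job_desc.partition == 'gpu' and job_desc.gres == nil then
        slurm.log_info("User did not specify gres=gpu")
        slurm.user_msg("You have to specify gres=gpu:x where x is number of GPUs.")
        return slurm.ERROR
    end
    -- Everything else is accepted unchanged.
    return slurm.SUCCESS
end

-- Assumed to be required by the plugin even when it does nothing.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

One caveat with the equality test: if users submit to several partitions at once (for example -p gpu,cpu), job_desc.partition would presumably hold the whole list, and the check would not fire.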
Fritz Ratnasamy
Data Scientist
Information Technology
The University of Chicago
Booth School of Business
5807 S. Woodlawn
Chicago, Illinois 60637
Phone: +(1) 773-834-4556
On Mon, Sep 27, 2021 at 1:40 PM Renfro, Michael <Renfro at tntech.edu> wrote:
Might need a restart of slurmctld at most, I expect.
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Ratnasamy, Fritz <fritz.ratnasamy at chicagobooth.edu>
Date: Monday, September 27, 2021 at 12:32 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] EXTERNAL-Re: Block jobs on GPU partition when GPU is not specified
Hi Michael Renfro,
Thanks for your reply. Based on your answers, would this work:
1/ A file job_submit.lua with the following contents (I just need a function that returns an error when gres=gpu is not specified in srun or sbatch):
function slurm_job_submit(job_desc, part_list, submit_uid)
   if job_desc.partition == 'gpu' then
      if (job_desc.gres == nil) then
         slurm.log_info("User did not specify gres=gpu")
         slurm.user_msg("You have to specify gres=gpu:x where x is number of GPUs.")
         return slurm.ERROR
      end
   end
end
4/ I found the file job_submit_lua.so on our controller in /lib64/slurm/, and the Lua library also seems to be installed:
sudo rpm -qa | grep lua
lua-5.3.4-11.el8.x86_64
lua-libs-5.3.4-11.el8.x86_64
lua-devel-5.3.4-11.el8.x86_64
So I guess for now I just need to create job_submit.lua and uncomment the job submit plugin line in slurm.conf. Is there any Slurm service to restart after that?
Thanks again
On Sat, Sep 25, 2021 at 11:08 AM Renfro, Michael <Renfro at tntech.edu> wrote:
If you haven't already seen it, there's an example Lua script from SchedMD at [1], and I've got a copy of our local script at [2]. Otherwise, in the order you asked:
1. That seems reasonable, but our script just checks if there's a gres at all. I don't *think* any gres other than gres=gpu would let the job run, since our GPU nodes only have Gres=gpu:2 entries. Same thing for asking for more GPUs than are in the node: if someone asked for gres=gpu:3 or higher, the job would get blocked.
The above might be an annoyance to your users if their job just sits in the queue with no other notice, but it hasn't really been an issue here. The big benefit from your side would be that you could simplify the if statement down to something like 'if (job_desc.gres ~= nil)'.
2. yes, uncomment JobSubmitPlugins=lua
3. As far as I know, if you uncomment the JobSubmitPlugins line and have a job_submit.lua file in the same folder as your slurm.conf, the Lua script should get executed automatically.
4. Our RPM installations of Slurm contained the job_submit_lua.so, both for Bright 8 and for OpenHPC.
[1] https://github.com/SchedMD/slurm/blob/master/contribs/lua/job_submit.lua
[2] https://gist.github.com/mikerenfro/df89fac5052a45cc2c1651b9a30978e0
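As a sketch of the "more GPUs than are in the node" case from point 1, a submit script could also parse the requested count and reject it up front rather than leaving the job pending. This is an illustration only, not the script in [2]: gpu_count and MAX_GPUS_PER_NODE are hypothetical names, the limit of 4 is taken from the original question, and the parsing assumes entries of the form gpu, gpu:N or gpu:type:N.

```lua
local MAX_GPUS_PER_NODE = 4  -- assumption: from "between 1 and 4" in the original question

-- Hypothetical helper: total number of GPUs requested in a --gres style string.
local function gpu_count(gres)
    local total = 0
    for entry in (gres or ''):gmatch('[^,]+') do
        if entry == 'gpu' or entry:match('^gpu:') then
            -- A trailing ":N" is the count; a bare "gpu" or "gpu:type" counts as 1.
            total = total + (tonumber(entry:match(':(%d+)$')) or 1)
        end
    end
    return total
end

-- Quick standalone check in a plain lua interpreter (no Slurm needed).
print(gpu_count('gpu:2'), gpu_count('gpu:v100:3'), gpu_count('gpu'))
```

Inside slurm_job_submit, a test along the lines of gpu_count(job_desc.gres) > MAX_GPUS_PER_NODE followed by slurm.user_msg and return slurm.ERROR would then give the user an immediate message.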
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Ratnasamy, Fritz <fritz.ratnasamy at chicagobooth.edu>
Date: Saturday, September 25, 2021 at 12:23 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: [slurm-users] Block jobs on GPU partition when GPU is not specified
Hi,
I would like to block jobs submitted in our GPU partition when gres=gpu:1 (or any number between 1 and 4) is not specified when submitting a job through sbatch or requesting an interactive session with srun.
Currently, /etc/slurm/slurm.conf has JobSubmitPlugins=lua commented out.
The liblua.so is now installed.
I would like to use something similar to the example mentioned at the end of the page:
https://slurm.schedmd.com/resource_limits.html
Can I use the following code:
function slurm_job_submit(job_desc, part_list, submit_uid)
   if (job_desc.gres ~= nil)
   then
      for g in job_desc.gres:gmatch("[^,]+")
      do
         bad = string.match(g,'^gpu[:]*[0-9]*$')
         if (bad ~= nil)
         then
            slurm.log_info("User specified gpu GRES without type: %s", bad)
            slurm.user_msg("You must always specify a type when requesting gpu GRES")
            return slurm.ERROR
         end
      end
   end
end
I do not need to check if the model is specified though. In that case,
1/ Should I change the line bad = string.match(g,'^gpu[:]*[0-9]*$') to bad = string.match(g,'^gpu[:]*[0-9]')?
2/ Do I need to uncomment JobSubmitPlugins=lua?
3/ Where do I specify the call to slurm_job_submit, so that I can be sure the gres=gpu:1 check is happening?
4/ I would need job_submit_lua.so. Where can I find that library, and if it is not there, how can I download it?
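To make question 1/ concrete, here is a small standalone Lua sketch (it runs in a plain lua interpreter, no Slurm needed) showing what the two patterns match. Note that the documentation example rejects the job when the pattern matches, so for this use case the test would need to be inverted: reject when no entry asks for a gpu. The helper name requests_gpu is hypothetical, and its frontier pattern assumes Lua 5.2 or later.

```lua
local typed_only = '^gpu[:]*[0-9]*$'  -- pattern from the documentation example
local proposed   = '^gpu[:]*[0-9]'    -- pattern proposed in question 1/

-- Print each sample request next to what the two patterns return for it.
for _, g in ipairs({ 'gpu', 'gpu:2', 'gpu:v100:2' }) do
    print(g, string.match(g, typed_only), string.match(g, proposed))
end

-- Hypothetical helper: true if any comma-separated GRES entry names "gpu"
-- as a whole word, so "gpu:2" and "gpu:v100:2" count but "mygpu:1" does not.
local function requests_gpu(gres)
    if gres == nil then return false end
    for entry in gres:gmatch('[^,]+') do
        if entry:find('%f[%w]gpu%f[%W]') then return true end
    end
    return false
end

print(requests_gpu('gpu:2'), requests_gpu('license:1'), requests_gpu(nil))
```

The proposed pattern only matches when a digit follows gpu or gpu:, so a bare gres=gpu would not match it, and neither pattern matches a typed request such as gpu:v100:2.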
Thanks for your help. I am new to regular expressions, lua and Slurm so I apologize if my questions do not make sense.