We would like to put limits on interactive jobs (started by salloc) so that users don't leave unused interactive jobs behind on the cluster by mistake.
I can't offhand find any configurations that limit interactive jobs, such as enforcing a timelimit.
Perhaps this could be done in job_submit.lua, but I couldn't find any job_desc parameters in the source code which would indicate if a job is interactive or not.
Question: How do people limit interactive jobs, or identify orphaned jobs and kill them?
Thanks a lot, Ole
Hello Ole, the way I identify interactive jobs is by checking that the script is empty in job_submit.lua.
If that's the case, they're assigned to an interactive QoS that limits time and resources and allows only one job per user.
if job_desc.script == nil or job_desc.script == '' then
   slurm.log_info("slurm_job_submit: jobscript is missing, assuming interactive job")
   slurm.log_user("Launching an interactive job")
   if job_desc.partition == "gpu" then job_desc.qos = "gpu_interactive" end
   if job_desc.partition == "cpu" then job_desc.qos = "cpu_interactive" end
end
return slurm.SUCCESS
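Something along these lines can be used to create the matching QoS with sacctmgr; the names come from the snippet above, but the concrete limits (walltime, one job per user, TRES caps) are only placeholders to adapt to local policy:

# sketch: create the interactive QoS and set example limits
sacctmgr add qos cpu_interactive
sacctmgr modify qos cpu_interactive set MaxWall=04:00:00 MaxJobsPerUser=1 MaxTRESPerUser=cpu=8
sacctmgr add qos gpu_interactive
sacctmgr modify qos gpu_interactive set MaxWall=04:00:00 MaxJobsPerUser=1 MaxTRESPerUser=gres/gpu=1

Depending on your setup you may also need to allow these QoS in the relevant associations (or via the partition's AllowQos) so the submissions aren't rejected.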
Thanks
Ewan
Hello,
we also do it this way, by checking if job_desc.script is empty. I have no idea if this is foolproof in any way (and use cases like, say, someone starting a Jupyter or RStudio instance via script are not covered), but hopefully, users who are inventive enough to find ways around this are also receptive enough to accept more reasonable and robust solutions for their workflows. Aside from setting a reasonable time limit, I'd say the most important limitation to steer users away from overusing interactive jobs is enforcing (either via partition or via QoS) that only one interactive job per user can be running at any given time.
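As a sketch, the two enforcement routes look roughly like this (the QoS and partition names, and the values, are only examples):

# inspect the limits of an interactive QoS
sacctmgr show qos cpu_interactive format=Name,MaxWall,MaxJobsPU,MaxTRESPU
# or enforce via the partition by attaching a partition QoS in slurm.conf, e.g.
#   PartitionName=interactive ... QOS=interactive MaxTime=04:00:00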
Cheers, René
Hi all,
Thanks for the great suggestions! It seems that the Slurm job_submit.lua script is the most flexible way to check for interactive jobs, and change job parameters such as QOS, time_limit etc.
I've added this Lua function to our job_submit.lua script and it seems to work fine:
-- Check for interactive jobs
-- Policy: Interactive jobs are limited to 4 hours
function check_interactive_job (job_desc, submit_uid, log_prefix)
   if (job_desc.script == nil or job_desc.script == '') then
      local time_limit = 240
      -- userinfo is set elsewhere in the complete script (see the link below)
      slurm.log_info("%s: user %s submitted an interactive job", log_prefix, userinfo)
      slurm.log_user("NOTICE: Job script is missing, assuming an interactive job")
      slurm.log_user("        Job timelimit is set to %d minutes", time_limit)
      job_desc.time_limit = time_limit
   end
   return slurm.SUCCESS
end
The complete script is available at https://github.com/OleHolmNielsen/Slurm_tools/blob/master/plugins/job_submit...
Interestingly, Slurm by default (we're at 24.11.4) assigns job_desc.job_name="interactive" to interactive jobs submitted by salloc. From the manual page:
The default job name is the name of the "command" specified on the command line.
Users can of course override this with the --job-name parameter.
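Since that default name is predictable, it also gives a simple way to spot (and, after checking with the owner, kill) orphaned interactive jobs; a sketch, assuming users haven't renamed their jobs:

# list running jobs still carrying the default salloc job name
squeue --name=interactive -t RUNNING -O JobID,UserName,Partition,NumCPUs,TimeUsed,TimeLimit
# cancel a particular user's interactive jobs
scancel -u <username> -n interactive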
Best regards, Ole
Hi Ole,
We would be interested in this too.
Currently we have a very makeshift solution: a script that simply pipes all running job IDs to 'sjeff' (https://github.com/ubccr/stubl/blob/master/bin/sjeff) every 30s. This produces output like the following:
Username  Mem_Request  Max_Mem_Use  CPU_Efficiency  Number_of_CPUs_In_Use
able      3600M        0.94Gn       99.22%          (142.88 of 144)
baker     8G           0.90Gn       0.60%           (0.02 of 4)
charlie   varied       32.92Gn      42.54%          (5.96 of 14)
...
== CPU efficiency: data above from Fri 25 Apr 11:17:09 CEST 2025 ==
where efficiencies under 50% are printed in red. As long as one only has about a screenful of users, it is fairly easy to spot users with low CPU efficiency, whether due to idle interactive jobs or something else.
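The wrapper amounts to little more than a loop along these lines (a sketch; it assumes, as described above, that sjeff reads the job IDs piped to it on stdin):

# feed all running job IDs to sjeff every 30 seconds
while true; do
    squeue -h -t RUNNING -o '%A' | sjeff
    sleep 30
done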
Apart from that, we have a partition called 'interactive' which has an appropriately short MaxTime. We don't actually lie to our users by saying that they have to use this partition, but we don't advertise the fact that they could use any of the other partitions for interactive work. This is obviously even more makeshift :-)
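For reference, such a limit can be checked and adjusted on the fly with scontrol (partition name and value are just examples; a permanent change still belongs in slurm.conf):

# show the current limits of the interactive partition
scontrol show partition interactive
# change MaxTime at run time (not persistent across a slurmctld restart)
scontrol update PartitionName=interactive MaxTime=04:00:00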
Cheers,
Loris