Hi Prentice,
Prentice Bisbal via slurm-users <slurm-users@lists.schedmd.com> writes:
I think the idea of having a generous default timelimit is the wrong way to go. In fact, I think any defaults for jobs are a bad way to go. The majority of your users will just use that default time limit, and backfill scheduling will remain useless to you.
Horses for courses, I would say. We have a default time limit of 14 days, but because we also have QoSs with increased priority but shorter time limits, there is still an incentive for users to set the time limit themselves. So currently we have around 900 jobs running, only 100 of which are using the default time limit. Many of these will be long-running Gaussian jobs that will indeed need the time.
Instead, I recommend you use your job_submit.lua to reject all jobs that don't specify a wallclock time and print out a helpful error message informing users that they now need to specify one, with a link to documentation on how to do that.
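For anyone wanting to do this, something along these lines in job_submit.lua should be enough. This is only a minimal sketch: the documentation URL is a placeholder, and I'm assuming an unset limit shows up as slurm.NO_VAL.

    -- Minimal job_submit.lua sketch: reject jobs submitted without an
    -- explicit time limit. Assumes an unset limit appears as slurm.NO_VAL;
    -- the documentation URL below is a placeholder for your own docs.
    function slurm_job_submit(job_desc, part_list, submit_uid)
       if job_desc.time_limit == nil or job_desc.time_limit == slurm.NO_VAL then
          slurm.log_user("Please set a time limit, e.g. --time=02:00:00. " ..
                         "See https://example.org/docs/time-limits for details.")
          return slurm.ERROR
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end

A more specific ESLURM error code could be returned instead of slurm.ERROR, but the message printed by slurm.log_user is what the user will actually see.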
Requiring users to specify a time limit themselves does two things:
- It reminds them that it's important to be conscious of timelimits when submitting jobs
This is a good point. We use 'jobstats', which, amongst other things, provides information after a job has completed about its run time relative to its time limit, although unfortunately many people don't seem to read this. However, even if you do force people to set a time limit, they can still choose not to think about it and just set the maximum.
- If a job is killed before it's done and all the progress is lost because the job wasn't checkpointing, they can't blame you as the admin.
I don't really understand this point. The limit is just the way it is, just as we have caps on the total number of cores or GPUs a given user's jobs can use at any one time. Up to now no one has blamed us for this.
If you do this, it's easy to get the users on board by first providing useful and usable documentation on why timelimits are needed and how to set them. Be sure to hammer home the point that effective timelimits can lead to their jobs running sooner, and that effective timelimits can increase cluster efficiency/utilization, helping them get a better return on their investment (if they contribute to the cluster's cost) or get more science done. I like to frame it that accurate wallclock times will give them a competitive edge in getting their jobs running before other cluster users. Everyone likes to think what they're doing will give them an advantage!
I agree with all this, and it is also what we try to do. The only thing I don't concur with is your last sentence. In my experience, as long as things work, users will in general not give a fig about whether they are using resources efficiently. Only when people notice a delay in jobs starting do they become more aware of it and are prepared to take action. It is particularly a problem with new users, because fairshare means that their jobs will start pretty quickly, no matter how inefficiently they have configured them. Maybe we should just give new users a smaller share initially and only later bump it up to some standard value.
Cheers,
Loris
My 4 cents (adjusted for inflation).
Prentice
On 6/12/25 9:11 PM, Davide DelVento via slurm-users wrote:
Sounds good, thanks for confirming it. Let me sleep on the "too many QOS" issue, or think about whether I should ditch this idea. If I implement it, I'll post the details of how I did it in this conversation. Cheers
On Thu, Jun 12, 2025 at 6:59 AM Ansgar Esztermann-Kirchner <aeszter@mpinat.mpg.de> wrote:
On Thu, Jun 12, 2025 at 04:52:24AM -0600, Davide DelVento wrote:
Hi Ansgar,
This is indeed what I was looking for: I was not aware of PreemptExemptTime.
From my cursory glance at the documentation, it seems that PreemptExemptTime is QOS-based and not job-based, though. Is that correct? Or could it be set per job, perhaps in a prolog/submit lua script?
Yes, that's correct. I guess you could create a bunch of QOS with different PreemptExemptTimes and then let the user select one (or indeed select it from lua), but as far as I know, there is no way to set arbitrary per-job values.
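(For what it's worth, the "select it from lua" approach mentioned here could look something like the sketch below, assuming the QOS named in it have already been created with sacctmgr and given suitable PreemptExemptTime values. The QOS names and time thresholds are invented for illustration, and the slurm_job_modify function is omitted for brevity.)

    -- Rough sketch: pick a QOS from job_submit.lua based on the requested
    -- time limit (job_desc.time_limit is in minutes). The QOS names and
    -- thresholds are hypothetical and must match QOS created beforehand,
    -- each with its own PreemptExemptTime.
    function slurm_job_submit(job_desc, part_list, submit_uid)
       if job_desc.qos == nil and job_desc.time_limit ~= slurm.NO_VAL then
          if job_desc.time_limit <= 60 then
             job_desc.qos = "exempt_1h"       -- hypothetical QOS name
          elseif job_desc.time_limit <= 1440 then
             job_desc.qos = "exempt_1d"       -- hypothetical QOS name
          else
             job_desc.qos = "exempt_long"     -- hypothetical QOS name
          end
       end
       return slurm.SUCCESS
    end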
Best,
A.
Ansgar Esztermann
Sysadmin, Dep. Theoretical and Computational Biophysics
https://www.mpinat.mpg.de/person/11315/3883774
--
Prentice Bisbal
HPC Systems Engineer III
Computational & Information Systems Laboratory (CISL)
NSF National Center for Atmospheric Research (NSF NCAR)
https://www.cisl.ucar.edu
https://ncar.ucar.edu