[slurm-users] Requirement to run longer jobs
andy.georges at ugent.be
Wed Jul 3 18:21:18 UTC 2019
On Wed, Jul 03, 2019 at 03:49:44PM +0000, David Baker wrote:
> A few of our users have asked about running longer jobs on our cluster. Currently our main/default compute partition has a time limit of 2.5 days. Potentially, a handful of users need jobs to run up to 5 days. Rather than allow all users/jobs to have a run time limit of 5 days I wondered if the following scheme makes sense...
We have a similar issue: our default max walltime is 3 days, but because
checkpointing is not working properly at the moment, several high-end
users are asking for longer times.
> Increase the max run time on the default partition to be 5 days, however limit most users to a max of 2.5 days using the default "normal" QOS.
> Create a QOS called "long" with a max time limit of 5 days. Limit the users who can use "long". For authorized users, assign the "long" QOS to their jobs on the basis of their run time request.
> Does the above make sense or is it too complicated? If the above works, could users limited to the normal QOS have the run time of their running jobs increased to 5 days in exceptional circumstances?
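For reference, the QOS scheme described above could be set up roughly as in the sketch below. It is not a tested recipe: the user name "alice" and the job ID 12345 are hypothetical placeholders, and it assumes the partition's MaxTime in slurm.conf has already been raised to 5-00:00:00 and that AccountingStorageEnforce includes "limits" and "qos" so the QOS limits are actually enforced. The last function covers the exceptional case: as far as I know, only a Slurm administrator or root can increase a job's time limit with scontrol.

```shell
#!/bin/sh
# Sketch only. "alice" and job ID 12345 are hypothetical placeholders.
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 to run them on a cluster that uses slurmdbd accounting.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Cap "normal" at 2.5 days, create "long" with a 5-day cap,
# and grant "long" to a single authorized user.
setup_long_qos() {
    user="$1"
    run sacctmgr -i modify qos normal set MaxWall=2-12:00:00
    run sacctmgr -i add qos long
    run sacctmgr -i modify qos long set MaxWall=5-00:00:00
    run sacctmgr -i modify user where name="$user" set qos+=long
}

# Raise the time limit of one running job (administrator only).
extend_job() {
    jobid="$1"
    run scontrol update JobId="$jobid" TimeLimit=5-00:00:00
}

setup_long_qos alice
extend_job 12345
```

An authorized user would then request the longer limit explicitly, for example with "sbatch --qos=long --time=5-00:00:00 job.sh"; everyone else stays under the 2.5-day cap of "normal".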
Be aware that without restrictions, your users _will_ learn to take
advantage of the longer allowed walltime :)
We created a second partition that fully overlaps the default partition,
but with double the max wall time. Access to this partition is only
granted upon (motivated) request.
I have no idea whether this makes less or more sense than your proposal;
to me it's a different way to accomplish pretty much the same goal :)
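A minimal sketch of the overlapping-partition setup, for comparison. The helper below only prints a slurm.conf fragment for review. The partition names "batch" and "long", the node list, and the Unix group "longjobs" are all hypothetical, and restricting access with AllowGroups is just one assumed way of granting access on request; the 3-day and 6-day limits follow the "double the max wall time" description.

```shell
#!/bin/sh
# Prints a slurm.conf fragment for two fully overlapping partitions.
# Hypothetical names: partitions "batch" and "long", group "longjobs".
print_partitions() {
    nodes="$1"   # node list shared by both partitions
    group="$2"   # Unix group allowed to submit to the long partition
    cat <<EOF
PartitionName=batch Nodes=$nodes Default=YES MaxTime=3-00:00:00 State=UP
PartitionName=long Nodes=$nodes Default=NO MaxTime=6-00:00:00 AllowGroups=$group State=UP
EOF
}

print_partitions 'node[001-100]' longjobs
```

After merging such lines into slurm.conf, "scontrol reconfigure" makes the daemons re-read the file. Granting access then amounts to adding a user to the group, and members submit with "sbatch -p long".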