[slurm-users] Simple free for all cluster
Chris Samuel
chris at csamuel.org
Sat Oct 10 23:19:06 UTC 2020
On Tuesday, 6 October 2020 7:53:02 AM PDT Jason Simms wrote:
> I currently don't have a MaxTime defined, because how do I know how long a
> job will take? Most jobs on my cluster require no more than 3-4 days, but
> in some cases at other campuses, I know that jobs can run for weeks. I
> suppose even setting a time limit such as 4 weeks would be overkill, but at
> least it's not infinite. I'm curious what others use as that value, and how
> you arrived at it
My journey over the last 16 years in HPC has been one of decreasing time
limits. Back in 2003, VPAC's first Linux cluster had no time limits; we then
introduced a 90-day limit so we could plan quarterly maintenance (and yes, we
had users whose jobs legitimately ran longer than that, so they had to learn
to checkpoint). At VLSCI we had 30-day limits (life sciences, so many
long-running, poorly scaling jobs). When I was at Swinburne it was a 7-day
limit, and now here at NERSC we've got 2-day limits.
It really comes down to what your use cases are and how much influence you
have over your users. It's often the HPC sysadmins' responsibility to find the
balance between good utilisation, effective use of the system and reaching the
desired science/research/development outcomes.
Best of luck!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA