[slurm-users] Simple free for all cluster
John H
jsh at SDF.ORG
Sat Oct 17 09:08:32 UTC 2020
Thanks Chris, will likely need it :)
John
On Sat, Oct 10, 2020 at 04:19:06PM -0700, Chris Samuel wrote:
> On Tuesday, 6 October 2020 7:53:02 AM PDT Jason Simms wrote:
>
> > I currently don't have a MaxTime defined, because how do I know how long a
> > job will take? Most jobs on my cluster require no more than 3-4 days, but
> > in some cases at other campuses, I know that jobs can run for weeks. I
> > suppose even setting a time limit such as 4 weeks would be overkill, but at
> > least it's not infinite. I'm curious what others use as that value, and how
> > you arrived at it.
>
> My journey over the last 16 years in HPC has been one of decreasing time
> limits. Back in 2003, with VPAC's first Linux cluster, we had no time limits. We
> then introduced a 90-day limit so we could plan quarterly maintenance (and
> yes, we had users who had jobs which legitimately ran longer than that, so
> they had to learn to checkpoint). At VLSCI we had 30-day limits (life
> sciences, so many long-running, poorly scaling jobs). Then when I was at
> Swinburne it was a 7-day limit, and now here at NERSC we've got 2-day limits.
>
> It really is down to what your use cases are and how much influence you have
> over your users. It's often the HPC sysadmin's responsibility to try and find
> that balance between good utilisation, effective use of the system and reaching
> the desired science/research/development outcomes.
>
> Best of luck!
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
--
jsh at sdf.org
SDF Public Access UNIX System - http://sdf.org