[slurm-users] Simple free for all cluster

John H jsh at SDF.ORG
Sat Oct 17 09:08:32 UTC 2020


Thanks Chris, will likely need it :)

John

On Sat, Oct 10, 2020 at 04:19:06PM -0700, Chris Samuel wrote:
> On Tuesday, 6 October 2020 7:53:02 AM PDT Jason Simms wrote:
> 
> > I currently don't have a MaxTime defined, because how do I know how long a
> > job will take? Most jobs on my cluster require no more than 3-4 days, but
> > in some cases at other campuses, I know that jobs can run for weeks. I
> > suppose even setting a time limit such as 4 weeks would be overkill, but at
> > least it's not infinite. I'm curious what others use as that value, and how
> > you arrived at it.
> 
> My journey over the last 16 years in HPC has been one of decreasing time 
> limits. Back in 2003, with VPAC's first Linux cluster, we had no time limits. We 
> then introduced a 90 day limit so we could plan quarterly maintenances (and 
> yes, we had users who had jobs which legitimately ran longer than that, so 
> they had to learn to checkpoint).  At VLSCI we had 30 day limits (life 
> sciences, so many long running poorly scaling jobs), then when I was at 
> Swinburne it was a 7 day limit, and now here at NERSC we've got 2 day limits.
> 
> It really is down to what your use cases are and how much influence you have 
> over your users.  It's often the HPC sysadmin's responsibility to try and find 
> that balance between good utilisation, effective use of the system and reaching 
> the desired science/research/development outcomes.
> 
> Best of luck!
> Chris
> -- 
>   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
> 
> 
> 
> 

-- 
jsh at sdf.org
SDF Public Access UNIX System - http://sdf.org


