[slurm-users] some way to make oversubscribe jobs packed before spread

Douglas Jacobsen dmjacobsen at lbl.gov
Wed Aug 8 23:01:42 MDT 2018


One thing you could consider doing is setting a higher weight on the
long nodes (cluster[37-100] in your example).  This would cause jobs
submitted to the batch partition to attempt to schedule on low-weight nodes
first, then the higher-weight nodes.  So "long" nodes would only get used if a
job requested long, or if the nodes exclusively devoted to batch were full.
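A minimal sketch of what that might look like in slurm.conf, using the node
ranges from your partition definitions (the Weight values here are arbitrary
assumptions; only their relative order matters, and lower-weight nodes are
allocated first):

```
# Nodes that belong only to 'batch': default (low) weight, so they fill first
NodeName=cluster[17-36] Weight=1
# Nodes shared between 'batch' and 'long': higher weight, so the scheduler
# only falls back to them once the batch-only nodes are occupied
NodeName=cluster[37-100] Weight=10
```

After changing weights you'd need an `scontrol reconfigure` (or slurmctld
restart) for the new values to take effect.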
----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
Acting Group Lead, Computational Systems Group
National Energy Research Scientific Computing Center <http://www.nersc.gov>
dmjacobsen at lbl.gov

------------- __o
---------- _ '\<,_
----------(_)/  (_)__________________________



On Wed, Aug 8, 2018 at 3:32 PM Allan, Benjamin <baallan at sandia.gov> wrote:

> I have an application group whose throughput would improve if we could
> configure the scheduler to pack jobs two to a node (each job still
> starting/finishing independently), rather than spreading them out and only
> overlapping them once the partition is fully loaded with one job per node.
> The users' workflow is such that expecting individuals to do things like
> multiple srun invocations inside the same batch script isn't going to work.
>
>
> Currently the implementation of select/linear + OverSubscribe=force:2
> first assigns out to all empty nodes round-robin, then starts doubling up.
>
> Is there a script or plugin way to change this so it first doubles up, then
> round-robins the job assignment in the scheduler?
>
>
> The use case in more detail:
>
>
> PartitionName=batch   Nodes=cluster[17-100] State=UP RootOnly=NO
> Default=YES MaxTime=2880 MaxNodes=60  DefaultTime=5 QoS=batch
>
> PartitionName=long  Nodes=cluster[37-100] State=UP RootOnly=NO Default=NO
> MaxTime=100000 MaxNodes=10  DefaultTime=5
>
>
> Users who want to run without manual restarts for a really long time can
> use partition 'long', but we don't want to round-robin fill the machine
> (note overlapping node set) with 'long' jobs before doubling the long jobs.
> The threading and memory behavior of the application (large serial
> sections) makes this a reasonable policy.
>
>
> Making the partition node lists non-overlapping leads to idleness in both
> batch and long.
>
>
> What's the right path to achieve such a policy?
>
> Ben
>
