[slurm-users] some way to make oversubscribe jobs packed before spread

Allan, Benjamin baallan at sandia.gov
Wed Aug 8 16:20:19 MDT 2018


I have an application group whose throughput would improve if we could configure jobs to run two per node (each starting and finishing on its own schedule), with the scheduler packing them onto nodes rather than spreading them out and overlapping only once the partition is fully loaded with one job per node. The users' workflow is such that expecting individuals to do things like run multiple srun commands inside the same batch script isn't going to work.


Currently, the implementation of select/linear with OverSubscribe=force:2 first assigns jobs to all empty nodes round-robin, and only then starts doubling up.

Is there a script or plugin mechanism to change this so that the scheduler first doubles up jobs on a node, and only then moves round-robin to the next node?
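
To make the two placement orders concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not Slurm's select/linear code: the function name place_jobs and the pack_first flag are hypothetical, every job is assumed to need exactly one node, and jobs never finish.

```python
from typing import List


def place_jobs(num_nodes: int, num_jobs: int,
               max_share: int = 2, pack_first: bool = False) -> List[int]:
    """Return the number of jobs on each node after placing num_jobs
    single-node jobs. Toy model only; not Slurm's actual algorithm."""
    load = [0] * num_nodes
    for _ in range(num_jobs):
        # Nodes that still have a free slot under the sharing limit.
        candidates = [i for i, n in enumerate(load) if n < max_share]
        if not candidates:
            break  # partition is full; remaining jobs would stay pending
        if pack_first:
            # Desired policy: busiest node with a free slot, lowest index on ties.
            target = max(candidates, key=lambda i: (load[i], -i))
        else:
            # Observed policy: least-loaded node, lowest index on ties.
            target = min(candidates, key=lambda i: (load[i], i))
        load[target] += 1
    return load


print(place_jobs(4, 4))                   # [1, 1, 1, 1]  spread first
print(place_jobs(4, 4, pack_first=True))  # [2, 2, 0, 0]  packed first
```

With four jobs on four nodes, the spread-first order leaves no node empty, while the packed-first order keeps two nodes entirely free for other work.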


The use case in more detail:


PartitionName=batch   Nodes=cluster[17-100] State=UP RootOnly=NO Default=YES MaxTime=2880 MaxNodes=60  DefaultTime=5 QoS=batch

PartitionName=long  Nodes=cluster[37-100] State=UP RootOnly=NO Default=NO MaxTime=100000 MaxNodes=10  DefaultTime=5


Users who want to run without manual restarts for a really long time can use partition 'long', but we don't want 'long' jobs to fill the machine round-robin (note the overlapping node sets) before they start doubling up. The threading and memory behavior of the application (large serial sections) makes this a reasonable policy.


Making the partition node lists non-overlapping leads to idleness in both batch and long.


What's the right path to achieve such a policy?

Ben