<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none"><!--P{margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr" style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;">
<p>I have an application group whose throughput would improve if the scheduler packed their jobs two to a node (each job still starting and finishing on its own schedule), rather than spreading them out and doubling up only once every node in the partition
already has one job. Given the users' workflow, expecting individuals to do things like run multiple srun commands inside the same batch script isn't going to work.<br>
</p>
<p><br>
</p>
<p>Currently, select/linear with OverSubscribe=force:2 first assigns jobs round-robin to all empty nodes, and only then starts doubling up.</p>
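<p>For concreteness, a minimal sketch of the kind of settings meant here. This is an illustrative fragment rather than a copy of the actual slurm.conf: putting OverSubscribe on the 'long' partition line is an assumption, and the "..." stands for the other partition options.</p>
<p># illustrative only; which partition carries OverSubscribe is an assumption</p>
<p>SelectType=select/linear</p>
<p>PartitionName=long ... OverSubscribe=FORCE:2</p>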
<p>Is there a way, via a script or plugin, to make the scheduler double up first and only then round-robin jobs onto empty nodes?</p>
<p><br>
</p>
<p>The use case in more detail:</p>
<p><br>
</p>
<p>PartitionName=batch Nodes=cluster[17-100] State=UP RootOnly=NO Default=YES MaxTime=2880 MaxNodes=60 DefaultTime=5 QoS=batch</p>
<p>PartitionName=long Nodes=cluster[37-100] State=UP RootOnly=NO Default=NO MaxTime=100000 MaxNodes=10 DefaultTime=5
</p>
<p><br>
</p>
<p>Users who want to run for a very long time without manual restarts can use partition 'long', but we don't want 'long' jobs to fill the machine round-robin (note the overlapping node sets) before they start doubling up. The threading and memory behavior
of the application (large serial sections) makes doubling up a reasonable policy. </p>
<p><br>
</p>
<p>Making the partition node lists non-overlapping leaves nodes idle in both batch and long.<br>
</p>
<p><br>
</p>
<p>What's the right path to achieve such a policy?<br>
</p>
<p>Ben<br>
</p>
</body>
</html>