[slurm-users] do oversubscription with algorithm other than least-loaded?

Herc Silverstein herc.silverstein at schrodinger.com
Tue Mar 8 04:29:06 UTC 2022


We'd like to have just one of the partitions oversubscribe the nodes in 
it.  The nodes are not shared with any other partitions.

The SLURM documentation (https://slurm.schedmd.com/cons_res_share.html) 
seems to indicate that the least-loaded algorithm is always used when 
oversubscribe=force.  I believe oversubscribe=force is what we want (but 
have it packeach  node fully first).

Thanks for pointing out the -m option.  Our jobs are sbatched 
separately, so unfortunately I don't see how we can use it in this case.

What we want to be able to do is on, say, a 4 core node run 8 (or 12) 
jobs.  But only do it for the nodes in this one partition. The other 
partitions should continue to run N jobs on an N core node.
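For reference, the per-partition half of this is expressible on the partition line itself: OverSubscribe=FORCE:<count> allows up to <count> jobs per core on just that partition. A minimal slurm.conf sketch (partition and node names here are made up for illustration; whether CR_Pack_Nodes changes the placement order under FORCE is exactly the open question in this thread):

```
# Hypothetical slurm.conf excerpt -- names and node ranges are examples only.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core,CR_Pack_Nodes   # pack jobs onto nodes rather than spreading

# 4-core nodes; FORCE:2 permits up to 2 jobs per core (8 per node),
# FORCE:3 would permit 12.
PartitionName=oversub Nodes=cloud[001-016] OverSubscribe=FORCE:2 State=UP

# Other partitions keep the default: at most N jobs on an N-core node.
PartitionName=normal  Nodes=cloud[017-032] OverSubscribe=NO State=UP
```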


> I could be missing something here, but if you are referring to 
> SelectTypeParameters=CR_LLN, you could just try CR_Pack_Nodes.
> 
> https://slurm.schedmd.com/slurm.conf.html#OPT_CR_Pack_Nodes
> 
> If you want it as a per-partition configuration, I'm not sure that's 
> possible; you might need to set a distribution (-m) in your job 
> submit script/wrapper (e.g., -m block:*:*,pack).
> 
> https://slurm.schedmd.com/sbatch.html#OPT_distribution
> 
> If you're referring to something else entirely, could you elaborate 
> on the least-loaded configuration in your setup?
> 
> On 24/02/2022 23:35:30, Herc Silverstein wrote:
>> Hi,
>> 
>> We would like to do over-subscription on a cluster that's running in 
>> the cloud.  The cluster dynamically spins up and down CPU nodes as 
>> needed.  What we see is that the least-loaded algorithm causes the 
>> maximum number of nodes specified in the partition to be spun up, 
>> each loaded with N jobs for the N CPUs in a node, before it "doubles 
>> back" and starts over-subscribing.
>> 
>> What we actually want is for the *minimum* number of nodes to be 
>> used, and for one node to be fully loaded (to the limit of the 
>> oversubscription setting) before another is started up.  That is, we 
>> really want a "most-loaded" algorithm.  This would allow us to 
>> reduce the number of nodes we need to run and reduce costs.
>> 
>> Is there a way to get this behavior somehow?
>> 
>> Herc
> 
> -- 
> Regards,
> Daniel Letai
> +972 (0)505 870 456
