<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body>

<div dir="auto">I believe the default value of OverSubscribe would prevent jobs from sharing a node.  You may want to check that setting and change it from the default.<br>

<br>

<div>--<br>

Brian D. Haymore<br>

University of Utah<br>

Center for High Performance Computing<br>

155 South 1452 East RM 405<br>

Salt Lake City, Ut 84112<br>

Phone: 801-558-1150, Fax: 801-585-5366<br>

http://bit.ly/1HO1N2C</div>

</div>

<div class="gmail_extra"><br>

<div class="gmail_quote">On Sep 10, 2018 6:30 AM, Felix Wolfheimer &lt;f.wolfheimer@googlemail.com&gt; wrote:<br type="attribution">

</div>

</div>

<div>

<div dir="ltr">

<div dir="ltr">

<div dir="ltr">

<div dir="ltr">

<div>No, this happens without the "OverSubscribe" parameter being set. I'm using custom resources, though:</div>

<div><br>

</div>

<div>GresTypes=some_resource</div>

<div><br>

</div>

<div>NodeName=compute-[1-100] CPUs=10 Gres=some_resource:10 State=CLOUD</div>

<div><br>

</div>

<div>Submission uses:</div>

<div><br>

</div>

<div>sbatch --nodes=1 --ntasks-per-node=1 --gres=some_resource:1</div>

<div><br>

</div>

<div>But I just tried it without requesting this custom resource. It shows the same behavior, i.e., SLURM spins up N nodes when I submit N jobs to the queue, regardless of what the resource request of each job is.

<br>

</div>

<div><br>

</div>


</div>

</div>

</div>

</div>

<br>

<div class="gmail_quote">

<div dir="ltr">On Mon, Sep 10, 2018 at 03:55, Brian Haymore &lt;<a href="mailto:brian.haymore@utah.edu">brian.haymore@utah.edu</a>&gt; wrote:<br>

</div>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">

What do you have the OverSubscribe parameter set to on the partition you're using?<br>

<br>

<br>


<br>

________________________________________<br>

From: slurm-users [<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank">slurm-users-bounces@lists.schedmd.com</a>] on behalf of Felix Wolfheimer [<a href="mailto:f.wolfheimer@googlemail.com" target="_blank">f.wolfheimer@googlemail.com</a>]<br>

Sent: Sunday, September 09, 2018 1:35 PM<br>

To: <a href="mailto:slurm-users@lists.schedmd.com" target="_blank">slurm-users@lists.schedmd.com</a><br>

Subject: [slurm-users] Elastic Compute<br>

<br>

I'm using the SLURM Elastic Compute feature and it works great in<br>

general. However, I noticed that there's a bit of inefficiency in the<br>

decision about the number of nodes which SLURM creates. Let's say I have<br>

the following configuration:<br>

<br>

NodeName=compute-[1-100] CPUs=10 State=CLOUD<br>

<br>

and there are none of these nodes up and running. Let's further say<br>

that I create 10 identical jobs and submit them at the same time using<br>

<br>

sbatch --nodes=1 --ntasks-per-node=1<br>

<br>

I expected SLURM to work out that 10 CPUs are required in total to<br>

serve the requirements for all jobs and, thus, creates a single compute<br>

node. However, SLURM triggers the creation of one node per job, i.e.,<br>

10 nodes are created. When the first of these ten nodes is ready to<br>

accept jobs, SLURM assigns all of the 10 submitted jobs to this single<br>

node, though. The other nine nodes which were created are running idle<br>

and are terminated again after a while.<br>
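<br>

To make the expected node count concrete, here is a minimal Python sketch of the packing arithmetic for the ten-job example above. It is only an illustration, not SLURM's actual scheduling logic: the function name nodes_needed and its parameters are made up here, and it assumes that CPUs are the only constraint and that all jobs are identical.<br>

<pre>
```python
import math


def nodes_needed(num_jobs, cpus_per_job, cpus_per_node):
    """Minimum number of nodes if identical jobs are packed by CPU count.

    Hypothetical helper for illustration only; SLURM exposes no such function.
    """
    jobs_per_node = cpus_per_node // cpus_per_job
    if jobs_per_node == 0:
        raise ValueError("a single job does not fit on one node")
    return math.ceil(num_jobs / jobs_per_node)


# Values taken from the example: 10 jobs, 1 task (CPU) each, CPUs=10 per node.
expected_nodes = nodes_needed(num_jobs=10, cpus_per_job=1, cpus_per_node=10)

# Observed behavior described above: one node is created per submitted job.
observed_nodes = 10

idle_nodes = observed_nodes - expected_nodes
print(expected_nodes, observed_nodes, idle_nodes)
```
</pre>

With 10 single-CPU jobs and 10 CPUs per node this gives one node, so nine of the ten nodes that were actually created are surplus.<br>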

<br>

I'm using "SelectType=select/cons_res" to schedule on the CPU level. Is<br>

there some knob which influences this behavior or is this behavior<br>

hard-coded?<br>

<br>

<br>

</blockquote>

</div>

</div>

</body>

</html>