[slurm-users] Elastic Compute

Felix Wolfheimer f.wolfheimer at googlemail.com
Sun Sep 9 13:35:40 MDT 2018


I'm using the SLURM Elastic Compute feature and it works great in
general. However, I noticed a bit of inefficiency in how SLURM decides
the number of nodes to create. Let's say I have the following
configuration

NodeName=compute-[1-100] CPUs=10 State=CLOUD
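(For context, the power-saving side of such a setup is wired up in
slurm.conf roughly as follows; the script paths below are placeholders,
not my actual setup:)

```
# Elastic/cloud power-saving settings (sketch; paths are placeholders)
SuspendProgram=/usr/local/sbin/node_destroy.sh   # tears a cloud node down
ResumeProgram=/usr/local/sbin/node_create.sh     # spins a cloud node up
SuspendTime=300        # seconds a node may idle before suspension
ResumeTimeout=600      # seconds to wait for a resumed node to boot
TreeWidth=100          # at least the node count, as recommended for cloud nodes
```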

and there are none of these nodes up and running. Let's further say
that I create 10 identical jobs and submit them at the same time using

sbatch --nodes=1 --ntasks-per-node=1
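Concretely, the submission loop looks something like this (payload.sh
is a placeholder job script; the echo only prints the commands instead
of running them):

```shell
# Submit ten identical single-CPU jobs; drop "echo" to actually submit.
for i in $(seq 1 10); do
    echo sbatch --nodes=1 --ntasks-per-node=1 payload.sh
done
```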

I expected SLURM to work out that only 10 CPUs are required in total to
serve all the jobs and therefore to create a single compute node.
Instead, SLURM triggers the creation of one node per job, i.e., 10
nodes are created. When the first of these ten nodes is ready to accept
jobs, SLURM assigns all 10 submitted jobs to that single node. The
other nine nodes run idle and are terminated again after a while.
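The packing I expected amounts to this arithmetic (a sketch of the
calculation, not of SLURM's actual scheduler code):

```python
import math

# Pack single-CPU jobs onto 10-CPU nodes instead of one node per job.
jobs = 10            # identical jobs submitted at the same time
cpus_per_job = 1     # --ntasks-per-node=1
cpus_per_node = 10   # matches CPUs=10 in the node definition

nodes_needed = math.ceil(jobs * cpus_per_job / cpus_per_node)
print(nodes_needed)  # -> 1
```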

I'm using "SelectType=select/cons_res" to schedule at the CPU level. Is
there some knob which influences this behavior, or is it hard-coded?


