[slurm-users] requesting entire vs. partial nodes

Noam Bernstein noam.bernstein at nrl.navy.mil
Tue Oct 23 15:35:03 MDT 2018

> On Oct 20, 2018, at 3:06 AM, Chris Samuel <chris at csamuel.org> wrote:
> On Saturday, 20 October 2018 9:57:16 AM AEDT Noam Bernstein wrote:
>> If not, is there another way to do this?
> You can use --exclusive for jobs that want whole nodes.
> You will likely also want to use:
> SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
> to ensure jobs are given one core (with all its associated threads) per task.
> Also set DefMemPerCPU so that jobs get allocated a default amount of RAM per 
> core if they forget to ask for it.
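For concreteness, here's roughly what I understand those suggestions to translate to in slurm.conf (a sketch only; the DefMemPerCPU value is an illustrative placeholder, not from my actual config):

```shell
# slurm.conf fragment -- sketch based on the suggestions above
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
DefMemPerCPU=4000    # illustrative: default MB of RAM per CPU if a job doesn't ask
```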

Thanks for the suggestions.  I've tried this now, and even when I don't set --exclusive, jobs seem to refuse to run on nodes that already have jobs on them.  I'm using CR_Core_Memory.  Partitions don't have OverSubscribe set explicitly, but my understanding is that with CR_Core_Memory nodes should still be shared, just not cores or memory.  Nevertheless, when I submit N+1 jobs (I have N nodes), each of which requests half as many tasks as a node has cores, the (N+1)st job remains pending with reason Resources.  Turning on OverSubscribe and adding --oversubscribe to the sbatch options doesn't change anything either.

Is there any way to find out explicitly which resources are the limiting ones?
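The closest I've found so far is inspecting the pending job and the nodes directly (the job/node names here are placeholders):

```shell
# show the pending job's reason and what it actually requested
scontrol show job <jobid>

# per-node allocation state: CPUAlloc vs CPUTot, AllocMem vs RealMemory
scontrol show node <nodename>

# compact per-node view: CPUs as allocated/idle/other/total, memory, free memory
sinfo -N -o "%N %C %m %e"
```

but none of that tells me directly which resource is blocking the (N+1)st job.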

>> And however we achieve this, how does slurm decide what order to assign
>> nodes to jobs in the presence of jobs that don't take entire nodes.  If we
> have two 16-core nodes and two 8-task jobs, are they going to be packed
>> into a single node, or each on its own node (leaving no free node for
>> another 16 task job that requires an entire node)?
> As long as you don't use CR_LLN (least loaded node) as your select parameter 
> and you don't use pack_serial_at_end in SchedulerParameters then Slurm (I 
> believe) is meant to use a best fit algorithm.

Hmm - that's not consistent with what I'm seeing either.  Each of the jobs described above, which asks for half as many tasks as there are cores, ends up on its own node.  I'm not using CR_LLN or pack_serial_at_end.  I suppose this is obvious given that my jobs refuse to share nodes at all.

Any ideas as to what might be happening?
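In case it helps, this is the shape of the job script I'm testing with (a minimal sketch; the program name and memory request are illustrative, and the nodes have 16 cores):

```shell
#!/bin/bash
#SBATCH --ntasks=8           # half the 16 cores on a node
#SBATCH --mem-per-cpu=2000   # illustrative: well under half a node's RAM
srun ./my_program            # hypothetical executable
```

Submitted N+1 times with sbatch, I'd expect pairs of these to share nodes, but each lands on its own node and the last one pends.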


