[slurm-users] Managing partition resources

William Brown william at signalbox.org.uk
Wed Aug 31 18:10:41 UTC 2022


This is intentional behaviour: jobs are packed onto as few nodes as possible
so that the maximum resources are kept free in case a large job is queued.

In principle each job on the first node is guaranteed the CPUs and memory it
requested, so they don't compete...much. In reality, contention is still very
much affected by resources that cannot be made exclusive, such as I/O to
storage.
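
If the goal is for new jobs to land on the least loaded node rather than
being packed, I believe Slurm also supports a least-loaded-node allocation
mode. A minimal slurm.conf sketch (assuming select/cons_res is in use; the
partition and node names are placeholders):

    # Prefer the least loaded node when allocating jobs
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_LLN

    # Or enable it for a single partition only
    PartitionName=mypart Nodes=node[01-02] LLN=YES Default=YES State=UP

This is only a sketch; check the slurm.conf documentation for 19.05 before
relying on it, and remember slurmctld needs to be reconfigured after any
change.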

We have used the --spread-job option with some success, but I think it
spreads the tasks of a single sbatch job across nodes rather than causing
separate jobs to be distributed across the partition.
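
For completeness, this is roughly the kind of invocation meant above (a
sketch; the script name and task count are placeholders):

    # Spread this job's allocation over as many nodes as possible
    sbatch --spread-job --ntasks=8 my_job.sh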

I'm sure others know better.

William Brown

On Wed, 31 Aug 2022, 18:31 Alejandro Acuña, <
alejandro.acunia at iflp.unlp.edu.ar> wrote:

> Hi all.
> Under Slurm 19.05, is there a way to configure a partition so that batch
> jobs can be submitted by specifying only the partition? But...important
> detail: these jobs must start on the node of the partition that currently
> has the least work.
> I hope you understand my problem. Within the same partition, jobs
> currently run fine, but they accumulate on the first node, and new jobs
> only use the second node when the first no longer has enough resources.
> Ideally, a job would start on a node with no jobs (if one is available).
> For the record, the only command I found that behaves similarly is:
> salloc --exclusive [file to submit]
> With salloc, if a user submits the same file many times, the jobs are
> distributed across the nodes of the partition. But this is not batch
> submission.
>
> Thanks
> Ale
>
>