[slurm-users] Single Node cluster. How to manage oversubscribing

Doug Meyer dameyer99 at gmail.com
Fri Feb 24 02:00:41 UTC 2023


Hi,

Did you configure your node definition with the output of slurmd -C?
Leave out Boards.  I don't know if it is still true, but several years ago
declaring boards made things difficult.
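
A rough sketch of that step, in case it helps: slurmd -C prints the hardware
it detects as a NodeName line, and the Boards token can be dropped before the
line goes into slurm.conf. The sample line below is made up for illustration
(the hostname and all values are hypothetical); on the real node, take the
line from slurmd -C itself.

```shell
#!/bin/sh
# Hypothetical sample of the first line printed by `slurmd -C`.
# On the real node use instead: line=$(slurmd -C | head -n 1)
line='NodeName=node01 CPUs=64 Boards=1 SocketsPerBoard=1 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=128000'

# Drop the Boards=N token and keep everything else unchanged.
node_def=$(printf '%s\n' "$line" | sed 's/ Boards=[0-9][0-9]*//')

# This is the line to paste into slurm.conf.
echo "$node_def"
```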

Also, if you have hyperthreaded AMD or Intel processors, your partition
declaration should use an OverSubscribe count of 2 (e.g.
OverSubscribe=FORCE:2).

Start with a very simple job with a script containing sleep 100 or
something else without any runtime issues.
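
Something along these lines would do as a first test job. It is only a
minimal sketch: the file name, job name and resource values are placeholders
I picked, not taken from your setup.

```shell
#!/bin/sh
# Write a minimal batch script; each job asks for one task on one core.
cat > sleep_test.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=sleep_test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00
sleep 100
EOF

# Submit a few copies if sbatch is available, then look at the queue.
if command -v sbatch >/dev/null 2>&1; then
    for i in 1 2 3; do sbatch sleep_test.sh; done
    squeue
else
    echo "sbatch not found; script written to sleep_test.sh"
fi
```

If several of these run side by side, the config is probably fine and the
real submit script is the next place to look.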

When I started with Slurm I built the sbatch script one small step at a time:
nodes, cores, memory, partition, mail, etc.

It sounds like your config is very close but your problem may be in the
submit script.

Best of luck and welcome to slurm. It is very powerful with a huge
community.

Doug



On Thu, Feb 23, 2023 at 6:58 AM Analabha Roy <hariseldon99 at gmail.com> wrote:

> Hi folks,
>
> I have a single-node "cluster" running Ubuntu 20.04 LTS with the
> distribution packages for slurm (slurm-wlm 19.05.5).
> Slurm only ran one job in the node at a time with the default
> configuration, leaving all other jobs pending.
> This happened even if that one job only requested like a few cores (the
> node has 64 cores, and slurm.conf is configged accordingly).
>
> In slurm.conf, SelectType is set to select/cons_res, and
> SelectTypeParameters to CR_Core. NodeName is set with CPUs=64. The path to
> the file is given below.
>
> So I set OverSubscribe=FORCE in the partition config and restarted the
> daemons.
>
> Multiple jobs are now run concurrently, but when Slurm is oversubscribed,
> it is *truly* *oversubscribed*. That is to say, it runs so many jobs that
> there are more processes running than cores/threads.
> How should I config slurm so that it runs multiple jobs at once per node,
> but ensures that it doesn't run more processes than there are cores? Is
> there some TRES magic for this that I can't seem to figure out?
>
> My slurm.conf is here on github:
> https://github.com/hariseldon99/buparamshavak/blob/main/shavak_root/etc/slurm-llnl/slurm.conf
> The only gres I've set is for the GPU:
> https://github.com/hariseldon99/buparamshavak/blob/main/shavak_root/etc/slurm-llnl/gres.conf
>
> Thanks for your attention,
> Regards,
> AR
> --
> Analabha Roy
> Assistant Professor
> Department of Physics
> <http://www.buruniv.ac.in/academics/department/physics>
> The University of Burdwan <http://www.buruniv.ac.in/>
> Golapbag Campus, Barddhaman 713104
> West Bengal, India
> Emails: daneel at utexas.edu, aroy at phys.buruniv.ac.in, hariseldon99 at gmail.com
> Webpage: http://www.ph.utexas.edu/~daneel/
>