[slurm-users] Single Node cluster. How to manage oversubscribing

Thu Feb 23 13:56:11 UTC 2023

Hi folks,

I have a single-node "cluster" running Ubuntu 20.04 LTS with the
distribution packages for Slurm (slurm-wlm 19.05.5).
With the default configuration, Slurm ran only one job on the node at a
time, leaving all other jobs pending.
This happened even when that one job requested only a few cores (the
node has 64 cores, and slurm.conf is configured accordingly).

In slurm.conf, SelectType is set to select/cons_res and
SelectTypeParameters to CR_Core. NodeName is set with CPUs=64. The link to
the file is given below.
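
For readers who do not want to open the file, here is a minimal sketch of
the settings just described. The node name "shavak" is a placeholder of
mine and is not copied from the actual file; only the three parameter
values come from the description above.

```conf
# Sketch only: the node name is a placeholder, not taken from the real slurm.conf.
SelectType=select/cons_res
SelectTypeParameters=CR_Core
NodeName=shavak CPUs=64
```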

So I set OverSubscribe=FORCE in the partition config and restarted the
daemons.
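
As a rough illustration of that change, the partition line would look
something like the sketch below. The partition name "main" and the node
name "shavak" are placeholders I made up for this example; only
OverSubscribe=FORCE reflects what was actually set.

```conf
# Sketch only: partition and node names are placeholders.
PartitionName=main Nodes=shavak Default=YES OverSubscribe=FORCE
```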

Multiple jobs now run concurrently, but when Slurm oversubscribes, it
*truly* oversubscribes: it runs so many jobs that there are more
processes running than cores/threads.
How should I configure Slurm so that it runs multiple jobs at once per
node but never runs more processes than there are cores? Is there some
TRES magic for this that I can't seem to figure out?

My slurm.conf is here on GitHub:
https://github.com/hariseldon99/buparamshavak/blob/main/shavak_root/etc/slurm-llnl/slurm.conf
The only GRES I've set is for the GPU:
https://github.com/hariseldon99/buparamshavak/blob/main/shavak_root/etc/slurm-llnl/gres.conf

Thanks for your attention,
Regards,
AR
-- 
Analabha Roy
Assistant Professor
Department of Physics
<http://www.buruniv.ac.in/academics/department/physics>
The University of Burdwan <http://www.buruniv.ac.in/>
Golapbag Campus, Barddhaman 713104
West Bengal, India
Emails: daneel at utexas.edu, aroy at phys.buruniv.ac.in, hariseldon99 at gmail.com
Webpage: http://www.ph.utexas.edu/~daneel/