[slurm-users] CR_Core_Memory behavior
jscoggins at lbl.gov
Wed Aug 26 00:47:06 UTC 2020
What is OverSubscribe set to for your partitions? By default
OverSubscribe=NO, which means that none of your cores will be shared
with other jobs. With OverSubscribe set to YES or FORCE, you can append a
count (for example FORCE:2) to set the number of jobs that can run on each
core of each node in the partition.
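As a rough illustration, a partition definition with a count could look like the fragment below. This is only a sketch: the partition name "batch" and the node list are hypothetical, and the SelectTypeParameters value is the one mentioned in the original question.

```
# slurm.conf fragment (partition and node names are hypothetical)
SelectTypeParameters=CR_Core_Memory
# FORCE:2 lets up to 2 jobs share each core; users cannot opt out
PartitionName=batch Nodes=node[01-04] OverSubscribe=FORCE:2 State=UP
```

With YES:2 instead of FORCE:2, sharing would only happen for jobs that request it.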
Look at this page for a better understanding:
You can also check the OverSubscribe setting of a partition using sinfo -o "%h", for example:
sinfo -o '%P %.5a %.10h %N ' | head
PARTITION AVAIL OVERSUBSCR NODELIST
Look at the sinfo options for further details.
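If you want to do the same check from a script, something like the following sketch may help. It assumes input lines of the form "<partition> <oversubscribe>", as printed by sinfo -h -o '%P %h'; the function name no_oversubscribe is made up for this example.

```shell
#!/bin/sh
# List partitions whose OverSubscribe column is exactly NO.
# Expects lines of "<partition> <oversubscribe>" on stdin,
# e.g. from: sinfo -h -o '%P %h'
no_oversubscribe() {
    # sinfo marks the default partition with a trailing "*"; strip it.
    awk '$2 == "NO" { sub(/\*$/, "", $1); print $1 }' | sort -u
}
```

Usage on a cluster would be: sinfo -h -o '%P %h' | no_oversubscribe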
On Tue, Aug 25, 2020 at 9:58 AM Durai Arasan <arasan.durai at gmail.com> wrote:
> On our cluster we have SelectTypeParameters set to "CR_Core_Memory".
> Under these conditions multiple jobs should be able to run on the same
> node. But they are not allocated to the same node: only one job
> runs on the node and the rest of the jobs stay in the pending state.
> When we changed SelectTypeParameters to "CR_Core" however, this issue was
> resolved and multiple jobs were successfully allocated to the same node and
> ran concurrently on the same node.
> Does anyone know why such behavior is seen? Why does including memory as
> a consumable resource lead to node-exclusive behavior?