[slurm-users] CR_Core_Memory behavior

Jacqueline Scoggins jscoggins at lbl.gov
Wed Aug 26 00:47:06 UTC 2020


What is the OverSubscribe setting for your partitions? By default
OverSubscribe=NO, which means that none of your cores will be shared
with other jobs.  With OverSubscribe set to YES or FORCE you can append a
count after FORCE (FORCE:<count>) to set how many jobs may run on each
core of each node in the partition.
Look at this page for a better understanding:
https://slurm.schedmd.com/cons_res_share.html
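
For example, a partition definition in slurm.conf that allows up to four
jobs per core could look roughly like this (the partition and node names
here are only placeholders, not your actual configuration):

PartitionName=batch Nodes=node[01-10] State=UP OverSubscribe=FORCE:4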

You can also check the OverSubscribe setting on a partition using the
sinfo "%h" output format option:
sinfo -o '%P %.5a %.10h %N ' | head

PARTITION AVAIL OVERSUBSCR NODELIST
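With a partition configured as in the sketch above (again, hypothetical
names), the output would show the oversubscribe value per partition,
something like:

batch     up    FORCE:4 node[01-10]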


Look at the sinfo options for further details.


Jackie

On Tue, Aug 25, 2020 at 9:58 AM Durai Arasan <arasan.durai at gmail.com> wrote:

> Hello,
>
> On our cluster we have SelectTypeParameters set to "CR_Core_Memory".
>
> Under these conditions multiple jobs should be able to run on the same
> node, but they refuse to share a node: only one job runs on the node and
> the rest of the jobs stay in a pending state.
>
> When we changed SelectTypeParameters to "CR_Core", however, the issue went
> away and multiple jobs were successfully allocated to the same node and ran
> there concurrently.
>
> Does anyone know why this happens? Why does including memory as a
> consumable resource lead to node-exclusive behavior?
>
> Thanks,
> Durai
>
>