[slurm-users] Priority setup for some typical use cases?

Vang Le-Quy vle at its.aau.dk
Sat Sep 14 15:55:15 UTC 2019


Hello Slurm Users,

Our system does not allow much testing at the moment, so I want to make use of community knowledge. The multifactor priority plugin has many handles to tweak, which makes it powerful and daunting at the same time. Basically, how do you set up priorities for various user groups based on urgency, resource usage, and resource guarantees? I am thinking of these groups:


  1.  daily users: medium requirements for RAM, CPU, GPU, and storage. They can wait if the resources are busy. Their jobs may even be suspended/paused to free resources for other needs.
  2.  deadliners: need constant and guaranteed access to resources; they cannot wait.
  3.  developers: run short and light jobs but require real-time or near-real-time responsiveness.
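
For reference, the multifactor handles in question are set in slurm.conf. The fragment below is only a sketch to anchor the discussion: the parameter names are standard slurm.conf options, but every value is an illustrative assumption and has not been tested on our system.

# Illustrative values only (assumptions, untested)
PriorityType=priority/multifactor
# Past usage decays with a 7-day half-life
PriorityDecayHalfLife=7-0
# Relative weights of the priority factors
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
PriorityWeightPartition=10000
PriorityWeightQOS=100000

With weights like these, the QOS and partition factors would dominate age and job size, so a group could be favoured through its QOS or partition. Whether that ordering suits the three groups above is exactly the open question.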

In an ideal world, we would simply have dedicated nodes for each of these needs. However, if you cannot afford that many nodes, can these three needs be mixed on the same nodes?

For example, I have a front-end node (FN) and two compute nodes, N1 and N2. This is the default partition:

PartitionName=daily Nodes=ALL Default=YES DefMemPerCPU=0 State=UP  OverSubscribe=NO  MaxTime=INFINITE SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE DefCpuPerGPU=2

How do I define a ‘developers’ partition that allows users to run temporary debug sessions with a maximum walltime of 8 hours (MaxTime=8:00:00)? Furthermore, jobs in this partition should have the highest priority and preferably start right away. Do I also need to set up a ‘debug’ QOS? Last but not least, when the time is up for a job in this partition, can I have the job put into a suspended state?
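
To make the question concrete, here is a rough sketch of what such a setup might look like in slurm.conf, assuming partition-priority preemption with suspension. The option names (PreemptType, PreemptMode, PriorityTier, OverSubscribe) are standard slurm.conf parameters, but the tier values and the overall combination are assumptions and untested. The ‘daily’ line is abbreviated here; its other options would stay as shown above.

# Assumption: jobs in a higher PriorityTier partition preempt lower ones
PreemptType=preempt/partition_prio
# Assumption: preempted jobs are suspended; SUSPEND requires GANG
PreemptMode=SUSPEND,GANG

# Low tier: jobs here may be suspended by higher-tier partitions
PartitionName=daily Nodes=ALL Default=YES State=UP MaxTime=INFINITE PriorityTier=1 PreemptMode=SUSPEND OverSubscribe=FORCE:1
# High tier: short debug sessions, never preempted themselves
PartitionName=developers Nodes=ALL Default=NO State=UP MaxTime=08:00:00 PriorityTier=10 PreemptMode=OFF OverSubscribe=FORCE:1

Two caveats apply to this sketch. Suspended jobs keep their memory, so the nodes need enough RAM for both the suspended and the preempting job. Also, it does not cover the last question: by default Slurm terminates a job that reaches its time limit rather than suspending it.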



Vang Quy Le
Special Consultant in Data Science and Infrastructure

T: (+45) 9940 7710 | Email: vle at its.aau.dk
Kontor 0-1-91 | Selma Lagerløfs Vej 300 | DK-9220 Aalborg Ø
