[slurm-users] Creating groups of nodes with exclusive access to a resources within a partition.

Paul Brunk pbrunk at uga.edu
Thu Feb 10 02:48:03 UTC 2022


Hello Rich:

You could create partitions "bulk_a", "bulk_b", and "bulk_c" (the names are arbitrary) that map onto those three groups of nodes, with the intended resource limits set at the partition level.  Then have job_submit.lua cause all jobs submitted to "bulk" (or only the subset requesting a specific shared resource, or any subset that job_submit.lua can detect) to also be submitted to the intended one or more of bulk_[abc].  I can imagine this meeting your need, but I am not certain it does.
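A rough, untested sketch of what such a job_submit.lua might look like follows. The partition names come from the example above; the slurm.conf lines and the use of a partition QOS with GrpJobs for the per-group job cap are assumptions about one way to set the limits, not something confirmed for this site. The sketch also replaces "bulk" in the request rather than keeping it alongside bulk_[abc], on the assumption that a job should not be able to start in a partition that lacks the per-group limit; keep "bulk" in the list if that is not a concern.

```lua
-- job_submit.lua (sketch, untested; adapt names and limits to your site).
--
-- Assumed slurm.conf entries (hypothetical, one way to cap jobs per group
-- is a partition QOS carrying a GrpJobs limit):
--   PartitionName=bulk_a Nodes=node[1-4] QOS=bulk_a   # QOS with GrpJobs=4
--   PartitionName=bulk_b Nodes=node5     QOS=bulk_b   # QOS with GrpJobs=2
--   PartitionName=bulk_c Nodes=node[6-9] QOS=bulk_c   # QOS with GrpJobs=1

local GENERIC = "bulk"                          -- partition users submit to
local GROUP_PARTITIONS = "bulk_a,bulk_b,bulk_c" -- per-group partitions

-- Replace the generic partition with the per-group partitions.
-- Other entries in a comma-separated partition request are kept as-is.
local function expand_partitions(requested)
    if requested == nil then
        return nil  -- no partition requested; leave the default alone
    end
    local out = {}
    for name in string.gmatch(requested, "[^,]+") do
        if name == GENERIC then
            table.insert(out, GROUP_PARTITIONS)
        else
            table.insert(out, name)
        end
    end
    return table.concat(out, ",")
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    local expanded = expand_partitions(job_desc.partition)
    if expanded ~= nil and expanded ~= job_desc.partition then
        slurm.log_info("job_submit: uid %s: partition %s -> %s",
                       tostring(submit_uid), job_desc.partition, expanded)
        job_desc.partition = expanded
    end
    return slurm.SUCCESS
end

-- Required by the plugin interface; this sketch leaves modifications alone.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end

return slurm.SUCCESS
```

With something like this in place, "sbatch -p bulk job.sh" would be queued against all three group partitions and start in whichever one can run it first. A job submitted with no -p at all arrives with a nil partition and is left untouched here, so the default partition would need separate handling.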

Node features requested by jobs (either keying off them in the Lua filter, or adding them there) might help too.

--
Paul Brunk, system administrator
Georgia Advanced Resource Computing Center
Enterprise IT Svcs, the University of Georgia


On 2/1/22, 5:45 AM, "slurm-users" <slurm-users-bounces at lists.schedmd.com> wrote:

Hi,


I am wondering if this is possible with Slurm. I have an application where I want to create groups of nodes (group size would be between 1 and n servers) which have exclusive access to a shared resource, and then allow a configurable number of jobs to run on each group of nodes.

For example I could have:

partition: bulk, containing:

group 1, max 4 jobs:
  - node1
  - node2
  - node3
  - node4

group 2, max 2 jobs:
  - node5

group 3, max 1 job:
  - node6
  - node7
  - node8
  - node9



Ideally the user could submit a job to a generic queue and I could set a configurable gres/license for them in the background, so that the job gets placed in a free group, or pends if it requires the exclusive resource and none is free.

I've taken a look at:
1. Using the job_submit Lua plugin to look at the groups and, if a group has available resources, set a gres so the job is correctly placed.

2. Licenses, but I can't see how to limit a license to a group of hosts without creating clusters. Can you limit licenses to specific nodes?

3. On the scheduler, a script that builds the node configuration, updates the node gres, and issues a 'scontrol reconfigure'.




Option 3 works, but isn't great.


So I would really like to be able to use a plugin to look at the current allocation and set a gres/license/partition for the user in the background. Is it possible for the job_submit Lua plugin to access an external resource or the license part of Slurm? If so, I could use that.

Or am I missing something, or doing something very wrong?


Thanks in advance for any assistance; it's much appreciated.




Rich Cardwell
Snr IT Engineer
richc at graphcore.ai

www.graphcore.ai








