[slurm-users] Controlling access to idle nodes

David Baker D.J.Baker at soton.ac.uk
Thu Oct 8 07:54:43 UTC 2020


Thank you very much for your comments. Oddly enough, I came up with the 3-partition model as well once I'd sent my email. So, your comments helped to confirm that I was thinking on the right lines.

Best regards,
David

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Thomas M. Payerle <payerle at umd.edu>
Sent: 06 October 2020 18:50
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Controlling access to idle nodes

We use a scavenger partition, and although we do not have the policy you describe, the same approach could be adapted to your case.

Assume you have 6 nodes (node-[0-5]) and two groups A and B.
Create partitions
partA = node-[0-2]
partB = node-[3-5]
all = node-[0-5]

Create two QoSes, normal and scavenger.
Allow the normal QoS to preempt jobs running with the scavenger QoS.

In sacctmgr, give members of group A access to partA with the normal QoS, and members of group B access to partB with the normal QoS.
Allow both A and B to use partition all with the scavenger QoS.
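The access rules above can be summarised as a small lookup table. The following is only a toy Python model, assuming the six-node, two-group example; the names PARTITIONS, PREEMPTS, ASSOCIATIONS and allowed() are hypothetical and nothing here calls Slurm. In Slurm itself the rules would live in the partition definitions, the QoS preemption settings and the sacctmgr associations.

```python
# Toy model of the three-partition / two-QoS scheme (hypothetical names; not Slurm code).
NODES = [f"node-{i}" for i in range(6)]  # node-[0-5]

PARTITIONS = {
    "partA": set(NODES[0:3]),  # node-[0-2], group A's share
    "partB": set(NODES[3:6]),  # node-[3-5], group B's share
    "all": set(NODES),         # every node, used only with the scavenger QoS
}

# The normal QoS may preempt jobs running under the scavenger QoS, not vice versa.
PREEMPTS = {"normal": {"scavenger"}, "scavenger": set()}

# Associations: the (partition, QoS) pairs each group is allowed to submit with.
ASSOCIATIONS = {
    "A": {("partA", "normal"), ("all", "scavenger")},
    "B": {("partB", "normal"), ("all", "scavenger")},
}


def allowed(group: str, partition: str, qos: str) -> bool:
    """Return True if the group's associations permit this partition/QoS pair."""
    return (partition, qos) in ASSOCIATIONS.get(group, set())


if __name__ == "__main__":
    print(allowed("A", "partA", "normal"))    # own half, normal QoS: permitted
    print(allowed("A", "partB", "normal"))    # other half, normal QoS: refused
    print(allowed("A", "all", "scavenger"))   # whole cluster, scavenger QoS: permitted
```

Keeping "all" as a separate partition, rather than granting scavenger access on partA and partB directly, means a single partition/QoS pair covers "anything idle anywhere" for both groups.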

So members of A can launch jobs on partA with the normal QoS (you probably want to make that their default), and similarly members of B can launch jobs on partB with the normal QoS.
But members of A can also launch jobs with the scavenger QoS on partition all, and therefore on partB's nodes, and vice versa. If B needs the partB nodes that A's scavenger jobs are using, those jobs will be preempted.
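The preemption behaviour described above can be illustrated with a toy placement function. This is a hypothetical Python sketch, not Slurm's scheduler. It assumes each running job occupies exactly one node, and it uses free nodes before preempting anything; that ordering is a design choice made for the sketch. In a real cluster this behaviour would come from QoS-based preemption (`PreemptType=preempt/qos` in slurm.conf, with the normal QoS's `Preempt` list containing scavenger).

```python
# Toy model of QoS-based preemption (hypothetical names; not Slurm's scheduler).
# Assumption: every running job occupies exactly one node.
PARTITIONS = {
    "partA": ["node-0", "node-1", "node-2"],
    "partB": ["node-3", "node-4", "node-5"],
    "all": [f"node-{i}" for i in range(6)],
}
# A normal job may preempt scavenger jobs; a scavenger job may preempt nothing.
PREEMPTS = {"normal": {"scavenger"}, "scavenger": set()}


def place(running, partition, qos, n_nodes):
    """Choose n_nodes in `partition` for a new job submitted with `qos`.

    `running` maps a busy node name to (job_id, qos_of_that_job).
    Free nodes are used first, then nodes held by preemptable jobs.
    Returns (chosen_nodes, preempted_job_ids), or None if the job must wait.
    """
    nodes = PARTITIONS[partition]
    free = [n for n in nodes if n not in running]
    victims = [n for n in nodes
               if n in running and running[n][1] in PREEMPTS[qos]]
    chosen = (free + victims)[:n_nodes]
    if len(chosen) < n_nodes:
        return None  # not enough free or preemptable nodes: job stays queued
    preempted = sorted({running[n][0] for n in chosen if n in running})
    return chosen, preempted


if __name__ == "__main__":
    # Group A has a normal job on node-0 and scavenger jobs on two of B's nodes.
    running = {"node-0": (1, "normal"),
               "node-3": (101, "scavenger"),
               "node-4": (102, "scavenger")}
    # Group B reclaims its whole half: A's scavenger jobs 101 and 102 are preempted.
    print(place(running, "partB", "normal", 3))
    # A scavenger job that needs more nodes than are idle simply waits.
    print(place(running, "all", "scavenger", 4))
```

For the scenario in the usage block, B's request returns nodes node-5, node-3 and node-4, with jobs 101 and 102 preempted. The four-node scavenger request returns None, because scavenger jobs never displace other work.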

This is not automatic (users need to explicitly say they want to run jobs on the other half of the cluster), but that is probably reasonable: some jobs should not be at risk of preemption, even if that means waiting longer in the queue.

On Tue, Oct 6, 2020 at 11:12 AM David Baker <D.J.Baker at soton.ac.uk> wrote:
Hello,

I would appreciate your advice on how to deal with this situation in Slurm, please. Suppose I have a set of nodes shared by 2 groups, where normally each group has access to half the nodes. So, I could limit each group to 3 nodes, for example. I am trying to devise a scheme that allows each group to make the best use of the nodes at all times. In other words, each group could potentially use all the nodes (assuming they are all free and the other group isn't using any of them).

I cannot set hard and soft limits in Slurm, and so I'm not sure how to make the situation flexible. Ideally, each group would be able to use its own allocation and then take advantage of any idle nodes via a scavenging mechanism. The other group could then pre-empt the scavenger jobs and reclaim its nodes. I'm struggling with this because it seems like a two-way scavenger situation.

Could anyone please help? I have, by the way, set up partition-based pre-emption in the cluster. This allows the general public to scavenge nodes owned by research groups.

Best regards,
David




--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        payerle at umd.edu
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831

