[slurm-users] Controlling access to idle nodes
Jason Simms
simmsj at lafayette.edu
Tue Oct 6 15:37:54 UTC 2020
Hello David,
I'm still relatively new to Slurm, but one way we handle this is that for
users/groups who have "bought in" to the cluster, we use a QOS to give
them priority access, with the ability to preempt other jobs, to the amount
of resources that, e.g., a set number of nodes would provide, but not to the
nodes themselves. That is, in one example, two researchers each have
priority access to up to 52 cores in the cluster, but those cores can come
from any physical node. I set the priority of the QOS for each researcher
equal, such that they cannot preempt each other.
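
To make the policy concrete, here is a minimal toy model in Python. It is
not Slurm configuration and does not call any Slurm API; the class names,
the 80-core pool, the priority values, and the "scavenger" QOS are all
assumptions made for illustration. It only mimics the rules described above:
a per-QOS core cap, cores drawn from one shared pool, and preemption allowed
only against a strictly lower-priority QOS. On a real cluster the
corresponding pieces would presumably be a per-QOS CPU limit and QOS-based
preemption, so please check the sacctmgr and slurm.conf documentation for
the exact options.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Qos:
    name: str
    priority: int                   # equal values mean "cannot preempt each other"
    core_cap: Optional[int] = None  # max cores in use at once; None = uncapped


@dataclass(frozen=True)
class Job:
    name: str
    qos: str
    cores: int


class CorePool:
    """Toy model: all cores form one shared pool, regardless of node."""

    def __init__(self, total_cores: int, qos_list: List[Qos]) -> None:
        self.total_cores = total_cores
        self.qos: Dict[str, Qos] = {q.name: q for q in qos_list}
        self.running: List[Job] = []
        self.preempted: List[Job] = []

    def used(self, qos_name: Optional[str] = None) -> int:
        """Cores in use, overall or by one QOS."""
        return sum(j.cores for j in self.running
                   if qos_name is None or j.qos == qos_name)

    def submit(self, job: Job) -> bool:
        """Try to start a job now; return True if it starts."""
        q = self.qos[job.qos]
        # Rule 1: a QOS may never exceed its own core cap.
        if q.core_cap is not None and self.used(job.qos) + job.cores > q.core_cap:
            return False
        free = self.total_cores - self.used()
        # Rule 2: only jobs in a strictly lower-priority QOS can be preempted.
        victims = [j for j in self.running
                   if self.qos[j.qos].priority < q.priority]
        if free + sum(v.cores for v in victims) < job.cores:
            return False
        for v in victims:
            if free >= job.cores:
                break
            self.running.remove(v)
            self.preempted.append(v)
            free += v.cores
        self.running.append(job)
        return True


# Hypothetical setup: two buy-in researchers capped at 52 cores each,
# plus an uncapped, lowest-priority scavenger QOS.
pool = CorePool(80, [
    Qos("researcher_a", priority=100, core_cap=52),
    Qos("researcher_b", priority=100, core_cap=52),
    Qos("scavenger", priority=0),
])
pool.submit(Job("scav-1", "scavenger", 40))
pool.submit(Job("scav-2", "scavenger", 40))
a_started = pool.submit(Job("a-1", "researcher_a", 52))  # preempts scavengers
b_started = pool.submit(Job("b-1", "researcher_b", 52))  # cannot preempt a-1
```

In this toy run the pool is deliberately smaller than the two caps combined
(80 cores versus 2 x 52), so researcher_b's 52-core job has to wait rather
than displace researcher_a's job, while a smaller job that fits in the
remaining free cores would still start.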
Admittedly, this works best and most simply in a situation where your nodes
are relatively homogeneous, as ours currently are. I am trying to avoid a
situation where a given physical node is restricted to a specific
researcher/group, as I want all nodes, as much as possible, to be available
to all users, precisely so that idle cycles don't go to waste. It aligns
with the general philosophy that nodes are more like cattle and less like
pets, in my opinion, so I try to treat them like a giant shared pool rather
than multiple independent, gated systems.
Anyway, I suspect other users here with more experience might have a
different, or better, approach and I look forward to hearing their thoughts
as well.
Warmest regards,
Jason
On Tue, Oct 6, 2020 at 11:12 AM David Baker <D.J.Baker at soton.ac.uk> wrote:
> Hello,
>
> I would appreciate your advice on how to deal with this situation in
> Slurm, please. I have a set of nodes used by 2 groups, and normally each
> group would have access to half the nodes. So, I could limit each
> group to 3 nodes, for example. I am trying to devise a scheme that
> allows each group to make the best use of the nodes at all times. In other
> words, each group could potentially use all the nodes (assuming they are
> all free and the other group isn't using them at all).
>
> I cannot set hard and soft limits in Slurm, and so I'm not sure how to
> make the situation flexible. Ideally, it would be good for each group to be
> able to use their allocation and then take advantage of any idle nodes via
> a scavenging mechanism. The other group could then pre-empt the scavenger
> jobs and claim their nodes. I'm struggling with this since this seems like
> a two-way scavenger situation.
>
> Could anyone please help? I have, by the way, set up partition-based
> pre-emption in the cluster. This allows the general public to scavenge
> nodes owned by research groups.
>
> Best regards,
> David
--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632