[slurm-users] maintenance partitions?

Jeffrey Frey frey at udel.edu
Fri Oct 5 07:14:33 MDT 2018

You could reconfigure the partition node lists on the fly using scontrol:

$ scontrol update PartitionName=regular_part1 Nodes=<node list minus r00n00>
$ scontrol update PartitionName=regular_partN Nodes=<node list minus r00n00>
$ scontrol update PartitionName=maint Nodes=r00n00

It should be easy enough to write a script that finds the partitions containing node X, removes it from each, then adds it to partition "maint."  The harder problem is restoring the node back to service, since you can't simply disable/down one particular node-in-a-partition; the script would also need to record which partitions the node was removed from.
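A minimal sketch of such a script (the partition name "maint" and the node name are assumptions from this thread; it also naively assumes partition node lists are plain comma-separated names, whereas real lists may use hostlist ranges like r00n[00-31] that "scontrol show hostnames" can expand first):

```shell
#!/bin/bash
# Hypothetical sketch: pull a broken node out of every production
# partition and park it in a partition named "maint".

# Strip one node from a comma-separated node list.  NOTE: assumes the
# list is uncompressed; expand hostlist ranges before using this.
remove_node() {
    echo "$1" | tr ',' '\n' | grep -vx "$2" | paste -sd, -
}

if [ -n "${1:-}" ]; then
    node="$1"    # e.g. r00n00

    # "sinfo -h -N -n <node> -o %P" prints each partition holding the
    # node; tr strips the "*" marking the default partition.
    for part in $(sinfo -h -N -n "$node" -o '%P' | tr -d '*' | sort -u); do
        [ "$part" = "maint" ] && continue
        current=$(scontrol show partition "$part" | awk -F= '/ Nodes=/{print $2}')
        scontrol update PartitionName="$part" \
                 Nodes="$(remove_node "$current" "$node")"
    done

    # Finally, hand the node to the maintenance partition.
    scontrol update PartitionName=maint Nodes="$node"
fi
```

Invoked as, e.g., "./to_maint.sh r00n00".  To restore the node later the script would have to log which partitions it touched, since that membership is otherwise lost.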

> On Oct 5, 2018, at 9:06 AM, Michael Di Domenico <mdidomenico4 at gmail.com> wrote:
> Is anyone on the list using maintenance partitions for broken nodes?
> If so, how are you moving nodes between partitions?
> The situation with my machines at the moment is that we have a steady
> stream of new jobs coming into the queues, but broken nodes as well.
> I'd like to fix those broken nodes and re-add them to a separate
> non-production pool so that user jobs don't land on them, but that
> still allows me to run maintenance jobs on the nodes to prove things
> are working before giving them back to the users.
> If I simply mark nodes with DownNodes= or scontrol update state=drain,
> Slurm will prevent users from starting new jobs, but won't allow me to
> run jobs on the nodes.
> Ideally, I'd like to have a prod partition and a maint partition,
> where the maint partition is set to ExclusiveUser and I can set the
> status of a node in the prod partition to drain without affecting the
> node status in the maint partition.  I don't believe I can do this,
> though.  I believe I have to change the slurm.conf and reconfigure to
> add/remove nodes from one partition or the other.
> If anyone has a better solution, I'd like to hear it.

Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976

