[slurm-users] maintenance partitions?
greg.wickham at kaust.edu.sa
Fri Oct 5 07:40:01 MDT 2018
We use “maintenance” reservations to prevent nodes from receiving production jobs.
Create a reservation with “flags=maint” and it will override other reservations
(if they exist).
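
As a rough sketch of what that can look like (not our exact setup): the helper below only prints the commands, so it can be reviewed before anything touches the cluster. The reservation name, node list, user, and healthcheck.sh are hypothetical placeholders. It also assumes the nodes were drained earlier and need to be resumed once the reservation is in place, since a drained node will not start jobs even inside a reservation.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for a maintenance reservation workflow.
# Nothing is executed against Slurm; copy the printed lines once they look right.
# All names (reservation, nodes, user, healthcheck.sh) are placeholders.

maint_reservation_cmds() {
    nodes="$1"          # node list in Slurm hostlist form, e.g. 'node[01-02]'
    name="$2"           # reservation name
    user="${3:-root}"   # user allowed to run jobs in the reservation

    # 1. Reserve the nodes so that only the listed user can start jobs on them.
    echo "scontrol create reservation ReservationName=$name StartTime=now Duration=infinite Users=$user Flags=maint,ignore_jobs Nodes=$nodes"
    # 2. Assumption: the nodes were drained earlier; resume them so jobs can start.
    echo "scontrol update NodeName=$nodes State=RESUME"
    # 3. Submit a test job into the reservation.
    echo "sbatch --reservation=$name --nodelist=$nodes healthcheck.sh"
    # 4. Delete the reservation to hand the nodes back to production.
    echo "scontrol delete ReservationName=$name"
}

maint_reservation_cmds 'node[01-02]' maint_check
```

Production jobs that do not request the reservation are kept off the reserved nodes, while the listed user can still run test jobs there with --reservation.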
> On 5 Oct 2018, at 4:06 PM, Michael Di Domenico <mdidomenico4 at gmail.com> wrote:
> Is anyone on the list using maintenance partitions for broken nodes?
> If so, how are you moving nodes between partitions?
> The situation with my machines at the moment is that we have a steady
> stream of new jobs coming into the queues, but broken nodes as well.
> I'd like to fix those broken nodes and re-add them to a separate
> non-production pool so that user jobs don't match, but allow me to run
> maintenance jobs on the nodes to prove things are working before
> giving them back to the users.
> If I simply mark nodes with downnodes= or scontrol update state=drain,
> Slurm will prevent users from starting new jobs, but won't allow me to
> run jobs on the nodes either.
> Ideally, I'd like to have a prod partition and a maint partition,
> where the maint partition is set to exclusiveuser and I can set the
> status of a node in the prod partition to drain without affecting the
> node status in the maint partition. I don't believe I can do this,
> though. I believe I have to change the slurm.conf and reconfigure to
> add/remove nodes from one partition or the other.
> If anyone has a better solution, I'd like to hear it.
Dr. Greg Wickham
Advanced Computing Infrastructure Team Lead
Advanced Computing Core Laboratory
King Abdullah University of Science and Technology
Building #1, Office #0124
greg.wickham at kaust.edu.sa +966 544 700 330