[slurm-users] maintenance partitions?
Michael Di Domenico
mdidomenico4 at gmail.com
Fri Oct 5 07:06:00 MDT 2018
Is anyone on the list using maintenance partitions for broken nodes?
If so, how are you moving nodes between partitions?
The situation with my machines at the moment is that we have a steady
stream of new jobs coming into the queues, but broken nodes as well.
I'd like to fix those broken nodes and re-add them to a separate
non-production pool so that user jobs don't land on them, but I can
still run maintenance jobs on the nodes to prove things are working
before giving them back to the users.
If I simply mark nodes with downnodes= or scontrol update state=drain,
Slurm will prevent users from starting new jobs, but won't allow me to
run jobs on the nodes either.
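For reference, the drain approach amounts to a command like the one built
by this minimal Python sketch. The helper name drain_command, the node name
node042 and the reason text are hypothetical. The scontrol form itself
(update NodeName=... State=DRAIN Reason=...) is standard, and Slurm expects
a Reason when a node is drained.

```python
import shlex


def drain_command(nodes, reason):
    """Build the scontrol argv that drains the given node(s).

    `nodes` is a Slurm node expression such as "node042" or "node[01-04]".
    """
    return [
        "scontrol",
        "update",
        f"NodeName={nodes}",
        "State=DRAIN",
        f"Reason={reason}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(shlex.join(drain_command("node042", "bad DIMM")))
```

Once drained this way, the node accepts no new jobs from anyone, which is
exactly the limitation described above.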
Ideally, I'd like to have a prod partition and a maint partition,
where the maint partition is set to exclusiveuser and I can set the
status of a node in the prod partition to drain without affecting the
node status in the maint partition. I don't believe I can do this,
though. I believe I have to change slurm.conf and reconfigure to
add/remove nodes from one partition or the other.
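A rough sketch of that slurm.conf workflow, in Python, under stated
assumptions: the helper names, the node names and the Default=YES/State=UP
settings are illustrative and not taken from any real configuration. Only
the partition names prod and maint and the ExclusiveUser setting come from
the description above. The sketch moves one node from prod to maint and
renders the two PartitionName lines that would go into slurm.conf.

```python
def move_to_maint(prod_nodes, maint_nodes, node):
    """Return new (prod, maint) node sets with `node` moved out of prod."""
    if node not in prod_nodes:
        raise ValueError(f"{node} is not in the prod partition")
    return set(prod_nodes) - {node}, set(maint_nodes) | {node}


def partition_lines(prod_nodes, maint_nodes):
    """Render PartitionName lines for slurm.conf.

    Assumes prod is never empty; the maint line is omitted when no node
    is under maintenance.
    """
    lines = [
        f"PartitionName=prod Nodes={','.join(sorted(prod_nodes))} "
        "Default=YES State=UP"
    ]
    if maint_nodes:
        lines.append(
            f"PartitionName=maint Nodes={','.join(sorted(maint_nodes))} "
            "ExclusiveUser=YES State=UP"
        )
    return lines


if __name__ == "__main__":
    prod, maint = move_to_maint({"node01", "node02", "node03"}, set(), "node02")
    print("\n".join(partition_lines(prod, maint)))
```

After the rendered lines replace the old partition definitions in
slurm.conf, scontrol reconfigure would make the daemons pick up the change.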
If anyone has a better solution, I'd like to hear it.