[slurm-users] Errors after removing partition

Brian Andrus toomuchit at gmail.com
Fri Jul 26 15:15:30 UTC 2019


All,

I have a cloud-based cluster running Slurm 19.05.0-1.
I removed one of the partitions, but now every time I start slurmctld I get
some errors:

slurmctld[63042]: error: Invalid partition (mpi-h44rs) for JobId=52545
slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-01
slurmctld[63042]: error: node_name2bitmap: invalid node specified mpi-h44rs-01
.
.
slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-05
slurmctld[63042]: error: node_name2bitmap: invalid node specified mpi-h44rs-05
slurmctld[63042]: error: Invalid nodes (mpi-h44rs-[01-05]) for JobId=52545
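
To see which jobs still point at the removed partition, the "Invalid partition" messages can be pulled out of the log with a short script. This is only a rough sketch: it assumes each message sits on one line and follows the format shown above, and the names `LOG`, `PARTITION_RE` and `stale_jobs` are made up for illustration.

```python
import re

# Sample messages copied from the log above, one message per line.
LOG = """\
slurmctld[63042]: error: Invalid partition (mpi-h44rs) for JobId=52545
slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-01
slurmctld[63042]: error: node_name2bitmap: invalid node specified mpi-h44rs-01
slurmctld[63042]: error: Invalid nodes (mpi-h44rs-[01-05]) for JobId=52545
"""

# Matches: "Invalid partition (<name>) for JobId=<id>"
PARTITION_RE = re.compile(r"Invalid partition \(([^)]+)\) for JobId=(\d+)")


def stale_jobs(log_text):
    """Map each missing partition name to the sorted job IDs that reference it."""
    found = {}
    for partition, job_id in PARTITION_RE.findall(log_text):
        found.setdefault(partition, set()).add(int(job_id))
    return {name: sorted(ids) for name, ids in found.items()}


if __name__ == "__main__":
    print(stale_jobs(LOG))
```

Each job ID it reports could then be checked with `scontrol show job <id>` to see whether the controller still knows about it.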

I suspect this comes from the saved state directory, and that if I were to
bring down the entire cluster and delete those files, the errors would clear
up, but I would prefer not to have to take the cluster down...

Is there a way to clean up "phantom" nodes and partitions that were deleted?

Brian Andrus