<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">If you check the source code (src/slurmctld/job_mgr.c), this error is indeed thrown when slurmctld unpacks job state files.  Tracing through read_slurm_conf() -> load_all_job_state() -> _load_job_state():<div class=""><br class=""></div><div class=""><br class=""></div><div class=""><span class="Apple-tab-span" style="white-space:pre">           </span>part_ptr = find_part_record (partition);<br class=""><span class="Apple-tab-span" style="white-space:pre">               </span>if (part_ptr == NULL) {<br class=""><span class="Apple-tab-span" style="white-space:pre">                        </span>char *err_part = NULL;<br class=""><span class="Apple-tab-span" style="white-space:pre">                 </span>part_ptr_list = get_part_list(partition, &err_part);<br class=""><span class="Apple-tab-span" style="white-space:pre">                       </span>if (part_ptr_list) {<br class=""><span class="Apple-tab-span" style="white-space:pre">                           </span>part_ptr = list_peek(part_ptr_list);<br class=""><span class="Apple-tab-span" style="white-space:pre">                           </span>if (list_count(part_ptr_list) == 1)<br class=""><span class="Apple-tab-span" style="white-space:pre">                                    </span>FREE_NULL_LIST(part_ptr_list);<br class=""><span class="Apple-tab-span" style="white-space:pre">                 </span>} else {<br class=""><b class=""><span class="Apple-tab-span" style="white-space:pre">                           </span>verbose("Invalid partition (%s) for JobId=%u",<br class=""><span class="Apple-tab-span" style="white-space:pre">                                       </span>err_part, job_id);<br class=""><span class="Apple-tab-span" style="white-space:pre">                             </span>xfree(err_part);<br class=""><span class="Apple-tab-span" style="white-space:pre">                      
         </span>/* not fatal error, partition could have been<br class=""><span class="Apple-tab-span" style="white-space:pre">                          </span> * removed, reset_job_bitmaps() will clean-up<br class=""><span class="Apple-tab-span" style="white-space:pre">                             </span> * this job */</b><br class=""><span class="Apple-tab-span" style="white-space:pre">                  </span>}<br class=""><span class="Apple-tab-span" style="white-space:pre">              </span>}</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">The comment after the error implies that this is not really a problem, and that it occurs specifically when a partition has been removed.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><blockquote type="cite" class="">On Jul 26, 2019, at 11:15 AM, Brian Andrus <<a href="mailto:toomuchit@gmail.com" class="">toomuchit@gmail.com</a>> wrote:<br class=""><br class="">All,<br class=""><br class="">I have a cloud based cluster using slurm 19.05.0-1<br class="">I removed one of the partitions, but now everytime I start slurmctld I get some errors:<br class=""><br class="">slurmctld[63042]: error: Invalid partition (mpi-h44rs) for JobId=52545<br class="">slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-01<br class="">slurmctld[63042]: error: node_name2bitmap: invalid node specified mpi-h44rs-01<br class="">.<br class="">.<br class="">slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-05<br class="">slurmctld[63042]: error: node_name2bitmap: invalid node specified mpi-h44rs-05<br class="">slurmctld[63042]: error: Invalid nodes (mpi-h44rs-[01-05]) for JobId=52545<br class=""><br class="">I suspect this is in the saved state directory and if I were to down the entire cluster and delete those files up, it would clear it up, but I prefer to not have to down the 
cluster...<br class=""><br class="">Is there a way to clean up "phantom" nodes and partitions that were deleted?<br class=""><br class="">Brian Andrus <br class=""></blockquote><br class=""><div class=""><br class="">::::::::::::::::::::::::::::::::::::::::::::::::::::::<br class="">Jeffrey T. Frey, Ph.D.<br class="">Systems Programmer V / HPC Management<br class="">Network & Systems Services / College of Engineering<br class="">University of Delaware, Newark DE  19716<br class="">Office: (302) 831-6034  Mobile: (302) 419-4976<br class="">::::::::::::::::::::::::::::::::::::::::::::::::::::::<br class=""><br class=""><br class=""><br class=""></div><br class=""></div></body></html>