[slurm-users] Errors after removing partition

Chris Samuel chris at csamuel.org
Sat Jul 27 03:10:28 UTC 2019


On 26/7/19 8:28 am, Jeffrey Frey wrote:

> If you check the source code (src/slurmctld/job_mgr.c) this error is 
> indeed thrown when slurmctld unpacks job state files.  Tracing through 
> read_slurm_conf() -> load_all_job_state() -> _load_job_state():

I don't think that's the actual error that Brian is seeing, as that's 
just a "verbose()" message (as are 3 of the other 4 places this message 
is logged).  The only one that's actually logged as an error is this one:

https://github.com/SchedMD/slurm/blob/slurm-19.05/src/slurmctld/job_mgr.c#L11002

in this function:

  * reset_job_bitmaps - reestablish bitmaps for existing jobs.
  *	this should be called after rebuilding node information,
  *	but before using any job entries.

It looks like it should mark these jobs as failed; is that what you're seeing, Brian?
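A minimal sketch in C of the behaviour described above, assuming the relevant part of the check is "re-resolve each job's saved partition name against the rebuilt configuration and fail the job if it is gone". Everything here is hypothetical and simplified: `struct job`, `struct partition`, `find_partition()`, `reset_job_partitions()` and the log text are illustrative names, not Slurm's real structures, functions or messages, and the node bitmap rebuilding that reset_job_bitmaps() also does is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified types; NOT Slurm's real job/partition records. */
enum job_state { JOB_PENDING, JOB_RUNNING, JOB_FAILED };

struct partition {
    const char *name;
};

struct job {
    unsigned int id;
    const char *partition_name;   /* partition name saved in the job state */
    const struct partition *part; /* pointer re-resolved after reconfig    */
    enum job_state state;
};

/* Linear search for a partition by name in the current configuration. */
static const struct partition *find_partition(const struct partition *parts,
                                              size_t nparts, const char *name)
{
    for (size_t i = 0; i < nparts; i++) {
        if (strcmp(parts[i].name, name) == 0)
            return &parts[i];
    }
    return NULL;
}

/* Re-resolve every job's partition after the configuration is rebuilt.
 * A job whose partition no longer exists gets an error logged and is
 * marked failed.  Returns the number of jobs marked failed. */
static size_t reset_job_partitions(struct job *jobs, size_t njobs,
                                   const struct partition *parts, size_t nparts)
{
    size_t failed = 0;

    for (size_t i = 0; i < njobs; i++) {
        struct job *job = &jobs[i];

        assert(job->partition_name != NULL);
        job->part = find_partition(parts, nparts, job->partition_name);
        if (job->part == NULL) {
            /* Illustrative message only, not Slurm's actual log text. */
            fprintf(stderr, "error: job %u: partition %s no longer exists, "
                    "marking job failed\n", job->id, job->partition_name);
            job->state = JOB_FAILED;
            failed++;
        }
    }
    return failed;
}
```

For example, with partitions "batch" and "debug" configured, a saved job that still references a removed partition "old" would be the only one flipped to JOB_FAILED, while jobs in the surviving partitions keep their state.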

Brian: when you removed the partition did you restart slurmctld or just 
do an scontrol reconfigure?

BTW that check was introduced in 2003 by Moe :-)

https://github.com/SchedMD/slurm/commit/1c7ee080a48aa6338d3fc5480523017d4287dc08

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


