[slurm-users] Errors after removing partition

Brian Andrus toomuchit at gmail.com
Sat Jul 27 15:56:48 UTC 2019


The jobs themselves no longer exist. They had completed before I deleted 
the partition, which is odd to me.

I may have done 'reconfigure' before restarting slurmctld; it was a while 
ago, so I don't recall.
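For reference, the sequence I probably should have used looks something like 
this (the partition name "debug" is just an example, and this is a sketch of 
the usual approach, not a transcript of what I ran):

```shell
# Hypothetical partition "debug"; all commands are standard Slurm CLI.

# Stop new submissions to the partition while existing jobs drain:
scontrol update PartitionName=debug State=INACTIVE

# ...wait for any jobs still associated with the partition to finish...

# Remove the partition from the running controller:
scontrol delete PartitionName=debug

# Also delete the matching PartitionName=debug line from slurm.conf on
# all nodes, then restart the controller so its saved state and the
# on-disk config agree (an 'scontrol reconfigure' alone may leave the
# old job records referencing the deleted partition):
systemctl restart slurmctld
```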

Brian Andrus


On 7/26/2019 8:10 PM, Chris Samuel wrote:
> On 26/7/19 8:28 am, Jeffrey Frey wrote:
>
>> If you check the source code (src/slurmctld/job_mgr.c) this error is 
>> indeed thrown when slurmctld unpacks job state files.  Tracing through 
>> read_slurm_conf() -> load_all_job_state() -> _load_job_state():
>
> I don't think that's the actual error that Brian is seeing, as that's 
> just a "verbose()" message (as are another 3 of the 5 instances of 
> this).  The only one that's actually an error is this one:
>
> https://github.com/SchedMD/slurm/blob/slurm-19.05/src/slurmctld/job_mgr.c#L11002 
>
>
> in this function:
>
>  * reset_job_bitmaps - reestablish bitmaps for existing jobs.
>  *    this should be called after rebuilding node information,
>  *    but before using any job entries.
>
> It looks like it should mark these jobs as failed; is that the case, 
> Brian?
>
> Brian: when you removed the partition did you restart slurmctld or 
> just do an scontrol reconfigure?
>
> BTW that check was introduced in 2003 by Moe :-)
>
> https://github.com/SchedMD/slurm/commit/1c7ee080a48aa6338d3fc5480523017d4287dc08 
>
>
> All the best,
> Chris


