<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Just to hopefully close this out, I believe I was actually able to resolve this in “user-land” rather than mucking with the database.<div class=""><br class=""></div><div class="">I was able to requeue the bad job IDs, and they went pending.</div><div class="">Then I updated the jobs to a time limit of 60.</div><div class="">Then I scancelled the jobs, and they returned to a cancelled state, before they rolled off within about 10 minutes.</div><div class=""><br class=""></div><div class="">Surprised I didn’t think to try requeueing earlier, but here’s hoping that this did the trick, and that I will have more accurate reporting and fewer “more time than is possible” log errors.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Reed</div><div class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Jan 17, 2023, at 11:29 AM, Reed Dier &lt;<a href="mailto:reed.dier@focusvq.com" class="">reed.dier@focusvq.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">So I was going to take a stab at trying to rectify this after taking care of post-holiday matters.<div class=""><br class=""></div><div class="">Here is a paste of the $CLUSTER_job_table table where I think I see the issue, and now I just want to sanity check my steps to remediate.</div><div class=""><a href="https://rentry.co/qhw6mg" class="">https://rentry.co/qhw6mg</a> (pastebin alternative because markdown is paywalled for pastebin).</div><div class=""><br class=""><div class="">There are a number of job steps with a timelimit of 4294967295, whereas the others of the same job array are 
525600.</div><div class="">Obviously I want to edit those time limits to sane limits (match them to the others).</div><div class=""><div style="caret-color: rgb(0, 0, 0);" class="">I don’t see anything in the $CLUSTER_step_table that looks like it would need to be modified to match, though I could be wrong.</div></div><div class=""><br class=""></div><div class="">But getting slurm to pick up the change is the part where I want to make sure I’m on the right page.</div><div class="">Should I manually update the mod_time timestamp so that slurm catches it at its next rollup?</div><div class="">Or will slurm catch the change in the time limit and update the mod_time itself when it sees it upon rollup?</div><div class=""><br class=""></div><div class="">I also don’t see any documentation stating how to manually trigger a rollup, either via slurmdbd.conf or command line flag.</div><div class="">Will it automagically perform a rollup at some predefined, non-configurable interval, or when restarting the daemon?</div><div class=""><br class=""></div><div class="">Apologies if this is all trivial information, just trying to measure twice and cut once.</div><div class=""><br class=""></div><div class="">Appreciate everyone’s help so far.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Reed</div><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Dec 23, 2022, at 7:18 PM, Chris Samuel &lt;<a href="mailto:chris@csamuel.org" class="">chris@csamuel.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">On 20/12/22 6:01 pm, Brian Andrus wrote:<br class=""><br class=""><blockquote type="cite" class="">You may want to dump the database, find what table/records need updated and try updating them. 
If anything went south, you could restore from the dump.<br class=""></blockquote><br class="">+lots to making sure you've got good backups first, and stop slurmdbd before you start on the backups and don't restart it until you've made the changes, including setting the rollup times to be before the jobs started to make sure that the rollups include these changes!<br class=""><br class="">When you start slurmdbd after making the changes it should see that it needs to do rollups and kick those off.<br class=""><br class="">All the best,<br class="">Chris<br class="">-- <br class="">Chris Samuel : <a href="http://www.csamuel.org/" class="">http://www.csamuel.org/</a> : Berkeley, CA, USA<br class=""></div></div></blockquote></div></div></div></div></blockquote></div><br class=""></div></body></html>