<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hoping this is a fairly simple one.<div class=""><br class=""></div><div class="">This is a small internal cluster that we’ve been using for about 6 months now. We’ve had some infrastructure instability in that time, which I suspect is the root cause of this weirdness, but hopefully someone can point me in the right direction to solve the issue.</div><div class=""><br class=""></div><div class="">I send a daily email of sreport output to show how busy the cluster was and who the top users were.</div><div class="">Weirdly, I have a user who seems to rack up the exact same usage day after day after day, down to a hundredth of a percent, conspicuously even when they were on vacation and claimed that they didn’t have job submissions in cron/etc.</div><div class=""><br class=""></div><div class="">So then, taking the <a href="https://lists.schedmd.com/pipermail/slurm-users/2022-December/009514.html" class="">scom TUI</a> posted this morning for a spin, I filtered on that user and noticed that even though I was only looking 2 days back at job history, I was seeing a job from August.</div><div class=""><br class=""></div><div class="">Conspicuously, the job state is cancelled, but the job end time is 1 year after the start time, meaning its job end time is in 2023.</div><div class="">So the dbd seems to be confused about these jobs, which are lingering and reporting cancelled but are still “on the books” somehow until next August.</div><div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div class=""><font face="Menlo" class="">╭──────────────────────────────────────────────────────────────────────────────────────────╮</font></div><div class=""><font face="Menlo" class="">│                                                                                  
        │</font></div><div class=""><font face="Menlo" class="">│  Job ID               : 290742                                                           │</font></div><div class=""><font face="Menlo" class="">│  Job Name             : $jobname                                                         │</font></div><div class=""><font face="Menlo" class="">│  User                 : $user                                                            │</font></div><div class=""><font face="Menlo" class="">│  Group                : $user                                                            │</font></div><div class=""><font face="Menlo" class="">│  Job Account          : $account                                                         │</font></div><div class=""><font face="Menlo" class="">│  Job Submission       : 2022-08-08 08:44:52 -0400 EDT                                    │</font></div><div class=""><font face="Menlo" class="">│  Job Start            : 2022-08-08 08:46:53 -0400 EDT                                    │</font></div><div class=""><font face="Menlo" class="">│  Job End              : 2023-08-08 08:47:01 -0400 EDT                                    │</font></div><div class=""><font face="Menlo" class="">│  Job Wait time        : 2m1s                                                             │</font></div><div class=""><font face="Menlo" class="">│  Job Run time         : 8760h0m8s                                                        │</font></div><div class=""><font face="Menlo" class="">│  Partition            : $part                                                            │</font></div><div class=""><font face="Menlo" class="">│  Priority             : 127282                                                           │</font></div><div class=""><font face="Menlo" class="">│  QoS                  : $qos                                                             │</font></div><div class=""><font face="Menlo" class="">│                        
                                                                  │</font></div><div class=""><font face="Menlo" class="">│                                                                                          │</font></div><div class=""><font face="Menlo" class="">╰──────────────────────────────────────────────────────────────────────────────────────────╯</font></div><div class=""><font face="Menlo" class="">Steps count: 0</font></div></blockquote><br class=""></div><div class=""><blockquote type="cite" class=""><font face="Menlo" class="">Filter: $user         Items: 13</font></blockquote><blockquote type="cite"><div class=""><font face="Menlo" class=""><br class=""></font></div><div class=""><font face="Menlo" class=""> Job ID      Job Name                             Part.  QoS         Account     User             Nodes                 State</font></div><div class=""><font face="Menlo" class="">───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────</font></div><div class=""><font face="Menlo" class=""> 290714      $jobname                             $part  $qos        $acct       $user            node32                CANCELLED</font></div><div class=""><font face="Menlo" class=""> 290716      $jobname                             $part  $qos        $acct       $user            node24                CANCELLED</font></div><div class=""><font face="Menlo" class=""> 290736      $jobname                             $part  $qos        $acct       $user            node00                CANCELLED</font></div><div class=""><font face="Menlo" class=""> 290742      $jobname                             $part  $qos        $acct       $user            node01                CANCELLED</font></div><div class=""><font face="Menlo" class=""> 290770      $jobname                             $part  $qos        $acct       $user            node02                CANCELLED</font></div><div class=""><font 
face="Menlo" class=""> 290777      $jobname                             $part  $qos        $acct       $user            node03                CANCELLED</font></div><div class=""><font face="Menlo" class=""> 290793      $jobname                             $part  $qos        $acct       $user            node04                CANCELLED</font></div><div class=""><font face="Menlo" class=""> 290797      $jobname                             $part  $qos        $acct       $user            node05                CANCELLED</font></div><div class=""><font face="Menlo" class=""> 290799      $jobname                             $part  $qos        $acct       $user            node06                CANCELLED</font></div><div class=""><font face="Menlo" class=""> 290801      $jobname                             $part  $qos        $acct       $user            node07                CANCELLED</font></div><div class=""><font face="Menlo" class=""> 290814      $jobname                             $part  $qos        $acct       $user            node08                CANCELLED</font></div><div class=""><font face="Menlo" class=""> 290817      $jobname                             $part  $qos        $acct       $user            node09                CANCELLED</font></div><div class=""><font face="Menlo" class=""> 290819      $jobname                             $part  $qos        $acct       $user            node10                CANCELLED</font></div></blockquote></div><div class=""><br class=""></div><div class="">I’d love to figure out the proper way to either purge these job IDs from the accounting database cleanly, or change the job end/run time to a sane/correct value.</div><div class="">Slurm is v21.08.8-2, and NTP is a stratum 1 server, so time is in sync everywhere (not that multiple servers would drift a full year off like this).</div><div class=""><br class=""></div><div class="">Thanks for any help,</div><div class="">Reed</div></body></html>