<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Try: <br>
</p>
<p>&nbsp;&nbsp;&nbsp;sacctmgr list runawayjobs</p>
<p>Brian Andrus<br>
</p>
<div class="moz-cite-prefix">On 12/20/2022 7:54 AM, Reed Dier wrote:<br>
</div>
<blockquote type="cite"
cite="mid:069A5B5A-CC57-46B8-9CDE-095CA83D7C83@focusvq.com">
Hoping this is a fairly simple one.
<div class=""><br class="">
</div>
<div class="">This is a small internal cluster that we’ve been
using for about 6 months now, and we’ve had some infrastructure
instability in that time, which I think may be the root culprit
behind this weirdness, but hopefully someone can point me in the
direction to solve the issue.</div>
<div class=""><br class="">
</div>
<div class="">I send a daily sreport email showing how busy the
cluster was and who the top users were.</div>
<div class="">Weirdly, one user shows exactly the same usage day
after day, down to a hundredth of a percent, conspicuously even
while they were on vacation and said they had no job submissions
in cron/etc.</div>
<div class=""><br class="">
</div>
<div class="">So then, taking a spin of the <a
href="https://lists.schedmd.com/pipermail/slurm-users/2022-December/009514.html"
class="" moz-do-not-send="true">scom tui </a>posted this
morning, I then filtered that user, and noticed that even though
I was only looking 2 days back at job history, I was seeing a
job from August.</div>
<div class=""><br class="">
</div>
<div class="">Conspicuously, the job state is cancelled, but the
job end time is one year after the start time, which puts it in
2023.</div>
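<div class="">As a quick sanity check on the "one year" observation, here is a minimal Python sketch that recomputes the run time from the start and end timestamps of job 290742 as they appear in the job record quoted further down in this message. The helper name <code>run_time</code> is made up for illustration, and both timestamps are treated as naive local times since they share the same -0400 offset.</div>

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def run_time(start_text, end_text):
    """Return (whole days, 'XhYmZs' string) between two timestamps."""
    start = datetime.strptime(start_text, FMT)
    end = datetime.strptime(end_text, FMT)
    elapsed = end - start
    total_seconds = int(elapsed.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return elapsed.days, f"{hours}h{minutes}m{seconds}s"


# Timestamps copied from the quoted record for job 290742.
days, formatted = run_time("2022-08-08 08:46:53", "2023-08-08 08:47:01")
print(days, formatted)  # 365 8760h0m8s
```

<div class="">The result is 365 days plus 8 seconds, which matches the reported run time of 8760h0m8s, so the end time looks like a placeholder derived from the start time rather than a real completion time.</div>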
<div class="">So something in the dbd is confused about these
jobs, which are lingering and reporting cancelled but are still
“on the books” somehow until next August.</div>
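<div class="">The symptom described above (a job marked cancelled whose end time is still in the future) could be checked for mechanically. The following is only a rough Python sketch under stated assumptions: it assumes pipe-delimited text in the layout that <code>sacct --parsable2 --format=JobID,State,Start,End</code> produces, the sample rows are illustrative (the second job is invented for contrast), and the function name <code>suspicious_jobs</code> is hypothetical. Whether sacct itself reports these end times the same way the TUI does is not something this sketch establishes.</div>

```python
from datetime import datetime

# Illustrative sample only: first row modeled on job 290742 from this
# message, second row is an invented, normal-looking job for contrast.
SAMPLE = """JobID|State|Start|End
290742|CANCELLED|2022-08-08T08:46:53|2023-08-08T08:47:01
300001|COMPLETED|2022-12-19T09:00:00|2022-12-19T10:30:00
"""


def suspicious_jobs(text, now):
    """Return job IDs that are cancelled but have an end time after `now`."""
    lines = text.strip().splitlines()
    header = lines[0].split("|")
    flagged = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("|")))
        try:
            end = datetime.strptime(row["End"], "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            # Skip rows whose end time is not a timestamp (e.g. "Unknown").
            continue
        # The state may carry a suffix such as "CANCELLED by <uid>".
        if row["State"].startswith("CANCELLED") and end > now:
            flagged.append(row["JobID"])
    return flagged


print(suspicious_jobs(SAMPLE, datetime(2022, 12, 20, 7, 54)))
```

<div class="">A check like this only lists candidates for a closer look; it does not change anything in the accounting database.</div>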
<div class=""><br class="">
</div>
<div class="">
<blockquote type="cite" class="">
<div class=""><font class="" face="Menlo">╭──────────────────────────────────────────────────────────────────────────────────────────╮</font></div>
<div class=""><font class="" face="Menlo">│                                                                                          │</font></div>
<div class=""><font class="" face="Menlo">│  Job ID          : 290742                                                                │</font></div>
<div class=""><font class="" face="Menlo">│  Job Name        : $jobname                                                              │</font></div>
<div class=""><font class="" face="Menlo">│  User            : $user                                                                 │</font></div>
<div class=""><font class="" face="Menlo">│  Group           : $user                                                                 │</font></div>
<div class=""><font class="" face="Menlo">│  Job Account     : $account                                                              │</font></div>
<div class=""><font class="" face="Menlo">│  Job Submission  : 2022-08-08 08:44:52 -0400 EDT                                         │</font></div>
<div class=""><font class="" face="Menlo">│  Job Start       : 2022-08-08 08:46:53 -0400 EDT                                         │</font></div>
<div class=""><font class="" face="Menlo">│  Job End         : 2023-08-08 08:47:01 -0400 EDT                                         │</font></div>
<div class=""><font class="" face="Menlo">│  Job Wait time   : 2m1s                                                                  │</font></div>
<div class=""><font class="" face="Menlo">│  Job Run time    : 8760h0m8s                                                             │</font></div>
<div class=""><font class="" face="Menlo">│  Partition       : $part                                                                 │</font></div>
<div class=""><font class="" face="Menlo">│  Priority        : 127282                                                                │</font></div>
<div class=""><font class="" face="Menlo">│  QoS             : $qos                                                                  │</font></div>
<div class=""><font class="" face="Menlo">│                                                                                          │</font></div>
<div class=""><font class="" face="Menlo">│                                                                                          │</font></div>
<div class=""><font class="" face="Menlo">╰──────────────────────────────────────────────────────────────────────────────────────────╯</font></div>
<div class=""><font class="" face="Menlo">Steps count: 0</font></div>
</blockquote>
<br class="">
</div>
<div class="">
<blockquote type="cite" class=""><font class="" face="Menlo">Filter:
$user     Items: 13</font></blockquote>
<blockquote type="cite">
<div class=""><font class="" face="Menlo"><br class="">
</font></div>
<div class=""><font class="" face="Menlo"> Job ID   Job Name              Part.   QoS     Account   User        Nodes         State</font></div>
<div class=""><font class="" face="Menlo">───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────</font></div>
<div class=""><font class="" face="Menlo"> 290714   $jobname              $part   $qos    $acct     $user       node32        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290716   $jobname              $part   $qos    $acct     $user       node24        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290736   $jobname              $part   $qos    $acct     $user       node00        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290742   $jobname              $part   $qos    $acct     $user       node01        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290770   $jobname              $part   $qos    $acct     $user       node02        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290777   $jobname              $part   $qos    $acct     $user       node03        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290793   $jobname              $part   $qos    $acct     $user       node04        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290797   $jobname              $part   $qos    $acct     $user       node05        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290799   $jobname              $part   $qos    $acct     $user       node06        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290801   $jobname              $part   $qos    $acct     $user       node07        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290814   $jobname              $part   $qos    $acct     $user       node08        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290817   $jobname              $part   $qos    $acct     $user       node09        CANCELLED</font></div>
<div class=""><font class="" face="Menlo"> 290819   $jobname              $part   $qos    $acct     $user       node10        CANCELLED</font></div>
</blockquote>
</div>
<div class=""><br class="">
</div>
<div class="">I’d love to figure out the proper way to either
purge these job IDs from the accounting database cleanly, or
change the job end/run time to a sane/correct value.</div>
<div class="">Slurm is v21.08.8-2, and NTP is a stratum 1 server,
so time is in sync everywhere, not that multiple servers would
drift a full year off like this.</div>
<div class=""><br class="">
</div>
<div class="">Thanks for any help,</div>
<div class="">Reed</div>
</blockquote>
</body>
</html>