<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Seems like the time may have been off on the db server at the
      time of the insert/update.</p>
    <p>You may want to dump the database, find which tables/records need
      updating, and try updating them. If anything goes south, you can
      restore from the dump.</p>
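    <p>A rough sketch of how the "find the records, then update them"
      step might look, assuming a dump has already been taken (for
      example with <code>mysqldump</code> on a MySQL/MariaDB backend).
      It only builds SQL text for review and never touches the
      database. The <code>CLUSTER_job_table</code> naming, the
      <code>time_start</code>/<code>time_end</code>/<code>job_db_inx</code>
      columns, the cluster name <code>mycluster</code>, the record key
      <code>12345</code>, and the candidate end time are all
      assumptions for illustration; check them against your own schema
      first.</p>
    <pre>
```python
from datetime import datetime, timedelta, timezone

# Assumptions (verify against your own accounting database before use):
#  - the job table is named CLUSTER_job_table
#  - time_start and time_end hold Unix epoch seconds
#  - job_db_inx is the per-record primary key
EDT = timezone(timedelta(hours=-4))


def job_table(cluster):
    """Return the assumed per-cluster job table name."""
    return f"{cluster}_job_table"


def select_future_end_times(cluster, now_epoch):
    """Build a read-only query for records whose end time is still ahead."""
    return (
        "SELECT job_db_inx, id_job, time_start, time_end, state "
        f"FROM {job_table(cluster)} WHERE time_end > {int(now_epoch)};"
    )


def update_end_time(cluster, job_db_inx, new_end):
    """Build an UPDATE that sets one record's end time to new_end."""
    return (
        f"UPDATE {job_table(cluster)} SET time_end = {int(new_end.timestamp())} "
        f"WHERE job_db_inx = {int(job_db_inx)};"
    )


# Times reported for job 290742 in the quoted thread.
start = datetime(2022, 8, 8, 8, 46, 53, tzinfo=EDT)
recorded_end = datetime(2023, 8, 8, 8, 47, 1, tzinfo=EDT)
print(recorded_end - start)  # runtime the accounting data currently implies

# Hypothetical corrected value and record key, for illustration only.
candidate_end = recorded_end.replace(year=2022)
print(select_future_end_times("mycluster", datetime.now(timezone.utc).timestamp()))
print(update_end_time("mycluster", 12345, candidate_end))
```
    </pre>
    <p>The printed statements are meant to be read, adjusted, and run
      by hand only after the dump has been verified as restorable.</p>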
    <p>Brian Andrus<br>
    </p>
    <div class="moz-cite-prefix">On 12/20/2022 11:51 AM, Reed Dier
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:A889B640-1F36-4DDE-9603-B366ACCCD1E6@focusvq.com">
      Just to follow up with some things I’ve tried:
      <div class=""><br class="">
      </div>
      <div class="">scancel doesn’t want to touch it:</div>
      <div class="">
        <blockquote type="cite" class="">
          <div class=""><font class="" face="Menlo"># scancel -v 290710</font></div>
          <div class=""><font class="" face="Menlo">scancel: Terminating
              job 290710</font></div>
          <div class=""><font class="" face="Menlo">scancel: error: Kill
              job error on job id 290710: Job/step already completing or
              completed</font></div>
        </blockquote>
        <div><br class="">
        </div>
        <div>scontrol does see that these are all members of the same
          array, but doesn’t want to touch it:</div>
        <div>
          <blockquote type="cite" class="">
            <div><font class="" face="Menlo"># scontrol update
                JobID=290710 EndTime=2022-08-09T08:47:01</font></div>
            <div><font class="" face="Menlo">290710_4,6,26,32,60,67,83,87,89,91,...:
                Job has already finished</font></div>
          </blockquote>
          <br class="">
        </div>
        <div>And trying to modify the job’s end time with sacctmgr
          fails, as expected, because EndTime is only a where spec, not
          a set spec. I also tried EndTime=now, with the same
          results:</div>
        <div>
          <blockquote type="cite" class="">
            <div><font class="" face="Menlo"># sacctmgr modify job where
                JobID=290710 set EndTime=2022-08-09T08:47:01</font></div>
            <div><font class="" face="Menlo"> Unknown option:
                EndTime=2022-08-09T08:47:01</font></div>
            <div><font class="" face="Menlo"> Use keyword 'where' to
                modify condition</font></div>
            <div><font class="" face="Menlo"> You didn't give me
                anything to set</font></div>
          </blockquote>
          <br class="">
        </div>
        <div>I was able to set a comment for the jobs/array, so the DBD
          can see/talk to them.</div>
        <div>One additional thing to mention is that there are 14 JIDs
          that are stuck like this, 1 is an Array JID, and 13 of them
          are array tasks on the original Array ID.</div>
        <div><br class="">
        </div>
        <div>But I figured I would list the other steps I’ve tried, to
          rule those ideas out.</div>
        <div><br class="">
        </div>
        <div>Thanks,</div>
        <div>Reed</div>
        <div><br class="">
          <blockquote type="cite" class="">
            <div class="">On Dec 20, 2022, at 10:08 AM, Reed Dier <<a
                href="mailto:reed.dier@focusvq.com"
                class="moz-txt-link-freetext" moz-do-not-send="true">reed.dier@focusvq.com</a>>
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <div style="word-wrap: break-word; -webkit-nbsp-mode:
                space; line-break: after-white-space;" class="">2 votes
                for runawayjobs is a strong vote (and also something I’m
                glad to learn exists for the future), however, it does
                not appear to be the case.
                <div class=""><br class="">
                </div>
                <div class="">
                  <blockquote type="cite" class="">
                    <div class=""># sacctmgr show runawayjobs</div>
                    <div class="">Runaway Jobs: No runaway jobs found on
                      cluster $cluster</div>
                  </blockquote>
                  <div class=""><br class="">
                  </div>
                  So unfortunately that doesn’t appear to be the
                  culprit.</div>
                <div class=""><br class="">
                </div>
                <div class="">Appreciate the responses.</div>
                <div class=""><br class="">
                </div>
                <div class="">Reed<br class="">
                  <div class=""><br class="">
                    <blockquote type="cite" class="">
                      <div class="">On Dec 20, 2022, at 10:03 AM, Brian
                        Andrus <<a href="mailto:toomuchit@gmail.com"
                          class="moz-txt-link-freetext"
                          moz-do-not-send="true">toomuchit@gmail.com</a>>
                        wrote:</div>
                      <br class="Apple-interchange-newline">
                      <div class="">
                        <div class="">
                          <p class="">Try: <br class="">
                          </p>
                          <p class="">    sacctmgr list runawayjobs</p>
                          <p class="">Brian Andrus<br class="">
                          </p>
                          <div class="moz-cite-prefix">On 12/20/2022
                            7:54 AM, Reed Dier wrote:<br class="">
                          </div>
                          <blockquote type="cite"
                            cite="mid:069A5B5A-CC57-46B8-9CDE-095CA83D7C83@focusvq.com"
                            class="">
                            Hoping this is a fairly simple one.
                            <div class=""><br class="">
                            </div>
                            <div class="">This is a small internal
                              cluster that we’ve been using for about 6
                              months now, and we’ve had some
                              infrastructure instability in that time,
                              which I think may be the root culprit
                              behind this weirdness, but hopefully
                              someone can point me in the direction to
                              solve the issue.</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">I do a daily email of sreport
                              to show how busy the cluster was, and who
                              were the top users.</div>
                            <div class="">Weirdly, I have a user that
                              seems to be able to use the same exact
                              usage day after day after day, down to
                              hundredth of a percent, conspicuously even
                              when they were on vacation and claimed
                              that they didn’t have job submissions in
                              cron/etc.</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">So then, taking a spin of the <a
href="https://lists.schedmd.com/pipermail/slurm-users/2022-December/009514.html"
                                class="" moz-do-not-send="true">scom
                                tui </a>posted this morning, I then
                              filtered that user, and noticed that even
                              though I was only looking 2 days back at
                              job history, I was seeing a job from
                              August.</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">Conspicuously, the job state
                              is cancelled, but the job end time is 1y
                              from the start time, meaning its job end
                              time is in 2023.</div>
                            <div class="">So something with the dbd is
                              confused about this/these jobs that are
                              lingering and reporting cancelled but
                              still “on the books” somehow until next
                              August.</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">
                              <blockquote type="cite" class="">
                                <div class=""><font class=""
                                    face="Menlo">╭──────────────────────────────────────────────────────────────────────────────────────────╮</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│                      
                                                                       
                                                                   │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  Job ID              
                                    : 290742                            
                                                                  │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  Job Name            
                                    : $jobname                          
                                                                  │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  User                
                                    : $user                            
                                                                   │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  Group              
                                     : $user                            
                                                                   │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  Job Account        
                                     : $account                        
                                                                    │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  Job Submission      
                                    : 2022-08-08 08:44:52 -0400 EDT    
                                                                   │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  Job Start          
                                     : 2022-08-08 08:46:53 -0400 EDT    
                                                                   │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  Job End            
                                     : 2023-08-08 08:47:01 -0400 EDT    
                                                                   │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  Job Wait time      
                                     : 2m1s                            
                                                                    │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  Job Run time        
                                    : 8760h0m8s                        
                                                                   │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  Partition          
                                     : $part                            
                                                                   │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  Priority            
                                    : 127282                            
                                                                  │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│  QoS                
                                     : $qos                            
                                                                    │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│                      
                                                                       
                                                                   │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">│                      
                                                                       
                                                                   │</font></div>
                                <div class=""><font class=""
                                    face="Menlo">╰──────────────────────────────────────────────────────────────────────────────────────────╯</font></div>
                                <div class=""><font class=""
                                    face="Menlo">Steps count: 0</font></div>
                              </blockquote>
                              <br class="">
                            </div>
                            <div class="">
                              <blockquote type="cite" class=""><font
                                  class="" face="Menlo">Filter: $user  
                                        Items: 13</font></blockquote>
                              <blockquote type="cite" class="">
                                <div class=""><font class=""
                                    face="Menlo"><br class="">
                                  </font></div>
                                <div class=""><font class=""
                                    face="Menlo"> Job ID      Job Name  
                                                              Part.  QoS
                                            Account     User            
                                    Nodes                 State</font></div>
                                <div class=""><font class=""
                                    face="Menlo">───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290714      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node32                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290716      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node24                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290736      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node00                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290742      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node01                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290770      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node02                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290777      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node03                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290793      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node04                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290797      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node05                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290799      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node06                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290801      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node07                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290814      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node08                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290817      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node09                CANCELLED</font></div>
                                <div class=""><font class=""
                                    face="Menlo"> 290819      $jobname  
                                                              $part
                                     $qos        $acct       $user      
                                         node10                CANCELLED</font></div>
                              </blockquote>
                            </div>
                            <div class=""><br class="">
                            </div>
                            <div class="">I’d love to figure out the
                              proper way to either purge these jid’s
                              from the accounting database cleanly, or
                              change the job end/run time to a
                              sane/correct value.</div>
                            <div class="">Slurm is v21.08.8-2, and ntp
                              is a stratum 1 server, so time is in sync
                              everywhere, not that multiple servers
                              would drift 1 year off like this.</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">Thanks for any help,</div>
                            <div class="">Reed</div>
                          </blockquote>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  <br class="">
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br class="">
      </div>
    </blockquote>
  </body>
</html>