<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Two votes for runawayjobs is a strong vote (and something I’m glad to learn exists for the future). However, it does not appear to be the case here.<div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div class=""># sacctmgr show runawayjobs</div><div class="">Runaway Jobs: No runaway jobs found on cluster $cluster</div></blockquote><div class=""><br class=""></div>So, unfortunately, that doesn’t appear to be the culprit.</div><div class=""><br class=""></div><div class="">Appreciate the responses.</div><div class=""><br class=""></div><div class="">Reed<br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Dec 20, 2022, at 10:03 AM, Brian Andrus &lt;<a href="mailto:toomuchit@gmail.com" class="">toomuchit@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class="">
  
  
  <div class=""><p class="">Try: <br class="">
    </p><p class="">    sacctmgr list runawayjobs</p><p class="">Brian Andrus<br class="">
    </p>
    <div class="moz-cite-prefix">On 12/20/2022 7:54 AM, Reed Dier wrote:<br class="">
    </div>
    <blockquote type="cite" cite="mid:069A5B5A-CC57-46B8-9CDE-095CA83D7C83@focusvq.com" class="">
      Hoping this is a fairly simple one.
      <div class=""><br class="">
      </div>
      <div class="">This is a small internal cluster that we’ve been
        using for about 6 months now, and we’ve had some infrastructure
        instability in that time, which I think may be the root culprit
        behind this weirdness, but hopefully someone can point me in the
        direction to solve the issue.</div>
      <div class=""><br class="">
      </div>
      <div class="">I do a daily email of sreport to show how busy the
        cluster was, and who were the top users.</div>
      <div class="">Weirdly, I have a user that seems to be able to use
        the same exact usage day after day after day, down to hundredth
        of a percent, conspicuously even when they were on vacation and
        claimed that they didn’t have job submissions in cron/etc.</div>
      <div class=""><br class="">
      </div>
      <div class="">So then, taking a spin of the <a href="https://lists.schedmd.com/pipermail/slurm-users/2022-December/009514.html" class="" moz-do-not-send="true">scom tui </a>posted this
        morning, I then filtered that user, and noticed that even though
        I was only looking 2 days back at job history, I was seeing a
        job from August.</div>
      <div class=""><br class="">
      </div>
      <div class="">Conspicuously, the job state is cancelled, but the
        job end time is 1y from the start time, meaning its job end time
        is in 2023.</div>
      <div class="">So something with the dbd is confused about
        this/these jobs that are lingering and reporting cancelled but
        still “on the books” somehow until next August.</div>
      <div class=""><br class="">
      </div>
      <div class="">
        <blockquote type="cite" class="">
          <div class=""><font class="" face="Menlo">╭──────────────────────────────────────────────────────────────────────────────────────────╮</font></div>
          <div class=""><font class="" face="Menlo">│                  
                                                                       
                           │</font></div>
          <div class=""><font class="" face="Menlo">│  Job ID          
                  : 290742                                              
                          │</font></div>
          <div class=""><font class="" face="Menlo">│  Job Name        
                  : $jobname                                            
                          │</font></div>
          <div class=""><font class="" face="Menlo">│  User            
                  : $user                                              
                           │</font></div>
          <div class=""><font class="" face="Menlo">│  Group            
                 : $user                                                
                         │</font></div>
          <div class=""><font class="" face="Menlo">│  Job Account      
                 : $account                                            
                          │</font></div>
          <div class=""><font class="" face="Menlo">│  Job Submission  
                  : 2022-08-08 08:44:52 -0400 EDT                      
                           │</font></div>
          <div class=""><font class="" face="Menlo">│  Job Start        
                 : 2022-08-08 08:46:53 -0400 EDT                        
                         │</font></div>
          <div class=""><font class="" face="Menlo">│  Job End          
                 : 2023-08-08 08:47:01 -0400 EDT                        
                         │</font></div>
          <div class=""><font class="" face="Menlo">│  Job Wait time    
                 : 2m1s                                                
                          │</font></div>
          <div class=""><font class="" face="Menlo">│  Job Run time    
                  : 8760h0m8s                                          
                           │</font></div>
          <div class=""><font class="" face="Menlo">│  Partition        
                 : $part                                                
                         │</font></div>
          <div class=""><font class="" face="Menlo">│  Priority        
                  : 127282                                              
                          │</font></div>
          <div class=""><font class="" face="Menlo">│  QoS              
                 : $qos                                                
                          │</font></div>
          <div class=""><font class="" face="Menlo">│                  
                                                                       
                           │</font></div>
          <div class=""><font class="" face="Menlo">│                  
                                                                       
                           │</font></div>
          <div class=""><font class="" face="Menlo">╰──────────────────────────────────────────────────────────────────────────────────────────╯</font></div>
          <div class=""><font class="" face="Menlo">Steps count: 0</font></div>
        </blockquote>
        <br class="">
      </div>
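      <div class="">As a quick sanity check on the job details above, the reported run time can be reproduced from the start and end timestamps. This is only a minimal sketch: the function name <code>format_run_time</code> is made up for illustration, and it assumes both timestamps share the same UTC offset (both are shown as -0400 EDT), so naive datetimes are enough.</div>

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def format_run_time(start: str, end: str) -> str:
    """Return end - start formatted like the TUI does, e.g. '8760h0m8s'.

    Assumes both timestamps use the same UTC offset, so they can be
    parsed as naive datetimes.
    """
    elapsed = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    total_seconds = int(elapsed.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h{minutes}m{seconds}s"

# Start and end times of job 290742 as shown in the job details.
print(format_run_time("2022-08-08 08:46:53", "2023-08-08 08:47:01"))
```

      <div class="">The result matches the displayed run time of 8760h0m8s, i.e. 365 days plus the 8 seconds the job actually ran, which suggests the recorded end time is off by exactly one year rather than being random garbage.</div>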
      <div class="">
        <blockquote type="cite" class=""><font class="" face="Menlo">Filter:
            $user         Items: 13</font></blockquote>
        <blockquote type="cite" class="">
          <div class=""><font class="" face="Menlo"><br class="">
            </font></div>
          <div class=""><font class="" face="Menlo"> Job ID      Job
              Name                             Part.  QoS        
              Account     User             Nodes                 State</font></div>
          <div class=""><font class="" face="Menlo">───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────</font></div>
          <div class=""><font class="" face="Menlo"> 290714    
               $jobname                             $part  $qos      
               $acct       $user            node32              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290716    
               $jobname                             $part  $qos      
               $acct       $user            node24              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290736    
               $jobname                             $part  $qos      
               $acct       $user            node00              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290742    
               $jobname                             $part  $qos      
               $acct       $user            node01              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290770    
               $jobname                             $part  $qos      
               $acct       $user            node02              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290777    
               $jobname                             $part  $qos      
               $acct       $user            node03              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290793    
               $jobname                             $part  $qos      
               $acct       $user            node04              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290797    
               $jobname                             $part  $qos      
               $acct       $user            node05              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290799    
               $jobname                             $part  $qos      
               $acct       $user            node06              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290801    
               $jobname                             $part  $qos      
               $acct       $user            node07              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290814    
               $jobname                             $part  $qos      
               $acct       $user            node08              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290817    
               $jobname                             $part  $qos      
               $acct       $user            node09              
               CANCELLED</font></div>
          <div class=""><font class="" face="Menlo"> 290819    
               $jobname                             $part  $qos      
               $acct       $user            node10              
               CANCELLED</font></div>
        </blockquote>
      </div>
      <div class=""><br class="">
      </div>
      <div class="">I’d love to figure out the proper way to either
        purge these jid’s from the accounting database cleanly, or
        change the job end/run time to a sane/correct value.</div>
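      <div class="">Before touching the database, it may help to list every record with the same symptom. The following is only a sketch under stated assumptions: it expects pipe-delimited text with a header row containing <code>JobID</code>, <code>State</code>, <code>Start</code> and <code>End</code> columns (roughly what a parsable <code>sacct</code> listing looks like), and it assumes ISO-style timestamps. The function name <code>find_future_end_jobs</code> is hypothetical, and the sample row simply restates job 290742 from above in that assumed format.</div>

```python
from datetime import datetime

TS_FMT = "%Y-%m-%dT%H:%M:%S"  # assumed timestamp layout

def find_future_end_jobs(text: str, now: datetime) -> list[str]:
    """Return IDs of cancelled jobs whose recorded End lies after `now`."""
    lines = text.strip().splitlines()
    header = lines[0].split("|")
    suspects = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("|")))
        # Only jobs already marked cancelled are of interest here.
        if not row["State"].startswith("CANCELLED"):
            continue
        try:
            end = datetime.strptime(row["End"], TS_FMT)
        except ValueError:
            continue  # skip non-timestamp values such as "Unknown"
        if end > now:
            suspects.append(row["JobID"])
    return suspects

# Sample input: job 290742 restated in the assumed pipe-delimited format.
sample = """JobID|State|Start|End
290742|CANCELLED|2022-08-08T08:46:53|2023-08-08T08:47:01
"""
print(find_future_end_jobs(sample, datetime(2022, 12, 20)))
```

      <div class="">This only identifies candidates; it does not change anything in the accounting database.</div>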
      <div class="">Slurm is v21.08.8-2, and ntp is a stratum 1 server,
        so time is in sync everywhere, not that multiple servers would
        drift 1 year off like this.</div>
      <div class=""><br class="">
      </div>
      <div class="">Thanks for any help,</div>
      <div class="">Reed</div>
    </blockquote>
  </div>

</div></blockquote></div><br class=""></div></body></html>