<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"MS Gothic";
panose-1:2 11 6 9 7 2 5 8 2 4;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@MS Gothic";
panose-1:2 11 6 9 7 2 5 8 2 4;}
@font-face
{font-family:Menlo;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">Do they show up as runaway jobs?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">sacctmgr show runawayjobs<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If they do, it should give you the option to fix them.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Jeff<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt;
<b>On Behalf Of </b>Reed Dier<br>
<b>Sent:</b> Tuesday, December 20, 2022 9:54 AM<br>
<b>To:</b> Slurm User Community List &lt;slurm-users@lists.schedmd.com&gt;<br>
<b>Subject:</b> [slurm-users] Job cancelled into the future<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hoping this is a fairly simple one.<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">This is a small internal cluster that we’ve been using for about 6 months now. We’ve had some infrastructure instability in that time, which I think may be the root cause of this weirdness, but hopefully someone can point me in the right direction to solve the issue.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I send a daily email of sreport output to show how busy the cluster was and who the top users were.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Weirdly, I have a user who seems to rack up exactly the same usage day after day, down to a hundredth of a percent, conspicuously even while they were on vacation and said they had no job submissions in cron or the like.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">So, taking the <a href="https://urldefense.com/v3/__https:/lists.schedmd.com/pipermail/slurm-users/2022-December/009514.html__;!!LkSTlj0I!HU994dhG-5tUYSafOMaTezByLfQVqq-CYpi_3KIx5Axh6N-cUxMFeVZDtk6fF54A-r_4HuXpc3RpvLtyeLibxTZO$">scom TUI</a> posted this morning for a spin, I filtered on that user and noticed that even though I was only looking 2 days back in the job history, I was seeing a job from August.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Conspicuously, the job state is CANCELLED, but the job end time is one year after the start time, which puts it in August 2023.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">So slurmdbd seems to be confused about these jobs: they are reported as cancelled, yet they somehow linger “on the books” until next August.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span style="font-family:&quot;MS Gothic&quot;">╭</span>──────────────────────────────────────────────────────────────────────────────────────────<span style="font-family:&quot;MS Gothic&quot;">╮</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ Job ID : 290742 │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ Job Name : $jobname │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ User : $user │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ Group : $user │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ Job Account : $account │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ Job Submission : 2022-08-08 08:44:52 -0400 EDT │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ Job Start : 2022-08-08 08:46:53 -0400 EDT │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ Job End : 2023-08-08 08:47:01 -0400 EDT │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ Job Wait time : 2m1s │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ Job Run time : 8760h0m8s │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ Partition : $part │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ Priority : 127282 │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ QoS : $qos │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">│ │</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;MS Gothic&quot;">╰</span>──────────────────────────────────────────────────────────────────────────────────────────<span style="font-family:&quot;MS Gothic&quot;">╯</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">Steps count: 0</span><o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
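<div>
<p class="MsoNormal">As a quick sanity check, the numbers in the box above are at least self-consistent: 8760 hours is exactly 365 days, so the recorded end time is simply the start time plus one year and 8 seconds. A minimal Python sketch of that arithmetic, with the timestamps copied by hand from job 290742 (the variable names are only illustrative and not part of any Slurm tooling):</p>
</div>
<pre>
```python
from datetime import datetime, timedelta

# Timestamps copied by hand from the job detail box (job 290742).
# Both are in the same UTC offset (-0400), so naive datetimes are enough here.
start = datetime(2022, 8, 8, 8, 46, 53)
end = datetime(2023, 8, 8, 8, 47, 1)

# Reported run time from the same box: 8760h0m8s.
run_time = timedelta(hours=8760, minutes=0, seconds=8)

# 8760 hours is exactly 365 days, so the run time is one year plus 8 seconds.
assert run_time == timedelta(days=365, seconds=8)

# The recorded end time is exactly start + run time.
assert start + run_time == end

print(run_time.days, "days,", run_time.seconds, "seconds")
```
</pre>
<div>
<p class="MsoNormal">This only shows that the stored end time equals the start time plus the reported run time; it says nothing about why the accounting database recorded that value.</p>
</div>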
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span style="font-family:Menlo">Filter: $user Items: 13</span><o:p></o:p></p>
</blockquote>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> Job ID Job Name Part. QoS Account User Nodes State</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo">───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290714 $jobname $part $qos $acct $user node32 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290716 $jobname $part $qos $acct $user node24 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290736 $jobname $part $qos $acct $user node00 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290742 $jobname $part $qos $acct $user node01 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290770 $jobname $part $qos $acct $user node02 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290777 $jobname $part $qos $acct $user node03 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290793 $jobname $part $qos $acct $user node04 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290797 $jobname $part $qos $acct $user node05 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290799 $jobname $part $qos $acct $user node06 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290801 $jobname $part $qos $acct $user node07 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290814 $jobname $part $qos $acct $user node08 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290817 $jobname $part $qos $acct $user node09 CANCELLED</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:Menlo"> 290819 $jobname $part $qos $acct $user node10 CANCELLED</span><o:p></o:p></p>
</div>
</blockquote>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I’d love to figure out the proper way to either purge these job IDs from the accounting database cleanly, or change the job end/run time to a sane, correct value.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Slurm is v21.08.8-2, and NTP comes from a stratum 1 server, so time is in sync everywhere (not that multiple servers would drift a full year off like this anyway).<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Thanks for any help,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Reed<o:p></o:p></p>
</div>
</div>
</body>
</html>