<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Odds are the new version won't help with that. You will have to
do some MySQL work to fix it, then.</p>
<p>-Paul Edmon-<br>
</p>
<div class="moz-cite-prefix">On 3/6/2019 1:23 PM, Brian Andrus
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:6afb0445-4fc2-11ba-b21f-2371c0c2c713@gmail.com">
<p>I am running the latest and did that, but it didn't change
anything. The jobs stay in the runaway state and no changes are
made to the database.</p>
<p>Using 18.08.2-1.</p>
<p>Maybe try updating to 19.05.0-0pre1?</p>
<p>Brian<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 3/6/2019 10:06 AM, Paul Edmon
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:cc203e08-612d-0fe9-b0cc-5f0c016d92b3@cfa.harvard.edu">
<p>A lot of this is automated in newer versions of Slurm. You
should only need to run:</p>
<p><code>sacctmgr show runawayjobs</code></p>
<p>It will then offer to clean them up, and Slurm will handle the
rest. If you add the <code>-i</code> option, it will clean them up
without prompting.</p>
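<p>If you want to script this, a minimal Python sketch might look like the following. It assumes <code>sacctmgr</code> is on your PATH and that you run it with Slurm admin rights; the function names are made up for illustration and are not part of Slurm.</p>

```python
import subprocess
from typing import List


def runaway_jobs_command(fix_without_prompt: bool = False) -> List[str]:
    """Build the argv for `sacctmgr show runawayjobs`.

    With fix_without_prompt=True, the -i (--immediate) option is added so
    sacctmgr applies the fix without asking for confirmation.
    """
    cmd = ["sacctmgr"]
    if fix_without_prompt:
        cmd.append("-i")
    cmd += ["show", "runawayjobs"]
    return cmd


def check_runaway_jobs(fix_without_prompt: bool = False) -> int:
    """Run sacctmgr attached to the current terminal so its interactive
    prompt stays visible, and return sacctmgr's exit status.

    Assumes sacctmgr is on PATH and the caller has Slurm admin rights.
    """
    return subprocess.run(runaway_jobs_command(fix_without_prompt)).returncode
```

<p>Calling <code>check_runaway_jobs()</code> keeps the confirmation step; <code>check_runaway_jobs(True)</code> is the unattended variant Paul describes.</p>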
<p>-Paul Edmon-<br>
</p>
<div class="moz-cite-prefix">On 3/6/2019 11:58 AM, Cyrus Proctor
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:1d43bee5-8cfa-3dd2-933d-9b3d806005b5@tacc.utexas.edu">
<p>Hi Brian,</p>
<p>Others probably have better suggestions than the route I'm
about to detail. If you do go this route, be warned: you can
irrevocably lose data or destroy your Slurm accounting
database. Do so at your own risk. I got here with some
Google-fu after running out of other options (that I knew
of). Someone please save Brian from having to do what comes
below ;-)<br>
</p>
<p>Last warning: I'd recommend stopping slurmdbd and backing
up the database (with mysqldump) before going any further.<br>
</p>
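<p>A minimal Python sketch of that backup step, which only prints the commands for review before you run them by hand. It assumes slurmdbd is managed by systemd, the default database name <code>slurm_acct_db</code>, and a MySQL root login; the helper names and the dump file naming are hypothetical.</p>

```python
import datetime
import shlex
from typing import List


def backup_commands(db_name: str = "slurm_acct_db", db_user: str = "root",
                    stamp: str = "") -> List[List[str]]:
    """Return, in order, the commands to stop slurmdbd and dump its database."""
    stamp = stamp or datetime.date.today().isoformat()
    dump_file = f"{db_name}-{stamp}.sql"
    return [
        # Assumes slurmdbd runs under systemd; stopping it keeps anything
        # from writing to the accounting tables while they are dumped/edited.
        ["systemctl", "stop", "slurmdbd"],
        # -p with no value makes mysqldump prompt for the password;
        # --single-transaction takes a consistent snapshot of InnoDB tables.
        ["mysqldump", "-u", db_user, "-p", "--single-transaction",
         f"--result-file={dump_file}", db_name],
    ]


def as_shell(commands: List[List[str]]) -> List[str]:
    """Render each argv as a safely quoted shell line for review."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]
```

<p>Printing <code>as_shell(backup_commands())</code> gives the two lines to run. If something goes wrong later, the dump can be reloaded with <code>mysql -u root -p slurm_acct_db &lt; dumpfile.sql</code>.</p>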
<p>In my case, runaway jobs did not show up with <code>sacctmgr
list runawayjobs</code>. My problem was that I could not remove
a user from the Slurm database because it thought they still
had active jobs. The likely cause was the slurmdbd daemon not
shutting down gracefully at some point. The job was long gone,
but the database still listed it as pending:<br>
</p>
<pre style="font-family: Consolas, Menlo, &quot;Liberation Mono&quot;, Courier, monospace; margin: 1em 1em 1em 1.6em; padding: 8px; background-color: rgb(250, 250, 250); border: 1px solid rgb(226, 226, 226); border-radius: 3px; width: auto; overflow: auto hidden; color: rgb(51, 51, 51); font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># sacct -j 899139
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
      899139      equil   gpu-long     p-1234         20    PENDING      0:0
</pre>
<div class="moz-cite-prefix">
<pre style="font-family: Consolas, Menlo, &quot;Liberation Mono&quot;, Courier, monospace; margin: 1em 1em 1em 1.6em; padding: 8px; background-color: rgb(250, 250, 250); border: 1px solid rgb(226, 226, 226); border-radius: 3px; width: auto; overflow: auto hidden; color: rgb(51, 51, 51); font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># scontrol show job 899139
slurm_load_jobs error: Invalid job id specified
</pre>
</div>
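<p>The mismatch above (accounting says PENDING, the controller says the job ID is invalid) can be checked per job with a small Python sketch. Note that slurmctld forgets finished jobs after <code>MinJobAge</code>, so "Invalid job id" on its own is normal; it is only suspicious together with an active state in sacct. This assumes <code>sacct</code> and <code>scontrol</code> are on PATH and that scontrol exits non-zero for an unknown job; the function names are hypothetical.</p>

```python
import subprocess

ACTIVE_STATES = {"PENDING", "RUNNING"}


def looks_stale(accounting_state: str, scontrol_rc: int, scontrol_text: str) -> bool:
    """True when accounting says the job is active but the controller has
    no record of it (the situation shown for job 899139)."""
    return (accounting_state.strip() in ACTIVE_STATES
            and scontrol_rc != 0
            and "Invalid job id" in scontrol_text)


def check_job(job_id: int) -> bool:
    """Query sacct and scontrol for one job; assumes both are on PATH."""
    # -X: job allocation only (no steps), -n: no header, -P: '|'-separated
    acct = subprocess.run(
        ["sacct", "-j", str(job_id), "-X", "-n", "-P", "-o", "State"],
        capture_output=True, text=True, check=True)
    # No check=True here: a non-zero exit is the signal we are looking for.
    ctl = subprocess.run(["scontrol", "show", "job", str(job_id)],
                         capture_output=True, text=True)
    first = acct.stdout.splitlines()[0] if acct.stdout.strip() else ""
    # The error text may land on stdout or stderr, so search both.
    return looks_stale(first, ctl.returncode, ctl.stdout + ctl.stderr)
```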
<div class="moz-cite-prefix">
<pre style="font-family: Consolas, Menlo, &quot;Liberation Mono&quot;, Courier, monospace; margin: 1em 1em 1em 1.6em; padding: 8px; background-color: rgb(250, 250, 250); border: 1px solid rgb(226, 226, 226); border-radius: 3px; width: auto; overflow: auto hidden; color: rgb(51, 51, 51); font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># mysql -u root -p
...
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7453
Server version: 5.1.73 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use slurm_acct_db;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select state,time_end,time_start,time_submit,id_assoc,partition from banana_job_table where id_job=899139;
+-------+----------+------------+-------------+----------+-----------+
| state | time_end | time_start | time_submit | id_assoc | partition |
+-------+----------+------------+-------------+----------+-----------+
|     0 |        0 |          0 |  1546880711 |     2078 | gpu-long  |
+-------+----------+------------+-------------+----------+-----------+
1 row in set (0.00 sec)
mysql> update banana_job_table set state=3 where id_job=899139;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select state,time_end,time_start,time_submit,id_assoc,partition from banana_job_table where id_job=899139;
+-------+----------+------------+-------------+----------+-----------+
| state | time_end | time_start | time_submit | id_assoc | partition |
+-------+----------+------------+-------------+----------+-----------+
|     3 |        0 |          0 |  1546880711 |     2078 | gpu-long  |
+-------+----------+------------+-------------+----------+-----------+
1 row in set (0.00 sec)
mysql> update banana_job_table set time_start=1546880712 where id_job=899139;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select state,time_end,time_start,time_submit,id_assoc,partition from banana_job_table where id_job=899139;
+-------+----------+------------+-------------+----------+-----------+
| state | time_end | time_start | time_submit | id_assoc | partition |
+-------+----------+------------+-------------+----------+-----------+
|     3 |        0 | 1546880712 |  1546880711 |     2078 | gpu-long  |
+-------+----------+------------+-------------+----------+-----------+
1 row in set (0.00 sec)
mysql> update banana_job_table set time_end=1546880713 where id_job=899139;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select state,time_end,time_start,time_submit,id_assoc,partition from banana_job_table where id_job=899139;
+-------+------------+------------+-------------+----------+-----------+
| state | time_end   | time_start | time_submit | id_assoc | partition |
+-------+------------+------------+-------------+----------+-----------+
|     3 | 1546880713 | 1546880712 |  1546880711 |     2078 | gpu-long  |
+-------+------------+------------+-------------+----------+-----------+
1 row in set (0.00 sec)</pre>
</div>
<div class="moz-cite-prefix">In this case, for job ID 899139 on
the banana cluster, neither the state nor the start and end
times had been updated. I manually edited the job entries so
that Slurm considered them complete (state 3) with plausible
start and end times just after their submit time. Again, this
worked for me; I don't know whether it is your problem. If you
choose this route, be careful, and good luck!<br>
</div>
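<p>The three manual updates can be collapsed into a single parameterized statement. This is a Python sketch, not a tested procedure: it assumes the <code>&lt;cluster&gt;_job_table</code> naming seen above and a DB-API driver that uses <code>%s</code> placeholders (such as PyMySQL), and it adds a guard of its own (<code>time_start = 0 AND time_end = 0</code>) so that only never-started entries like job 899139 are touched. The function name is hypothetical.</p>

```python
import re
from typing import Tuple

# Value of JOB_COMPLETE in Slurm's `enum job_states`
# (0 = PENDING, 1 = RUNNING, 2 = SUSPENDED, 3 = COMPLETE).
JOB_COMPLETE = 3


def mark_complete_sql(cluster: str, job_id: int, time_submit: int) -> Tuple[str, tuple]:
    """Build one UPDATE equivalent to the three manual updates above.

    Start and end are set 1 s and 2 s after submission, as in the example.
    The time_start/time_end = 0 guard is an added safety check so a job
    with real recorded times is never overwritten.
    """
    # Table names cannot be bound as parameters, so validate the cluster name.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", cluster):
        raise ValueError(f"unexpected cluster name: {cluster!r}")
    sql = (f"UPDATE `{cluster}_job_table` "
           "SET state = %s, time_start = %s, time_end = %s "
           "WHERE id_job = %s AND time_start = 0 AND time_end = 0")
    return sql, (JOB_COMPLETE, time_submit + 1, time_submit + 2, job_id)
```

<p>With a PyMySQL connection open, something like <code>cursor.execute(*mark_complete_sql("banana", 899139, 1546880711))</code> followed by <code>conn.commit()</code> would apply it; stop slurmdbd and take a backup first.</p>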
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 3/6/19 10:15 AM, Brian Andrus
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:5d230f77-3831-e054-1da7-ea83a76b2e2c@gmail.com"> <br>
It shows several jobs that all have "Unknown" for end_time.
Some are PENDING and some are RUNNING (none are truly in
either state). <br>
<br>
It asked to fix them, which I did, but nothing seems to have
changed. They still show up with that command and in
reports. <br>
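<p>To list candidates in bulk, one approach is to compare accounting records against what the controller is actually tracking. A Python sketch, assuming the sacct text comes from <code>sacct -a -X -n -P -o JobID,State,End -S START_DATE</code> and the live IDs from <code>squeue -h -o %i</code>; the function name is hypothetical, and array or heterogeneous jobs may need extra ID handling.</p>

```python
from typing import Iterable, List

ACTIVE_STATES = {"PENDING", "RUNNING"}


def suspect_jobs(sacct_output: str, live_job_ids: Iterable[str]) -> List[str]:
    """Job IDs that accounting shows as PENDING/RUNNING with an Unknown end
    time but that the controller (squeue) is no longer tracking."""
    live = {job.strip() for job in live_job_ids if job.strip()}
    suspects = []
    for line in sacct_output.splitlines():
        fields = line.split("|")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        job_id, state, end = fields
        if end == "Unknown" and state in ACTIVE_STATES and job_id not in live:
            suspects.append(job_id)
    return suspects
```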
<br>
<br>
Brian <br>
<br>
On 3/5/2019 10:34 PM, Chris Samuel wrote: <br>
<blockquote type="cite">On Tuesday, 5 March 2019 10:07:30 AM
PST Brian Andrus wrote: <br>
<br>
<blockquote type="cite">Does anyone have a process they
use to handle empty (aka "Unknown") end <br>
times for jobs that are not running? <br>
</blockquote>
What does: <br>
<br>
sacctmgr list runawayjobs <br>
<br>
say? <br>
<br>
</blockquote>
<br>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</body>
</html>