<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Thomas, <br>
</p>
<p>The GUI app writes the script to the file slurm_script.sh in the
cwd. As my first step in debugging, I did exactly what you
suggested: I checked the Command= value in the output of 'scontrol
show job' to see which script was actually submitted, and it was
the slurm_script.sh in the cwd. <br>
</p>
<p>The user did provide me with some very useful information this
afternoon: the GUI app uses Python to launch the job. Here's what
the user wrote to me (OMFIT is the name of the GUI application): <br>
</p>
<p><br>
</p>
<p>
<blockquote type="cite">New clue for the mpirun issue: The
following information might be helpful.
<ul style="box-sizing: border-box; margin-bottom: 16px;
margin-top: 0px; padding-left: 2em; color: rgb(36, 41, 46);
font-family: -apple-system, system-ui, 'Segoe UI', Helvetica,
Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji',
'Segoe UI Symbol'; font-size: 14px; font-variant-ligatures:
normal; orphans: 2; widows: 2; background-color: rgb(255, 255,
255);" class="">
<li style="box-sizing: border-box; margin-left: 0px;" class="">I
modified the script to use <code style="box-sizing:
border-box; font-family: SFMono-Regular, Consolas,
'Liberation Mono', Menlo, Courier, monospace;
font-size: 11.9px; background-color: rgba(27, 31, 35,
0.05); border-radius: 3px; margin: 0px; padding: 0.2em
0.4em;" class="">subprocess</code> submitting the job
directly. The job was submitted, but somehow it returned
NoneZeroError and the <code style="box-sizing: border-box;
font-family: SFMono-Regular, Consolas, 'Liberation
Mono', Menlo, Courier, monospace; font-size: 11.9px;
background-color: rgba(27, 31, 35, 0.05); border-radius:
3px; margin: 0px; padding: 0.2em 0.4em;" class="">mpiexec</code> line
was skipped.</li>
</ul>
<div class="highlight highlight-source-python"
style="box-sizing: border-box; margin-bottom: 16px;
background-color: rgb(255, 255, 255); color: rgb(36, 41, 46);
font-family: -apple-system, system-ui, 'Segoe UI', Helvetica,
Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji',
'Segoe UI Symbol'; font-size: 14px; font-variant-ligatures:
normal; orphans: 2; widows: 2; overflow: visible !important;">
<pre style="box-sizing: border-box; font-family: SFMono-Regular, Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 11.9px; margin-bottom: 0px; margin-top: 0px; overflow-wrap: normal; background-color: rgb(246, 248, 250); border-radius: 3px; line-height: 1.45; overflow: auto; padding: 16px; word-break: normal;" class="">OMFITx.executable(root,
<span class="pl-v" style="box-sizing: border-box; color: rgb(227, 98, 9);">inputs</span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span>inputs,
<span class="pl-v" style="box-sizing: border-box; color: rgb(227, 98, 9);">outputs</span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span>outputs,
<span class="pl-v" style="box-sizing: border-box; color: rgb(227, 98, 9);">executable</span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span><span class="pl-s" style="box-sizing: border-box; color: rgb(3, 47, 98);"><span class="pl-pds" style="box-sizing: border-box;">'</span>echo <span class="pl-c1" style="box-sizing: border-box; color: rgb(0, 92, 197);">%s</span><span class="pl-pds" style="box-sizing: border-box;">'</span></span>,<span class="pl-c" style="box-sizing: border-box; color: rgb(106, 115, 125);"><span class="pl-c" style="box-sizing: border-box;">#</span>submit_command,</span>
<span class="pl-v" style="box-sizing: border-box; color: rgb(227, 98, 9);">script</span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span>(bashscript, <span class="pl-s" style="box-sizing: border-box; color: rgb(3, 47, 98);"><span class="pl-pds" style="box-sizing: border-box;">'</span>slurm.script<span class="pl-pds" style="box-sizing: border-box;">'</span></span>),
<span class="pl-v" style="box-sizing: border-box; color: rgb(227, 98, 9);">clean</span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span><span class="pl-c1" style="box-sizing: border-box; color: rgb(0, 92, 197);">True</span>,
<span class="pl-v" style="box-sizing: border-box; color: rgb(227, 98, 9);">std_out</span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span>std_out,
<span class="pl-v" style="box-sizing: border-box; color: rgb(227, 98, 9);">remotedir</span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span>unique_remotedir,
<span class="pl-v" style="box-sizing: border-box; color: rgb(227, 98, 9);">ignoreReturnCode</span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span><span class="pl-c1" style="box-sizing: border-box; color: rgb(0, 92, 197);">True</span>)
p<span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span>subprocess.Popen(<span class="pl-s" style="box-sizing: border-box; color: rgb(3, 47, 98);"><span class="pl-pds" style="box-sizing: border-box;">'</span>sbatch <span class="pl-pds" style="box-sizing: border-box;">'</span></span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">+</span>unique_remotedir<span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">+</span><span class="pl-s" style="box-sizing: border-box; color: rgb(3, 47, 98);"><span class="pl-pds" style="box-sizing: border-box;">'</span>slurm.script<span class="pl-pds" style="box-sizing: border-box;">'</span></span>,
<span class="pl-v" style="box-sizing: border-box; color: rgb(227, 98, 9);">shell</span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span><span class="pl-c1" style="box-sizing: border-box; color: rgb(0, 92, 197);">True</span>,
<span class="pl-v" style="box-sizing: border-box; color: rgb(227, 98, 9);">stdout</span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span>subprocess.<span class="pl-c1" style="box-sizing: border-box; color: rgb(0, 92, 197);">PIPE</span>,
<span class="pl-v" style="box-sizing: border-box; color: rgb(227, 98, 9);">stderr</span><span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">=</span>subprocess.<span class="pl-c1" style="box-sizing: border-box; color: rgb(0, 92, 197);">PIPE</span>)
std_out.append(p.stdout.read())
<span class="pl-c1" style="box-sizing: border-box; color: rgb(0, 92, 197);">print</span>(std_out[<span class="pl-k" style="box-sizing: border-box; color: rgb(215, 58, 73);">-</span><span class="pl-c1" style="box-sizing: border-box; color: rgb(0, 92, 197);">1</span>], p.stderr.read())</pre>
</div>
<ul style="box-sizing: border-box; margin-bottom: 16px;
margin-top: 0px; padding-left: 2em; color: rgb(36, 41, 46);
font-family: -apple-system, system-ui, 'Segoe UI', Helvetica,
Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji',
'Segoe UI Symbol'; font-size: 14px; font-variant-ligatures:
normal; orphans: 2; widows: 2; background-color: rgb(255, 255,
255);" class="">
<li style="box-sizing: border-box; margin-left: 0px;" class="">As
I mentioned above, my standalone python script can normally
submit jobs likewise using <code style="box-sizing:
border-box; font-family: SFMono-Regular, Consolas,
'Liberation Mono', Menlo, Courier, monospace;
font-size: 11.9px; background-color: rgba(27, 31, 35,
0.05); border-radius: 3px; margin: 0px; padding: 0.2em
0.4em;" class="">subprocess.Popen</code> or <code
style="box-sizing: border-box; font-family:
SFMono-Regular, Consolas, 'Liberation Mono',
Menlo, Courier, monospace; font-size: 11.9px;
background-color: rgba(27, 31, 35, 0.05); border-radius:
3px; margin: 0px; padding: 0.2em 0.4em;" class="">subprocess.call</code>.
I created the following script at the working directory and
executed it with the same python version as OMFIT. It works
without skip.</li>
</ul>
<pre style="box-sizing: border-box; font-family: SFMono-Regular, Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 11.9px; margin-bottom: 16px; margin-top: 0px; overflow-wrap: normal; background-color: rgb(246, 248, 250); border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px; line-height: 1.45; overflow: auto; padding: 16px; color: rgb(36, 41, 46); font-variant-ligatures: normal; orphans: 2; widows: 2;" class=""><code style="box-sizing: border-box; font-family: SFMono-Regular, Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 11.9px; background-color: transparent; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px; margin: 0px; padding: 0px; border: 0px; word-break: normal; display: inline; line-height: inherit; overflow: visible; overflow-wrap: normal; background-position: initial initial; background-repeat: initial initial;" class="">import sys
import os.path
import subprocess
print(sys.version, sys.path, subprocess.__file__)
p = subprocess.Popen('sbatch slurm.script', shell=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
print(p.stdout.read(), p.stderr.read())
</code></pre>
<div style="box-sizing: border-box; margin-top: 0px; color:
rgb(36, 41, 46); font-family: -apple-system, system-ui, 'Segoe
UI', Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe
UI Emoji', 'Segoe UI Symbol'; font-size: 14px;
font-variant-ligatures: normal; orphans: 2; widows: 2;
background-color: rgb(255, 255, 255); margin-bottom: 0px
!important;" class="">The question is why the same <code
style="box-sizing: border-box; font-family: SFMono-Regular,
Consolas, 'Liberation Mono', Menlo, Courier,
monospace; font-size: 11.9px; background-color: rgba(27, 31,
35, 0.05); border-radius: 3px; margin: 0px; padding: 0.2em
0.4em;" class="">subprocess.Popen</code> command works
differently in OMFIT and in the terminal, even if they are
called by the same version <code style="box-sizing:
border-box; font-family: SFMono-Regular, Consolas,
'Liberation Mono', Menlo, Courier, monospace;
font-size: 11.9px; background-color: rgba(27, 31, 35, 0.05);
border-radius: 3px; margin: 0px; padding: 0.2em 0.4em;"
class="">python2.7</code>.</div>
</blockquote>
</p>
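<p>The user's snippet reads stdout and then stderr with read() and
never looks at the exit status of the sbatch call itself. A minimal
sketch of how the exit status and both streams could be collected
together is below. The helper name run_submit is hypothetical (it is
not part of OMFIT or Slurm); it should run under Python 2.7 as well
as Python 3. <br>
</p>
<pre>
```python
from __future__ import print_function

import subprocess


def run_submit(command):
    """Run a shell command and return (returncode, stdout, stderr)."""
    p = subprocess.Popen(command, shell=True,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE,
                         universal_newlines=True)
    # communicate() drains both pipes and waits for the child to exit,
    # so p.returncode is populated once it returns.
    out, err = p.communicate()
    return p.returncode, out, err
```
</pre>
<p>Calling run_submit('sbatch slurm.script') both inside OMFIT and
from a terminal, and printing all three returned values, should show
whether sbatch itself exits differently in the two cases. <br>
</p>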
<p>So now it's unclear whether this is a bug in Python or in Slurm
18.08.6-2. Since the user can write a Python script that does
work, I think this is something specific to the application's
environment rather than an issue with the Python-Slurm
interaction. The main piece of evidence that this might be a bug
in Slurm is that the issue started after the upgrade from
18.08.5-2 to 18.08.6-2, but correlation doesn't necessarily mean
causation. <br>
</p>
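<p>To test the environment theory, the environment seen by Python
inside the GUI app could be compared with the one seen in a terminal.
A rough sketch follows, assuming a few lines of Python can be run in
both places. The helpers dump_env and diff_env and any file names
passed to them are hypothetical, not part of OMFIT or Slurm. <br>
</p>
<pre>
```python
import json
import os


def dump_env(path):
    """Write the current process environment to path as sorted JSON."""
    with open(path, "w") as f:
        json.dump(dict(os.environ), f, indent=1, sort_keys=True)


def diff_env(path_a, path_b):
    """Return {name: (value_a, value_b)} for variables that differ.

    A variable missing from one dump shows up with None on that side.
    """
    with open(path_a) as f:
        a = json.load(f)
    with open(path_b) as f:
        b = json.load(f)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}
```
</pre>
<p>The idea would be to call dump_env once from within the GUI app
and once from the standalone script, each with its own file name,
and then look at what diff_env reports for the two files. <br>
</p>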
<p>Prentice<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 3/22/19 12:48 PM, Thomas M. Payerle
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAHJ2ZQ9ZFHWiCz2bYTNCAjBk4rmkJ9e3GTZ58Eo5KtZNC5H9Lg@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">Assuming the GUI produced script is as you
indicated (I am not sure where you got the script you showed,
but if it is not the actual script used by a job it might be
worthwhile to examine the Command= file from scontrol show job
to verify), then the only thing that should be different from a
GUI submission and a manual submission is the submission
environment. Does the manual submission work if you add
--export=NONE to the sbatch command to prevent the exporting of
environment variables? And maybe add a printenv to the script
to see what environment is in both cases. Though I confess I am
unable to think of any reasonable environmental setting that
might cause the observed symptoms.<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Mar 22, 2019 at 11:23
AM Prentice Bisbal &lt;<a href="mailto:pbisbal@pppl.gov"
moz-do-not-send="true">pbisbal@pppl.gov</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On
3/21/19 6:56 PM, Reuti wrote:<br>
> Am 21.03.2019 um 23:43 schrieb Prentice Bisbal:<br>
><br>
>> Slurm-users,<br>
>><br>
>> My users here have developed a GUI application which
serves as a GUI interface to various physics codes they use.
From this GUI, they can submit jobs to Slurm. On Tuesday, we
upgraded Slurm from 18.08.5-2 to 18.08.6-2, and a user has
reported a problem when submitting Slurm jobs through this GUI
app that does not occur when the same sbatch script is submitted
from sbatch on the command-line.<br>
>><br>
>> […]<br>
>> When I replaced the mpirun command with an equivalent
srun command, everything works as desired, so the user can get
back to work and be productive.<br>
>><br>
>> While srun is a suitable workaround, and is arguably
the correct way to run an MPI job, I'd like to understand what
is going on here. Any idea what is going wrong, or additional
steps I can take to get more debug information?<br>
> Was an alias to `mpirun` introduced? It may cover the
real application and even the `which mpirun` will return the
correct value, but never be executed.<br>
><br>
> $ type mpirun<br>
> $ alias mpirun<br>
><br>
> may tell in the jobscript.<br>
><br>
Unfortunately, the script is in tcsh, so the 'type' command
doesn't work <br>
since it's a bash built-in function. I did use the 'alias'
command to <br>
see all the defined aliases, and mpirun and mpiexec are not
aliased. Any <br>
other ideas?<br>
<br>
Prentice<br>
<br>
<br>
<br>
<br>
</blockquote>
</div>
<br clear="all">
<br>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">Tom Payerle <br>
DIT-ACIGS/Mid-Atlantic Crossroads <a
href="mailto:payerle@umd.edu" target="_blank"
moz-do-not-send="true">payerle@umd.edu</a><br>
</div>
<div>5825 University Research Park (301)
405-6135<br>
</div>
<div dir="ltr">University of Maryland<br>
College Park, MD 20740-3831<br>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>