<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Davide,<br>
</p>
<p>Quick things to check:</p>
<ul>
<li>Permissions on the script itself and on every directory in the
path leading to it</li>
<li>Existence of the script on the compute nodes (the prolog runs on
the nodes, not on the head node)<br>
</li>
</ul>
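<p>A minimal sketch of both checks, to be run on a compute node as
the user slurmd runs as (root by default). The function name
check_prolog is just an illustration, not a Slurm tool, and the tests
only reflect the permissions of whoever runs it:</p>
<pre>
```shell
#!/bin/bash
# check_prolog is a hypothetical helper, not part of Slurm.
# It reports whether a script exists, is executable, and whether every
# directory leading to it is searchable by the current user.
check_prolog() {
    local script="$1"
    local status=0

    if [[ ! -f "$script" ]]; then
        echo "MISSING: $script"
        return 1
    fi

    if [[ ! -x "$script" ]]; then
        echo "NOT EXECUTABLE: $script"
        status=1
    fi

    # Walk up the directory chain until "/" (or "." for a relative path).
    local dir
    dir=$(dirname "$script")
    while true; do
        if [[ ! -x "$dir" ]]; then
            echo "NOT TRAVERSABLE: $dir"
            status=1
        fi
        if [[ "$dir" == "/" || "$dir" == "." ]]; then
            break
        fi
        dir=$(dirname "$dir")
    done

    if [[ $status -eq 0 ]]; then
        echo "OK: $script"
    fi
    return $status
}
```
</pre>
<p>Call it with the path to your prolog, which
"scontrol show config | grep -i prolog" should report. Run it on
each node rather than on the head, since that is where the prolog
executes.</p>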
<p>I'm not sure the error comes from the prolog script itself. Does
everything run fine with no prolog configured?</p>
<p>Brian Andrus<br>
</p>
<div class="moz-cite-prefix">On 9/15/2022 2:49 PM, Davide DelVento
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAAX1q8bDu20BnpEk77-Ovq1NTM4q8GEzRbexZzjwXzSFoMd35w@mail.gmail.com">
<pre class="moz-quote-pre" wrap="">I have a super simple prolog script, as follows (very similar to the
example one)
#!/bin/bash
if [[ $VAR == 1 ]]; then
    echo "True"
fi
exit 0
This fails (and obviously causes great disruption to my production
jobs). So I have three questions:
1. Why does it fail? It does so regardless of the value of the
variable, so the cause cannot be echo missing from the PATH (note that
[[ is a shell keyword). I understand that the echo output will go into
a black hole and I should use "print ..." (not sure about its syntax,
and the documentation is very cryptic, but I digress) or perhaps
logger (as the example does), and I tried some of them with no luck.
2. How to debug the issue? Even after increasing the debug level, the
slurmctld.log contains simply an "error: validate_node_specs: Prolog or
job env setup failure on node xxx, draining the node" message, without
even a line number or anything. Google does not return anything useful
about this message.
3. And more generally, how to debug a prolog (and epilog) script
without disrupting all production jobs? Unfortunately we can't have
another slurm install for testing. Is there an sbatch option to force
the use of a prolog script that would not be executed for all the
other jobs? Or perhaps making a dedicated queue?
</pre>
</blockquote>
</body>
</html>