[slurm-users] How to debug a prolog script?
Bjørn-Helge Mevik
b.h.mevik at usit.uio.no
Fri Sep 16 07:30:19 UTC 2022
Davide DelVento <davide.quantum at gmail.com> writes:
> 2. How to debug the issue?
I'd try capturing all stdout and stderr from the script into a file on the compute
node, for instance like this:
  exec &> /root/prolog_slurmd.$$
  set -x   # to print out all commands as they run
before any other commands in the script. The file "prolog_slurmd.<pid>" will
then contain a trace of every command the script runs, along with all
of its output (stdout and stderr). If there is no "prolog_slurmd.<pid>"
file after the job has been scheduled, then, as others have pointed
out, Slurm wasn't able to execute the prolog at all.
> Even increasing the debug level the
> slurmctld.log contains simply a "error: validate_node_specs: Prolog or
> job env setup failure on node xxx, draining the node" message, without
> even a line number or anything.
Slurm only executes the prolog script; it doesn't parse or interpret the
script itself, so it has no way of knowing what failed inside it.
> 3. And more generally, how to debug a prolog (and epilog) script
> without disrupting all production jobs? Unfortunately we can't have
> another slurm install for testing, is there a sbatch option to force
> utilizing a prolog script which would not be executed for all the
> other jobs? Or perhaps making a dedicated queue?
I tend to reserve a node, install the updated prolog script there, and
run test jobs that request that reservation. (Alternatively, one could
set up a small cluster of VMs and use that for simpler testing.)
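For instance, something along these lines (the reservation, node and user
names are just placeholders):

  # Reserve one node for prolog testing
  scontrol create reservation ReservationName=prologtest \
      Nodes=c1 Users=myuser StartTime=now Duration=UNLIMITED

  # Submit a trivial test job into that reservation
  sbatch --reservation=prologtest --wrap="hostname"

Only jobs submitted with --reservation=prologtest will land on that node,
so the rest of production is unaffected while you iterate on the script.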
--
B/H