[slurm-users] How to debug a prolog script?
Davide DelVento
davide.quantum at gmail.com
Thu Sep 15 21:49:03 UTC 2022
I have a super simple prolog script, as follows (very similar to the
example one)
#!/bin/bash
if [[ $VAR == 1 ]]; then
echo "True"
fi
exit 0
This fails (and obviously causes great disruption to my production
jobs). So I have two questions:
1. Why does it fail? It does so regardless of the value of the
variable, so it must not be the echo not being in the PATH (note that
[[ is a shell keyword). I understand that the echo command will go in
a black hole and I should use "print ..." (not sure about its syntax,
and the documentation is very cryptic, but I digress) or perhaps
logger (as the example does), and I tried some of them with no luck.
2. How to debug the issue? Even increasing the debug level the
slurmctld.log contains simply a "error: validate_node_specs: Prolog or
job env setup failure on node xxx, draining the node" message, without
even a line number or anything. Google does not return anything useful
about this message
3. And more generally, how to debug a prolog (and epilog) script
without disrupting all production jobs? Unfortunately we can't have
another slurm install for testing, is there a sbatch option to force
utilizing a prolog script which would not be executed for all the
other jobs? Or perhaps making a dedicated queue?
More information about the slurm-users
mailing list