[slurm-users] How to debug a prolog script?

Davide DelVento davide.quantum at gmail.com
Thu Sep 15 21:49:03 UTC 2022


I have a super simple prolog script, as follows (very similar to the
example one)

#!/bin/bash

if [[ $VAR == 1 ]]; then
        echo "True"
fi

exit 0

This fails (and obviously causes great disruption to my production
jobs). So I have two questions:

1. Why does it fail? It does so regardless of the value of the
variable, so it must not be the echo not being in the PATH (note that
[[ is a shell keyword). I understand that the echo command will go in
a black hole and I should use "print ..." (not sure about its syntax,
and the documentation is very cryptic, but I digress) or perhaps
logger (as the example does), and I tried some of them with no luck.

2. How to debug the issue? Even increasing the debug level the
slurmctld.log contains simply a "error: validate_node_specs: Prolog or
job env setup failure on node xxx, draining the node" message, without
even a line number or anything. Google does not return anything useful
about this message

3. And more generally, how to debug a prolog (and epilog) script
without disrupting all production jobs? Unfortunately we can't have
another slurm install for testing, is there a sbatch option to force
utilizing a prolog script which would not be executed for all the
other jobs? Or perhaps making a dedicated queue?



More information about the slurm-users mailing list