[slurm-users] How to debug a prolog script?
Brian Andrus
toomuchit at gmail.com
Thu Sep 15 23:54:36 UTC 2022
Davide,
Quick things to check:
* Permissions on the file itself (and the directories in the path to it)
* Existence of the script on the compute nodes (the prolog is run on the
compute nodes, not the head node)
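
A rough sketch of those two checks, to be run on a compute node. The
function name check_prolog is made up for illustration, and the path you
pass in is whatever Prolog= points to in your slurm.conf (as far as I know,
"scontrol show config | grep -i prolog" will show it). The tests apply to
the user running the function, so run it as the user slurmd runs as
(normally root).

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): check that a prolog script exists,
# is executable, and that every directory leading to it can be traversed.
check_prolog() {
    local script=$1
    if [[ ! -e $script ]]; then
        echo "missing: $script"
        return 1
    fi
    if [[ ! -x $script ]]; then
        echo "not executable: $script"
        return 1
    fi
    # Walk up the directory chain; each directory needs the search (x) bit.
    local dir
    dir=$(dirname -- "$script")
    while :; do
        if [[ ! -x $dir ]]; then
            echo "not searchable: $dir"
            return 1
        fi
        # Stop at the filesystem root, or at "." for a relative path.
        [[ $dir == / || $dir == . ]] && break
        dir=$(dirname -- "$dir")
    done
    echo "ok: $script"
}
```

Call it as "check_prolog /path/to/your/prolog" on each node that got
drained. Anything other than "ok: ..." points at one of the two items above.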
I'm not sure the error comes from the prolog script itself. Does
everything run fine with no prolog configured?
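
One way to look at the script in isolation, without touching slurm.conf, is
to run it by hand on a node and check its exit status, since a non-zero
exit is what gets the node drained. Below is a minimal sketch under some
assumptions: try_prolog is a made-up name, the SLURM_JOB_ID value is only a
stand-in, and "env -i" is my approximation of the sparse environment the
script gets from slurmd (if I remember the Prolog and Epilog guide
correctly, no search path is set for these programs). It is not a faithful
reproduction of what slurmd does.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): run a prolog candidate by hand
# with the caller's environment dropped and bash tracing switched on.
try_prolog() {
    local script=$1 jobid=${2:-0}
    local rc=0
    # env -i starts from an empty environment; SLURM_JOB_ID is a stand-in.
    # bash -x prints each command to stderr before running it.
    env -i SLURM_JOB_ID="$jobid" /bin/bash -x "$script" || rc=$?
    echo "exit status: $rc"
    return "$rc"
}
```

For example, "try_prolog /path/to/your/prolog 12345" run as root on the
node prints the trace on stderr followed by the exit status. If that shows
0, the script is probably not what is failing.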
Brian Andrus
On 9/15/2022 2:49 PM, Davide DelVento wrote:
> I have a super simple prolog script, as follows (very similar to the
> example one)
>
> #!/bin/bash
>
> if [[ $VAR == 1 ]]; then
> echo "True"
> fi
>
> exit 0
>
> This fails (and obviously causes great disruption to my production
> jobs). So I have three questions:
>
> 1. Why does it fail? It does so regardless of the value of the
> variable, so it must not be the echo not being in the PATH (note that
> [[ is a shell keyword). I understand that the echo command will go in
> a black hole and I should use "print ..." (not sure about its syntax,
> and the documentation is very cryptic, but I digress) or perhaps
> logger (as the example does), and I tried some of them with no luck.
>
> 2. How to debug the issue? Even increasing the debug level the
> slurmctld.log contains simply a "error: validate_node_specs: Prolog or
> job env setup failure on node xxx, draining the node" message, without
> even a line number or anything. Google does not return anything useful
> about this message
>
> 3. And more generally, how to debug a prolog (and epilog) script
> without disrupting all production jobs? Unfortunately we can't have
> another slurm install for testing, is there a sbatch option to force
> utilizing a prolog script which would not be executed for all the
> other jobs? Or perhaps making a dedicated queue?
>