On 7/10/25 10:53 pm, Ratnasamy, Fritz via slurm-users wrote:
Inside the prolog.d folder, there are 2 scripts which run with no errors as far as I can see but is there a way to debug why the nodes are going in draining mode once in a while because of "prolog error"? That seems to happen at random times and on random nodes.
You could try adding some logging to the start of your prolog to capture what it executes and any errors it hits. Something like this:
~/tmp/test$ cat prolog.sh
#!/bin/bash

exec 1>>"/tmp/prolog.log.${SLURM_JOB_ID}.${$}"
exec 2>&1

set -x

echo hello
fooo
~/tmp/test$ SLURM_JOB_ID=1234 ./prolog.sh
~/tmp/test$ echo $?
127
~/tmp/test$ cat /tmp/prolog.log.1234.10512
+ echo hello
hello
+ fooo
./prolog.sh: line 9: fooo: command not found
~/tmp/test$
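Since the failures show up at random times, it may also help to timestamp the trace and record the final exit status, because a non-zero exit from the prolog is what gets the node drained. The following is only a sketch along the same lines: LOG_DIR, the "nojob" fallback and the log file name are my own placeholders, not Slurm defaults, and the "echo hello" line stands in for whatever your prolog really does.

```bash
#!/bin/bash
# Sketch only: LOG_DIR, the "nojob" fallback and the file name are
# illustrative assumptions, not anything Slurm defines.
LOG_DIR="${LOG_DIR:-/tmp}"
LOG_FILE="${LOG_DIR}/prolog.log.${SLURM_JOB_ID:-nojob}.$$"

# From here on, send stdout and stderr to a per-job, per-process log file.
exec 1>>"${LOG_FILE}"
exec 2>&1

# Prefix every traced command with a timestamp and the script line number.
PS4='+ $(date "+%Y-%m-%dT%H:%M:%S") line ${LINENO}: '
set -x

# On exit, log the status the prolog is about to return and the node name.
trap 'rc=$?; set +x; echo "prolog exiting with status ${rc} on ${HOSTNAME}"' EXIT

# ... the existing prolog commands would go here ...
echo hello
```

With that in place, the last line of each log file shows the status the prolog returned, so you can grep the logs on a drained node for anything other than "status 0" and then read upwards to find the command that failed and when.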
Best of luck!
Chris