[slurm-users] error: power_save module disabled, NULL SuspendProgram
Dr. Thomas Orgis
thomas.orgis at uni-hamburg.de
Wed Mar 29 13:43:22 UTC 2023
On Wed, 29 Mar 2023 14:42:33 +0200,
Ben Polman <Ben.Polman at science.ru.nl> wrote:
> I'd be interested in your kludge; we face a similar situation where the
> slurmctld node does not have access to the IPMI network and cannot ssh
> to machines that have access.
> We are thinking of creating a REST interface to a control server which
> would be running the ipmi commands.
We settled on transient files in /dev/shm on the slurmctld side as the
"API". You could call it an in-memory transactional database ;-)
#!/bin/sh
# node-suspend and node-resume (symlinked) script:
# expands the Slurm hostlist and publishes it as a *.list file in the
# suspend/ or resume/ spool directory, depending on the name this
# script was invoked under.
powerdir=/dev/shm/powersave
# scontrol is expected to sit next to this script
scontrol=$(cd "$(dirname "$0")" && pwd)/scontrol
hostlist=$1
case $0 in
*-suspend)
  subdir=suspend
  ;;
*-resume)
  subdir=resume
  ;;
esac
mkdir -p "$powerdir/$subdir" &&
cd "$powerdir/$subdir" &&
tmp=$(mktemp XXXXXXX.tmp) &&
"$scontrol" show hostnames "$hostlist" > "$tmp" &&
echo "$(date +%Y%m%d-%H%M%S) $(basename "$0") $(tr '\n' ' ' < "$tmp")" >> "$powerdir/log"
# rename into place atomically; the poller only picks up *.list files
mv "$tmp" "${tmp%.tmp}.list"
# end
This atomically creates powersave/suspend/*.list and
powersave/resume/*.list files with node names in them.
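For completeness (the subject's "error: power_save module disabled, NULL
SuspendProgram" shows up when no SuspendProgram is configured at all), such
scripts get hooked into slurmctld via the power_save options in slurm.conf.
A minimal sketch, with the installation path, the base script name and the
timeouts purely assumed for illustration:

# one script, two names:
# ln -s /usr/local/sbin/node-power /usr/local/sbin/node-suspend
# ln -s /usr/local/sbin/node-power /usr/local/sbin/node-resume
# slurm.conf excerpt:
SuspendProgram=/usr/local/sbin/node-suspend
ResumeProgram=/usr/local/sbin/node-resume
SuspendTime=1800
SuspendTimeout=120
ResumeTimeout=600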
On the privileged server, a script periodically looked at these directories
(via ssh) and triggered the appropriate actions, including some heuristics
for unclean shutdowns or spontaneous re-availability (over a thousand runs,
there's a good chance of something getting stuck, even in some driver code).
#!/bin/sh
# Poller on the privileged server: fetch the *.list files that slurmctld
# dropped into /dev/shm/powersave and run the power actions locally.
powerdir=/dev/shm/powersave

# run a command on the batch host (slurmctld side) via ssh
batch()
{
    ssh-wrapper-that-correctly-quotes-argument-list --host=batchhost "$@"
}

while sleep 5
do
    # suspend requests: shut the listed nodes down
    suspendlists=$(batch ls "$powerdir/suspend/" 2>/dev/null | grep '\.list$')
    for f in $suspendlists
    do
        hosts=$(batch cat "$powerdir/suspend/$f" 2>/dev/null)
        for h in $hosts
        do
            case "$h" in
            node*|data*)
                echo "suspending $h"
                node-shutdown-wrapper "$h"
                ;;
            *)
                echo "malformed node name"
                ;;
            esac
        done
        batch rm -f "$powerdir/suspend/$f"
    done

    # resume requests: power the listed nodes back on
    resumelists=$(batch ls "$powerdir/resume/" 2>/dev/null | grep '\.list$')
    for f in $resumelists
    do
        hosts=$(batch cat "$powerdir/resume/$f" 2>/dev/null)
        for h in $hosts
        do
            case "$h" in
            node*)
                echo "resuming $h"
                # Assume the node _should_ be switched off. Ensure that now (in
                # case it hung during shutdown).
                if ipmi-wrapper "$h" chassis power status | grep -q 'on$'; then
                    if ssh -o ConnectTimeout=2 "$h" pgrep slurmd >/dev/null 2>&1 </dev/null; then
                        echo "skipping apparently active node $h"
                    else
                        echo "forcing power reset on $h"
                        ipmi-wrapper "$h" chassis power reset
                    fi
                else
                    ipmi-wrapper "$h" chassis power on
                fi
                # Wait to make sure?
                ;;
            *)
                echo "malformed node name"
                ;;
            esac
        done
        batch rm -f "$powerdir/resume/$f"
    done
done
# end
The current approach handles resume better, waiting for a number of
hosts at the same time and only un-draining those that reappeared.
Back then, we relied on the nodes being automatically re-incorporated by
slurmctld. That mostly worked, but not always, resulting in spurious
NODE_FAILs which started to annoy users.
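In case it helps as a starting point, a minimal sketch of that
wait-and-undrain step (not our actual current script): it assumes the same
ssh/pgrep probe as above, routes scontrol through the batch host via the
same batch() helper, and uses a placeholder ten-minute boot budget.

#!/bin/sh
# wait-and-undrain sketch: give freshly powered-on hosts some boot time,
# then clear the drain/fail state only for those that answer again.
# batch() is the same ssh helper as in the poller above.
batch()
{
    ssh-wrapper-that-correctly-quotes-argument-list --host=batchhost "$@"
}
hosts=$*
deadline=$(( $(date +%s) + 600 ))   # ten-minute boot budget (placeholder)
while [ -n "$hosts" ] && [ "$(date +%s)" -lt "$deadline" ]
do
    remaining=
    for h in $hosts
    do
        if ssh -o ConnectTimeout=2 "$h" pgrep slurmd >/dev/null 2>&1 </dev/null; then
            echo "un-draining $h"
            batch scontrol update NodeName="$h" State=RESUME
        else
            remaining="$remaining $h"
        fi
    done
    hosts=$remaining
    sleep 10
done
[ -n "$hosts" ] && echo "these never came back:$hosts"
# end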
Alrighty then,
Thomas
--
Dr. Thomas Orgis
HPC @ Universität Hamburg