[slurm-users] error: power_save module disabled, NULL SuspendProgram
Stefan Staeglich
staeglis at informatik.uni-freiburg.de
Mon Mar 6 12:35:38 UTC 2023
Hi,
For about half a year we have been using Slurm's suspend/resume support. It works
quite well, but occasionally it breaks and no nodes are suspended or resumed
anymore.
In this case we see the following message in the log:
error: power_save module disabled, NULL SuspendProgram
A restart of slurmctld fixes the issue for a few weeks.
In the beginning we also saw messages like
error: power_save: program exit status of 1
So we added error logging to the scripts and made them always terminate with
exit code 0. The idea was to prevent Slurm from setting SuspendProgram to NULL.
This did not fix the main error, but it might have reduced how often it
occurs. Has anyone observed similar issues? We will try a higher
SuspendTimeout.
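For illustration, the "log errors, always exit 0" approach described above might look roughly like the sketch below. It assumes only that slurmctld runs SuspendProgram with the nodes to suspend as a single hostlist-expression argument. The `power_off` function, the `action` parameter of `main`, and the log format are hypothetical placeholders, not part of Slurm; the real power-off command is site-specific and is not shown.

```python
#!/usr/bin/env python3
# Hypothetical SuspendProgram wrapper (a sketch, not Slurm's own tooling):
# log every failure, but always exit 0 so slurmctld never sees status 1.
import logging
import subprocess
import sys
from typing import Callable, List

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s suspend[%(process)d] %(levelname)s: %(message)s",
    stream=sys.stderr,  # assumption: point this at a site log file if preferred
)


def power_off(hostlist: str) -> None:
    """Hypothetical power-off step; replace the body with your site's tooling."""
    # Expand a hostlist expression such as "node[01-04]" into node names.
    out = subprocess.run(
        ["scontrol", "show", "hostnames", hostlist],
        capture_output=True, text=True, check=True,
    ).stdout
    for node in out.split():
        # Placeholder only: the real power-off command is site-specific.
        logging.info("powering off %s", node)


def main(argv: List[str], action: Callable[[str], None] = power_off) -> int:
    if len(argv) < 2:
        logging.error("no hostlist argument given")
        return 0  # still report success to slurmctld
    hostlist = argv[1]
    try:
        action(hostlist)
    except Exception:  # log anything, including a missing scontrol binary
        logging.exception("suspend failed for %s", hostlist)
    return 0  # never return a non-zero status


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The trade-off of this design is that slurmctld can no longer tell from the exit status that a suspend failed, so the log has to be monitored separately.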
Best,
Stefan
--
Stefan Stäglich, Universität Freiburg, Institut für Informatik
Georges-Köhler-Allee, Geb.52, 79110 Freiburg, Germany
E-Mail : staeglis at informatik.uni-freiburg.de
WWW : ml.informatik.uni-freiburg.de
Phone: +49 761 203-8223