[slurm-users] error: power_save module disabled, NULL SuspendProgram

Stefan Staeglich staeglis at informatik.uni-freiburg.de
Mon Mar 6 12:35:38 UTC 2023


Hi,

For about half a year we have been using Slurm's suspend/resume support. It
works quite well, but occasionally it breaks and no nodes are suspended or
resumed anymore.

When this happens, we see the following message in the log:
error: power_save module disabled, NULL SuspendProgram

A restart of slurmctld fixes the issue for a few weeks.

Early on we also saw messages like
error: power_save: program exit status of 1

So we added error logging to the scripts and made them always terminate with
exit code 0. The idea was to prevent Slurm from setting SuspendProgram to
NULL.
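For illustration, here is a minimal sketch of that approach in Python. It assumes the script is configured as SuspendProgram in slurm.conf, which means it is called with a single argument: the nodes to suspend as a Slurm hostlist expression. The power-off command `/usr/local/sbin/node-power-off` and the log path are hypothetical placeholders for whatever a site actually uses.

```python
#!/usr/bin/env python3
"""Hypothetical SuspendProgram wrapper: log every failure, but always exit 0."""
import logging
import subprocess
import sys

# Assumption: any path writable by the SlurmUser works here.
LOG_FILE = "/var/log/slurm/suspend_wrapper.log"


def expand_hostlist(expr):
    # `scontrol show hostnames` expands a hostlist expression, one name per line.
    out = subprocess.run(["scontrol", "show", "hostnames", expr],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()


def power_off(node):
    # Hypothetical site-specific command (IPMI, cloud API, ...); raises on failure.
    subprocess.run(["/usr/local/sbin/node-power-off", node],
                   check=True, timeout=60)


def suspend_nodes(nodes, action, log):
    """Apply `action` to every node, log each failure, return the failed nodes."""
    failed = []
    for node in nodes:
        try:
            action(node)
        except Exception:
            log.exception("suspend failed for %s", node)
            failed.append(node)
    return failed


def main(argv):
    fmt = "%(asctime)s %(levelname)s %(message)s"
    try:
        logging.basicConfig(filename=LOG_FILE, level=logging.INFO, format=fmt)
    except OSError:
        # Log file not writable: fall back to stderr instead of crashing.
        logging.basicConfig(level=logging.INFO, format=fmt)
    log = logging.getLogger("suspend")
    try:
        nodes = expand_hostlist(argv[1])
        failed = suspend_nodes(nodes, power_off, log)
        if failed:
            log.error("nodes not suspended: %s", ",".join(failed))
    except Exception:
        log.exception("suspend wrapper failed, argv=%r", argv)
    # Always report success, so slurmctld never sees a non-zero exit status.
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The trade-off of this design is that slurmctld no longer learns about failed power-offs, so the wrapper's own log becomes the only place they are recorded.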

This did not fix the underlying problem, but it may have made it occur less
often. Has anyone observed similar issues? We will try a higher
SuspendTimeout.

Best,
Stefan
-- 
Stefan Stäglich,  Universität Freiburg,  Institut für Informatik
Georges-Köhler-Allee,  Geb.52,   79110 Freiburg,    Germany

E-Mail : staeglis at informatik.uni-freiburg.de
WWW    : ml.informatik.uni-freiburg.de
Telefon: +49 761 203-8223