[slurm-users] error: power_save module disabled, NULL SuspendProgram
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Mon Mar 27 11:17:01 UTC 2023
Hi Thomas,
FYI: Slurm power_save works very well for us, without the issues that you
describe below. We run Slurm 22.05.8; what's your version?
I've documented our setup in this Wiki page:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_cloud_bursting/#configuring-slurm-conf-for-power-saving
This page contains a link to power_save scripts on GitHub.
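As far as I know, the error in the subject line is what slurmctld logs when
no SuspendProgram is configured at all. As a rough sketch (the program paths
and values below are placeholders, not our actual configuration), the
power_save-related part of slurm.conf looks something like this:

   # Power saving in slurm.conf (illustrative values only)
   SuspendProgram=/usr/local/bin/node_suspend.sh
   ResumeProgram=/usr/local/bin/node_resume.sh
   SuspendTime=600            # idle seconds before a node is suspended
   SuspendTimeout=120         # seconds allowed for the suspend to complete
   ResumeTimeout=600          # seconds allowed for a node to boot and register
   SuspendExcNodes=login01,login02   # never power these down
   SuspendExcParts=infra             # nor nodes in these partitions
   SuspendRate=10             # nodes suspended per minute
   ResumeRate=10              # nodes resumed per minute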
IHTH,
Ole
On 3/27/23 12:57, Dr. Thomas Orgis wrote:
> On Mon, 06 Mar 2023 13:35:38 +0100,
> Stefan Staeglich <staeglis at informatik.uni-freiburg.de> wrote:
>
>> But this did not fix the main error; it may only have reduced how often
>> it occurs. Has anyone observed similar issues? We will try a higher
>> SuspendTimeout.
>
> We had issues with power saving as well. We powered the idle nodes off
> completely, so resuming requires a full boot. We repeatedly observed the
> strange behaviour that a node is up for a while, but slurmctld only
> detects it as ready right when it gives up at SuspendTimeout.
>
> But instead of fixing this possibly subtle logic error, we figured that
>
> a) The node suspend support in Slurm was not really designed for a full
> power off/on cycle, which regularly takes minutes.
>
> b) Taking nodes out of and back into production is something the
> cluster admin does; it is not in the scope of the batch system.
>
> Hence I wrote a script that runs as a service on a separate admin node.
> It queries Slurm for idle nodes and pending jobs and decides which nodes
> to drain and power down, or which to bring back online.
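>
> A rough sketch of what such a loop might look like (Python; the thresholds,
> the "<node>-bmc" naming and the ipmitool calls are illustrative placeholders,
> not the actual script):
>
>     import subprocess
>     import time
>
>     MIN_IDLE = 2                # keep at least this many idle nodes powered on
>     DRAIN_REASON = "powersave"  # marker: only touch nodes we drained ourselves
>
>     def run(cmd):
>         return subprocess.run(cmd, capture_output=True, text=True,
>                               check=True).stdout
>
>     def idle_nodes():
>         # Nodes currently idle, one name per line
>         return run(["sinfo", "-h", "-N", "-t", "idle", "-o", "%N"]).split()
>
>     def drained_by_us():
>         # Drained nodes whose drain reason carries our power-save marker
>         out = run(["sinfo", "-h", "-N", "-t", "drained", "-o", "%N %E"])
>         return [line.split()[0] for line in out.splitlines()
>                 if DRAIN_REASON in line]
>
>     def pending_node_demand():
>         # Crude demand estimate: node counts requested by pending jobs.
>         # (The real script also matches job constraints against node features.)
>         out = run(["squeue", "-h", "-t", "PD", "-o", "%D"])
>         return sum(int(n) for n in out.split())
>
>     def power_down(node):
>         run(["scontrol", "update", f"NodeName={node}",
>              "State=DRAIN", f"Reason={DRAIN_REASON}"])
>         subprocess.run(["ipmitool", "-H", f"{node}-bmc",
>                         "chassis", "power", "off"])
>
>     def power_up(node):
>         subprocess.run(["ipmitool", "-H", f"{node}-bmc",
>                         "chassis", "power", "on"])
>         # The real setup waits until the node has booted and answers again
>         # before undraining; that health check is omitted in this sketch.
>         run(["scontrol", "update", f"NodeName={node}", "State=RESUME"])
>
>     while True:
>         idle = idle_nodes()
>         surplus = len(idle) - MIN_IDLE - pending_node_demand()
>         if surplus > 0:
>             for node in idle[:surplus]:
>                 power_down(node)
>         elif surplus < 0:
>             need = -surplus
>             for node in drained_by_us()[:need]:
>                 power_up(node)
>         time.sleep(300)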
>
> This needs more knowledge of Slurm job and node states than I'd like,
> but it works. Ideally, I'd like Slurm's power-saving feature to consist
> of a simple interface that can communicate
>
> 1. which nodes are probably not needed in the coming x minutes/hours,
> depending on the job queue, with settings like keeping a minimum number
> of nodes idle, and
> 2. which currently drained/offline nodes it could use to satisfy user
> demand (a hypothetical sketch of such an interface follows below).
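>
> A purely hypothetical shape for such an interface, just to illustrate the
> split of responsibilities (this is not anything Slurm offers today):
>
>     def nodes_unneeded(horizon_minutes: int, keep_idle: int) -> list[str]:
>         """Scheduler side: nodes it probably won't need within the horizon,
>         while keeping at least keep_idle nodes idle and powered on."""
>         ...
>
>     def nodes_offerable() -> list[str]:
>         """Site side: currently drained/offline nodes that could be powered
>         up to satisfy pending user demand."""
>         ...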
>
> I imagine that Slurm upstream is not very keen on hashing out a robust
> interface for that. I can see arguments for keeping this wholly
> internal to Slurm, but for me, taking nodes in/out of production is not
> directly a batch system's task. Obviously the integration of power
> saving that involves nodes really being powered down brings
> complications like the strange ResumeTimeout behaviour. Also, in the
> case of nodes that have trouble getting back online, the method inside
> Slurm makes for a bad user experience:
>
> The nodes are first allocated to the job, and _then_ they are powered
> up. In the worst case of a defective node, Slurm will wait for the
> whole SuspendTimeout just to realize that it doesn't really have the
> resources it just promised to the job, making the job run attempt fail
> needlessly.
>
> With my external approach, the handling of bringing a node back up is
> done outside slurmctld. Only after a node is back up is it undrained, and
> jobs will be allocated on it. I drain nodes with a specific reason to
> mark those that are offline due to power saving. What sucks is that
> I have to implement part of the scheduler in the sense that I need to
> match pending jobs' demands against properties of available nodes.
>
> Maybe the internal power saving could be made more robust, but I would
> rather see more separation of concerns than putting everything into one
> box. Things are too entangled; even my simple concept of a 'job' does
> not begin to describe what Slurm has in terms of the various steps as
> scheduling entities, which by default also use delayed allocation
> techniques (regarding prolog script behaviour, for example).
>
>
> Alrighty then,
>
> Thomas
>
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark