[slurm-users] GPU-node not waking up after power-save

Loris Bennett loris.bennett at fu-berlin.de
Thu Oct 13 08:47:23 UTC 2022


Hi Ümit,

Ümit Seren <uemit.seren at gmail.com> writes:

> We use power saving with our GPU nodes and they power up fine. They take a bit longer to boot but that’s it. 
>
> What do you mean by not waking up?
>
> Is the power-on script not called?

The power-on script is called, but the boot process sometimes fails to
complete.  To be honest, I can't recall the exact details of why we gave
up on power saving for these nodes, but I think it was some timing
problem in the way systemd was starting the services.  We probably just
need to compare the systemd configuration on the GPU nodes with that on
the non-GPU nodes, which do wake up properly.
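
A minimal sketch of such a comparison, assuming the output of
`systemctl show <unit>` has been saved to a text file on one GPU node and
on one non-GPU node.  The function names and the sample property values
below are made up for illustration; they are not taken from our actual
nodes.

```python
def parse_show(text):
    """Parse KEY=VALUE lines, as printed by `systemctl show`, into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            props[key.strip()] = value.strip()
    return props


def diff_units(gpu_text, ref_text, ignore=()):
    """Return {property: (gpu_value, ref_value)} for properties that differ.

    A property missing on one side is reported with None on that side.
    """
    gpu, ref = parse_show(gpu_text), parse_show(ref_text)
    diffs = {}
    for key in sorted(set(gpu) | set(ref)):
        if key in ignore:
            continue
        if gpu.get(key) != ref.get(key):
            diffs[key] = (gpu.get(key), ref.get(key))
    return diffs


if __name__ == "__main__":
    # Hypothetical sample data, standing in for the saved files.
    gpu_node = "After=network-online.target\nTimeoutStartUSec=1min 30s\n"
    cpu_node = "After=network-online.target remote-fs.target\nTimeoutStartUSec=1min 30s\n"
    for name, (gpu_val, ref_val) in diff_units(gpu_node, cpu_node).items():
        print(f"{name}: GPU node={gpu_val!r}  non-GPU node={ref_val!r}")
```

Properties that legitimately differ from node to node (PIDs, timestamps)
can be passed via `ignore` so that only ordering and timeout settings
remain in the output.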

Thanks for confirming that there is no fundamental issue.

Cheers,

Loris

> Best
>
> Ümit 
>
>  
>
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Loris Bennett <loris.bennett at fu-berlin.de>
> Date: Thursday, 13. October 2022 at 08:14
> To: Slurm Users Mailing List <slurm-users at lists.schedmd.com>
> Subject: [slurm-users] GPU-node not waking up after power-save
>
> Hi,
>
> We use Slurm's power saving mechanism to switch off idle nodes.  However,
> we don't currently use it for our GPU nodes.  This is because in the
> past these nodes failed to wake up again when jobs were submitted to the
> GPU partition.  Before we look at the issue again in view of the current
> energy situation, I was wondering whether this is a problem others have (had).
>
> So does power-saving work in general for GPU nodes and, if so, are there
> any extra steps one needs to take in order to set things up properly?
>
> Cheers,
>
> Loris
-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de


