[slurm-users] Rolling reboot with at most N machines down simultaneously?

David Simpson SimpsonD4 at cardiff.ac.uk
Thu Aug 4 16:03:28 UTC 2022


Another way might be to set up Slurm's power saving (node power off/on), if it is not already in place, and trigger it as required.
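
A rough sketch of how that could be driven, assuming SuspendProgram and ResumeProgram are already configured in slurm.conf. The node names and the batch size of 2 are made up for illustration. The script only prints the scontrol commands, and waiting for each group to come back before moving on is left out:

```python
def batches(nodes, n):
    """Split the node list into consecutive groups of at most n nodes."""
    return [nodes[i:i + n] for i in range(0, len(nodes), n)]


def power_cmd(group, state):
    """Build one scontrol command that sets `state` on every node in the group."""
    return ["scontrol", "update", f"NodeName={','.join(group)}", f"State={state}"]


if __name__ == "__main__":
    # Hypothetical node names; replace with the real node list.
    nodes = [f"node{i:02d}" for i in range(1, 6)]
    for group in batches(nodes, 2):
        # Printed only, nothing is executed here.
        print(" ".join(power_cmd(group, "POWER_DOWN")))
        print(" ".join(power_cmd(group, "POWER_UP")))
```

Whether POWER_DOWN waits for running jobs or needs a drain first depends on the Slurm version and site policy, so that part would need checking locally.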

-------------
David Simpson - Senior Systems Engineer
ARCCA, Redwood Building,
King Edward VII Avenue,
Cardiff, CF10 3NB


-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Brian Andrus
Sent: 04 August 2022 14:47
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] Rolling reboot with at most N machines down simultaneously?

This is actually brilliant!

Brian Andrus

On 8/3/2022 10:20 PM, Gerhard Strangar wrote:
> Phil Chiu wrote:
>
>>     - Individual slurm jobs which reboot nodes - With a for loop, I could
>>     submit a reboot job for each node. But I'm not sure how to limit this so at
>>     most N jobs are running simultaneously.
> With a fake license called reboot?
>
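
For reference, the license idea could look roughly like the sketch below. It assumes slurm.conf defines a local license pool such as `Licenses=reboot:3`, so that at most 3 jobs holding one "reboot" license each can run at once. The node names and the `sudo reboot` command are placeholders, not something stated in the thread, and by default the script only prints the sbatch commands instead of submitting them:

```python
import shlex
import subprocess

# Assumption: slurm.conf contains e.g. "Licenses=reboot:3".
LICENSE = "reboot"


def reboot_job_cmd(node, reboot_cmd="sudo reboot"):
    """Build the sbatch argv for a one-node job that holds one reboot license."""
    return [
        "sbatch",
        f"--nodelist={node}",
        "--nodes=1",
        "--exclusive",
        f"--licenses={LICENSE}:1",
        f"--job-name=reboot-{node}",
        f"--wrap={reboot_cmd}",  # placeholder command, site-specific
    ]


def submit_all(nodes, dry_run=True):
    """Print (dry run) or submit one reboot job per node; return the commands."""
    cmds = [reboot_job_cmd(n) for n in nodes]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Hypothetical node names.
    submit_all(["node01", "node02", "node03", "node04"])
```

One caveat: the license is released when the job ends, which may be before the node is back up, so the limit applies to concurrently running reboot jobs rather than strictly to nodes that are down.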


