[slurm-users] Cancel "reboot ASAP" for a node

Steven Dick kg4ydw at gmail.com
Mon Aug 10 14:28:16 UTC 2020


Also, state=resume should work.
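
For anyone who wants to script this, here is a rough sketch of both approaches: the undrain-then-cancel sequence from the quoted message (reported there to work on 18.08) and the state=resume suggestion, which I have not verified on every version. The function names and the SCONTROL override are made up for illustration and are not part of Slurm; only the scontrol subcommands themselves come from this thread.

```shell
# Hypothetical helpers (not part of Slurm). SCONTROL can be overridden,
# e.g. SCONTROL=echo, to print the commands instead of running them.
SCONTROL="${SCONTROL:-scontrol}"

# Option 1, from the quoted message: undrain first, then cancel the reboot.
# On 18.08 the direct cancel was rejected while the node was DRAINING.
cancel_reboot_two_step() {
    "$SCONTROL" update NodeName="$1" State=undrain Reason="cancelling reboot" &&
    "$SCONTROL" cancel_reboot "$1"
}

# Option 2, the suggestion in this reply: resume the node.
# Assumption: this also clears the pending reboot; check with "scontrol show node".
cancel_reboot_resume() {
    "$SCONTROL" update NodeName="$1" State=resume
}
```

Usage would be something like `cancel_reboot_two_step c01`, followed by `scontrol show node c01` to confirm that the DRAIN flag and the "Reboot ASAP" reason are gone.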

On Fri, Aug 7, 2020 at 12:25 PM Hanby, Mike <mhanby at uab.edu> wrote:
>
> This is what's in /var/log/slurmctld
> Invalid node state transition requested for node c01 from=DRAINING to=CANCEL_REBOOT
>
>
>
> So it looks like, for version 18.08 at least, you have to first undrain, then cancel reboot:
>
>
>
> scontrol update NodeName="c01" State=undrain Reason="cancelling reboot"
>
> scontrol cancel_reboot c01
>
>
>
>
>
> From: "Hanby, Mike" <mhanby at uab.edu>
> Date: Friday, August 7, 2020 at 11:43 AM
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: Cancel "reboot ASAP" for a node
>
>
>
> Howdy, (Slurm 18.08)
>
>
>
> We have a bunch of nodes that we've set to "scontrol reboot ASAP".
>
>
>
> We'd like to cancel a few of those. The man page suggests that either of the following should work, but both report the same error, "slurm_update error: Invalid node state specified":
>
>
> scontrol cancel_reboot c01
>
> or
>
> scontrol Update NodeName=c01 State=CANCEL_REBOOT
>
>
>
> Here's the 'scontrol show node c01' info for reference:
>
>
>
> NodeName=c01 Arch=x86_64 CoresPerSocket=12
>    CPUAlloc=7 CPUTot=24 CPULoad=7.04
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=c0115 NodeHostName=c01 Version=18.08
>    OS=Linux 3.10.0-1062.9.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST 2019
>    RealMemory=191877 AllocMem=6536 FreeMem=176717 Sockets=2 Boards=1
>    State=MIXED+DRAIN ThreadsPerCore=1 TmpDisk=887366 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=interactive,short,long,medium,express
>    BootTime=2020-07-08T23:16:27 SlurmdStartTime=2020-07-08T23:32:05
>    CfgTRES=cpu=24,mem=191877M,billing=24
>    AllocTRES=cpu=7,mem=6536M
>    CapWatts=n/a
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>    Reason=Reboot ASAP [root at 2020-08-06T10:29:22]
>
>
>
> Any thoughts as to how to cancel the reboot?
>
>
>
> ----------------
>
> Mike Hanby
>
> mhanby @ uab.edu
>
> Systems Analyst III - Enterprise
>
> IT Research Computing Services
>
> The University of Alabama at Birmingham


