[slurm-users] slurm_update error: Invalid node state specified
Sushil Mishra
sushilbioinfo at gmail.com
Tue Oct 11 18:26:04 UTC 2022
Thanks so much! It was indeed a mismatch between the node's actual
SocketsPerBoard value and the one in slurm.conf.
Sushil
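The check Rob describes further down in this thread (compare what `slurmd -C` reports against the node's definition in slurm.conf) can be scripted. What follows is a minimal sketch, not an official Slurm tool. `compare_node_conf` is a hypothetical helper name. It assumes that the node's `NodeName=` definition sits on a single line of slurm.conf and that keys are spelled with the same capitalization in both places. Slurm itself treats slurm.conf parameter names case-insensitively.

```sh
#!/bin/sh
# Hypothetical helper (not part of Slurm): report hardware keys whose values
# differ between `slurmd -C` output and a slurm.conf NodeName= line.
#   $1: output of `slurmd -C` (extra lines such as UpTime= are ignored
#       unless the same key also appears in the configured line)
#   $2: the node's NodeName= definition from slurm.conf, on a single line
# Keys are compared case-sensitively, which is a simplification.
compare_node_conf() {
  # Feed the configured line first, then the detected line(s), to awk.
  printf '%s\n%s\n' "$2" "$1" | awk '
    {
      for (i = 1; i <= NF; i++) {
        eq = index($i, "=")
        if (eq == 0) continue          # skip tokens that are not KEY=VALUE
        key = substr($i, 1, eq - 1)
        val = substr($i, eq + 1)
        if (NR == 1) {
          conf[key] = val              # remember what slurm.conf says
        } else if ((key in conf) && conf[key] != val) {
          # Only keys present in both lines are compared.
          printf "%s: slurmd -C=%s slurm.conf=%s\n", key, val, conf[key]
        }
      }
    }'
}
```

On the affected node, something like `compare_node_conf "$(slurmd -C)" "$(grep -i '^NodeName=fucose' /etc/slurm/slurm.conf)"` would print each mismatched key; the slurm.conf path is an assumption and varies between installations. After slurm.conf is corrected, restarting slurmctld and slurmd and then running `scontrol update NodeName=fucose State=RESUME` may still be needed to bring the node back into service.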
On Tue, Oct 11, 2022 at 11:25 AM Paul H. Hargrove <phhargrove at lbl.gov> wrote:
> I think Rob is "on the right track" here. Specifically, I don't think the
> error message means that "RESUME" is unrecognized as the name of a state.
> Rather, the message means that a state transition from "INVAL" to "RESUME"
> is invalid. I can reproduce that message by trying to "RESUME" an "IDLE"
> node, but "RESUME" works fine for a node which has been recently rebooted.
>
> -Paul
>
>
> On Tue, Oct 11, 2022 at 8:14 AM Groner, Rob <rug262 at psu.edu> wrote:
>
>> Have you checked the logs for slurmd and slurmctld? I seem to recall
>> that the "invalid" state for a node meant that there was some discrepancy
>> between what the node says or thinks it has (as reported by slurmd -C) and
>> what slurm.conf says it has. While that discrepancy exists and the node is
>> invalid, you can't just tell it to resume.
>>
>> ------------------------------
>> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
>> Sushil Mishra <sushilbioinfo at gmail.com>
>> *Sent:* Tuesday, October 11, 2022 10:08 AM
>> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>> *Subject:* [slurm-users] slurm_update error: Invalid node state specified
>>
>> Dear all,
>>
>> I am stuck with scontrol not recognizing the state keywords. Could
>> someone point me to the possible cause of the error? I have restarted
>> slurmd a few times, but it didn't help.
>>
>> [sushil@fucose ~]$ sinfo
>> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>> LocalQ*      up   infinite      1  inval fucose
>>
>> [sushil@fucose ~]$ sinfo -R
>> REASON               USER      TIMESTAMP           NODELIST
>> cg                   sushil    2022-10-10T18:11:27 fucose
>>
>> [sushil@fucose ~]$ sudo scontrol update NodeName=fucose state=RESUME
>> [sudo] password for sushil:
>> slurm_update error: Invalid node state specified
>>
>> [sushil@fucose ~]$ squeue
>> JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>
>> Best,
>> Sushil
>>
>>
>
>
> --
> Paul H. Hargrove <PHHargrove at lbl.gov>
> Pronouns: he, him, his
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department
> Lawrence Berkeley National Laboratory
>