Just upgraded my grid to 25.11 a couple days ago. Got two nodes that were down for a bit and when I set them to resume and set the reason="" they are still showing up with the reason I had set while they were down. Nothing I do seems to unset that reason flag for these two nodes. I've set the nodes to down with a reason and then back to idle with the blank reason but it never goes back to "none" like the other nodes. Whatever the last reason set was persists. The behavior seems different since I updated to 25.11. Was running the 20.11 available from the epel-8 repos previously.
On 4/15/26 5:48 am, Berg, Stephen P CIV USN NRL DET SSC MS (USA) via slurm-users wrote:
Nothing I do seems to unset that reason flag for these two nodes.
What is the reason that is being set? There are some (for instance related to invalid registrations because of config issues or broken hardware) that cannot be cleared.

-- 
Chris Samuel : http://www.csamuel.org/ : Philadelphia, PA, USA
While I've been fiddling with it I've used "test", "testing", "cause" and earlier this morning I set all the nodes to "-" just to see if it would take that. Just noticed that after a couple hours 91 of the 92 nodes still have "-" in the REASON column, but one of them now shows up as "none". If the flag is truly not set, or set to NULL does it show up as blank in the "sinfo -Nl" output or would it show as none like I'm used to seeing?
On 4/15/26 8:05 am, Berg, Stephen P CIV USN NRL DET SSC MS (USA) via slurm-users wrote:
While I've been fiddling with it I've used "test", "testing", "cause" and earlier this morning I set all the nodes to "-" just to see if it would take that. Just noticed that after a couple hours 91 of the 92 nodes still have "-" in the REASON column, but one of them now shows up as "none".
If the flag is truly not set, or set to NULL does it show up as blank in the "sinfo -Nl" output or would it show as none like I'm used to seeing?
[oops - accidentally replied privately - this time to the list!]

Ah - to resume a node you just do:

    scontrol update node=$NODE state=resume

Don't try and set the reason field for it.

-- 
Chris Samuel : http://www.csamuel.org/ : Philadelphia, PA, USA
I have tried that and it does work but the reason persists after the nodes get to an idle state. It's a bit confusing for the node to be idle after a reboot when the reason column still says "rebooting" or "down for maintenance" or whatever.
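For anyone wanting to check how many nodes are affected, here is a rough sketch (not a fix) that lists nodes which are idle but still carry a reason. It assumes `sinfo -N -h -o "%N|%T|%E"` is available and that an unset reason is printed as "none"; the function names are hypothetical and the parsing has not been verified against 25.11 output.

```python
import subprocess

# Assumed query: one line per node, no header, fields = name|state|reason.
SINFO_CMD = ["sinfo", "-N", "-h", "-o", "%N|%T|%E"]


def resume_command(node):
    """Build the resume command quoted in the thread (no reason= argument)."""
    return ["scontrol", "update", f"node={node}", "state=resume"]


def stale_reasons(sinfo_text):
    """Return {node: reason} for nodes that are idle yet still show a reason."""
    stale = {}
    for line in sinfo_text.splitlines():
        # maxsplit=2 keeps any '|' inside the reason text intact.
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        node, state, reason = (p.strip() for p in parts)
        # Drop state suffix flags such as '*' or '~' before comparing.
        base_state = state.rstrip("*~#!%$@^+-")
        # Assumption: an unset reason is displayed as "none" (or empty).
        if base_state == "idle" and reason not in ("", "none"):
            stale[node] = reason
    return stale


def query_sinfo():
    """Run sinfo and return its text output; only call this on a Slurm host."""
    result = subprocess.run(SINFO_CMD, capture_output=True, text=True, check=True)
    return result.stdout
```

On a login or management node, `stale_reasons(query_sinfo())` would give the nodes to look at, and `resume_command(name)` gives the argument list to hand to `subprocess.run` for each one. Whether resuming again actually clears the reason on 25.11 is exactly the open question in this thread.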
participants (2)
- Berg, Stephen P CIV USN NRL DET SSC MS (USA)
- Christopher Samuel