Safely removing a node
Hi,

What is the "official" process for safely removing nodes? I drained the nodes so that running jobs could complete, and put them in the down state once they were fully drained. I then edited slurm.conf to remove the nodes. After some time, sinfo showed that the nodes had been removed from the partition.

However, I was told I might need to restart the slurmctld service. Is that necessary? Should I also run scontrol reconfig?

Best,
Fritz Ratnasamy
Data Scientist, Information Technology
If I'm not mistaken, the man page for slurm.conf (or one of the others) either states, option by option, what action is needed after a change, or has a combined list of what requires what. I can never remember and would have to look it up anyway.

--
Ryan Novosielski
Sr. Technologist, Office of Advanced Research Computing, Rutgers, the State University of NJ
On 5/17/24 05:16, Ratnasamy, Fritz via slurm-users wrote:
What is the "official" process to remove nodes safely? I have drained the nodes so jobs are completed and put them in down state after they are completely drained. I edited the slurm.conf file to remove the nodes. After some time, I can see that the nodes were removed from the partition with the command sinfo.
However, I was told I might need to restart the service slurmctld, do you know if it is necessary? Should I also run scontrol reconfig?
The SchedMD presentations at https://slurm.schedmd.com/publications.html describe adding and removing nodes. I've collected my notes on this on the wiki page https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_operations/#add-and-remove-n...

/Ole
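For anyone following along, the sequence described in the original question might be sketched roughly as below. This is only a dry-run sketch: it prints the commands rather than running them. The node name node001, the reason string, and the helper name removal_plan are placeholders made up for illustration. The thread does not settle whether scontrol reconfigure is enough or whether slurmctld (and the slurmd daemons) must be restarted after the slurm.conf edit, so that step is left as a comment; check the man pages for your Slurm version.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for taking a node out of service.
# Nothing here is executed against the cluster.
# "node001" and the reason text are hypothetical placeholders.

removal_plan() {
    node="$1"
    # 1. Drain the node: running jobs finish, no new jobs are scheduled on it.
    printf '%s\n' "scontrol update NodeName=${node} State=DRAIN Reason=\"decommission\""
    # 2. Check the node state; wait until it is reported as fully drained.
    printf '%s\n' "sinfo -N -n ${node}"
    # 3. Once drained, mark the node down.
    printf '%s\n' "scontrol update NodeName=${node} State=DOWN Reason=\"decommission\""
    # 4. Edit slurm.conf by hand to remove the node (not printed here).
    #    Whether "scontrol reconfigure" suffices afterwards, or the daemons
    #    need a restart, depends on the Slurm version; see the man pages.
}

removal_plan "${1:-node001}"
```

Saved under a name of your choosing and run with a node name as its argument, the script prints the three commands for that node so they can be reviewed before anything is run by hand.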
Participants (3): Ole Holm Nielsen; Ratnasamy, Fritz; Ryan Novosielski