[slurm-users] SLURM: reconfig

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu May 5 14:36:54 UTC 2022


On 5/5/22 15:53, Ward Poelmans wrote:
> Hi Steven,
> 
> I think truly dynamic adding and removing of nodes is something that's on 
> the roadmap for slurm 23.02?

Yes, see slide 37 in https://slurm.schedmd.com/SLUG21/Roadmap.pdf from the 
Slurm publications site https://slurm.schedmd.com/publications.html

/Ole


> On 5/05/2022 15:28, Steven Varga wrote:
>> Hi Tina,
>> Thank you for sharing. This matches my observations when I checked if
>> slurm could do what I am up to: managing AWS EC2 dynamic (spot) instances.
>>
>> After replacing MySQL with Redis, I now wonder what it would take to make
>> slurm node addition and removal dynamic. I've been looking at the source
>> code for many months now, trying to decide whether it can be done.
>>
>> I am using configless mode, 3 controllers, and 2 slurmdbs with a robust
>> Redis Sentinel based backend.
>>
>> Steven
>>
>>
>> On Thu., May 5, 2022, 08:57 Tina Friedrich
>> <tina.friedrich at it.ox.ac.uk> wrote:
>>
>>     Hi List,
>>
>>     out of curiosity - I would assume that if running configless, one
>>     doesn't need to manually restart slurmd on the nodes if the config
>>     changes?
>>
>>     Hi Steven,
>>
>>     I have no idea if you want to do it every couple of minutes and what
>>     the implications of that are (although I've certainly managed to
>>     restart them every 5 minutes by accident with no real problems
>>     caused), but - generally, restarting the daemons (slurmctld, slurmd)
>>     is a non-issue, as it's a safe operation. There's no risk to running
>>     jobs or anything. I have the config management restart them if any
>>     files change. It also doesn't seem to matter if the restarts of the
>>     controller & the node daemons are splayed a bit (i.e. don't happen at
>>     the same time), or what order they happen in.
>>
>>     Tina
>>
>>     On 05/05/2022 13:17, Steven Varga wrote:
>>      > Thank you for the quick reply! I know I am pushing my luck here:
>>      > is it possible to modify slurm: src/common/[read_conf.c, node_conf.c]
>>      > src/slurmctld/[read_config.c, ...] such that the state can be
>>      > maintained dynamically? -- or is it cheaper to write a job manager
>>      > with fewer features but supporting dynamic nodes from the ground up?
>>      > best wishes: steve
>>      >
>>      > On Thu, May 5, 2022 at 12:29 AM Christopher Samuel
>>      > <chris at csamuel.org> wrote:
>>      >
>>      >     On 5/4/22 7:26 pm, Steven Varga wrote:
>>      >
>>      >      > I am wondering what is the best way to apply node changes,
>>      >      > such as the addition and removal of nodes, to SLURM. The
>>      >      > excerpts below suggest a full restart, can someone confirm
>>      >      > this?
>>      >
>>      >     You are correct, you need to restart slurmctld and slurmd
>>      >     daemons at present.  See
>>      >     https://slurm.schedmd.com/faq.html#add_nodes
>>      >
>>      >     All the best,
>>      >     Chris
>>      >     --
>>      >     Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>>      >
>>
>>     --
>>     Tina Friedrich, Advanced Research Computing Snr HPC Systems
>>     Administrator
>>
>>     Research Computing and Support Services
>>     IT Services, University of Oxford
>>     http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
>>
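
Tina's approach quoted above (have the config management restart the
daemons if any files change) could look roughly like the sketch below. It
is only an illustration, not anything Slurm ships. The helper names, the
state file, and the assumption that the daemons run as systemd units named
slurmctld and slurmd are all hypothetical and would need adapting to the
site. Note that the first run, with no stored digest yet, also triggers a
restart.

```python
import hashlib
import subprocess
from pathlib import Path


def config_digest(paths):
    """Return one SHA-256 hex digest covering the names and contents of paths."""
    h = hashlib.sha256()
    for p in sorted(Path(x) for x in paths):
        h.update(str(p).encode())
        h.update(b"\0")
        h.update(p.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def restart_if_changed(paths, state_file, units, run=subprocess.run):
    """Restart the given units when the digest differs from the stored one.

    Returns True if a restart was issued, False otherwise.
    """
    state = Path(state_file)
    new = config_digest(paths)
    old = state.read_text().strip() if state.exists() else None
    if new == old:
        return False
    for unit in units:
        # Assumption: the daemons are managed by systemd under these names.
        run(["systemctl", "restart", unit], check=True)
    # Record the digest only after every restart command succeeded.
    state.write_text(new + "\n")
    return True


# Hypothetical usage on a compute node (paths are site-specific guesses):
# restart_if_changed(["/etc/slurm/slurm.conf"],
#                    "/var/lib/slurm-conf.sha256", ["slurmd"])
```

Passing the command runner in as `run` keeps the change detection testable
without touching real daemons.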


