[slurm-users] SLURM: reconfig

Steven Varga steven.varga at gmail.com
Thu May 5 13:28:37 UTC 2022


Hi Tina,
Thank you for sharing. This matches what I observed when I checked whether
slurm could do what I am up to: managing dynamic (spot) AWS EC2 instances.

After replacing MySQL with Redis, I now wonder what it would take to make
slurm node addition and removal dynamic. I've been looking at the source
code for many months now, trying to decide whether it can be done.

I am running configless with 3 controllers and 2 slurmdbs, on a robust
Redis Sentinel based backend.
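
For reference, the restart-on-change approach Tina describes in the quoted
message below could be sketched roughly as follows. This is a minimal Python
sketch and not anything shipped with slurm: the function names, the state-file
location and the systemd unit names (slurmctld, slurmd) are assumptions for
illustration. It only builds the restart commands; it does not run them.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def config_changed(conf: Path, state: Path) -> bool:
    """Return True if `conf` differs from the digest recorded in `state`.

    The new digest is written to `state`, so a second call without an
    intervening edit returns False.
    """
    current = file_digest(conf)
    previous = state.read_text().strip() if state.exists() else None
    if current == previous:
        return False
    state.write_text(current + "\n")
    return True


def restart_commands(is_controller: bool) -> list[list[str]]:
    """Build (but do not run) the restart commands for this host.

    The unit names are assumed to be the usual systemd ones.
    """
    units = ["slurmctld"] if is_controller else ["slurmd"]
    return [["systemctl", "restart", unit] for unit in units]


if __name__ == "__main__":
    # Demo against a throwaway file so nothing on the host is touched.
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "slurm.conf"
        state = Path(tmp) / "slurm.conf.sha256"
        conf.write_text("NodeName=node01 CPUs=4\n")
        if config_changed(conf, state):
            for cmd in restart_commands(is_controller=False):
                print("would run:", " ".join(cmd))
```

A config management tool would normally do the change detection itself; the
digest file here just stands in for that step.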

Steven


On Thu., May 5, 2022, 08:57 Tina Friedrich, <tina.friedrich at it.ox.ac.uk>
wrote:

> Hi List,
>
> out of curiosity - I would assume that if running configless, one
> doesn't manually need to restart slurmd on the nodes if the config changes?
>
> Hi Steven,
>
> I have no idea if you want to do it every couple of minutes and what the
> implications are of that (although I've certainly managed to restart them
> every 5 minutes by accident with no real problems caused), but -
> generally, restarting the daemons (slurmctld, slurmd) is a non-issue, as
> it's a safe operation. There's no risk to running jobs or anything. I
> have the config management restart them if any files change. It also
> doesn't seem to matter if the restarts of the controller & the node
> daemons are splayed a bit (i.e. don't happen at the same time), or what
> order they happen in.
>
> Tina
>
> On 05/05/2022 13:17, Steven Varga wrote:
> > Thank you for the quick reply! I know I am pushing my luck here: is it
> > possible to modify slurm: src/common/[read_conf.c, node_conf.c]
> > src/slurmctld/[read_config.c, ...] such that the state can be maintained
> > dynamically? -- or is it cheaper to write a job manager with fewer
> > features but supporting dynamic nodes from the ground up?
> > best wishes: steve
> >
> > On Thu, May 5, 2022 at 12:29 AM Christopher Samuel <chris at csamuel.org>
> > wrote:
> >
> >     On 5/4/22 7:26 pm, Steven Varga wrote:
> >
> >      > I am wondering what is the best way to update node changes, such as
> >      > addition and removal of nodes to SLURM. The excerpts below suggest a
> >      > full restart, can someone confirm this?
> >
> >     You are correct, you need to restart slurmctld and slurmd daemons at
> >     present.  See https://slurm.schedmd.com/faq.html#add_nodes
> >
> >     All the best,
> >     Chris
> >     --
> >     Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
> >
>
> --
> Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
>
> Research Computing and Support Services
> IT Services, University of Oxford
> http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
>
>