[slurm-users] SLURM: reconfig

Ward Poelmans ward.poelmans at vub.be
Thu May 5 13:53:31 UTC 2022


Hi Steven,

I think truly dynamic adding and removing of nodes is on the roadmap for Slurm 23.02?

Ward

On 5/05/2022 15:28, Steven Varga wrote:
> Hi Tina,
> Thank you for sharing. This matches my observations when I checked whether Slurm could do what I am up to: managing dynamic AWS EC2 (spot) instances.
> 
> After replacing MySQL with Redis, I now wonder what it would take to make Slurm node addition and removal dynamic. I've been looking at the source code for many months now, trying to decide whether it can be done.
> 
> I am using configless mode, 3 controllers, and 2 slurmdbs with a robust Redis Sentinel-based backend.
> 
> Steven
> 
> 
> On Thu., May 5, 2022, 08:57 Tina Friedrich <tina.friedrich at it.ox.ac.uk> wrote:
> 
>     Hi List,
> 
>     out of curiosity - I would assume that if running configless, one
>     doesn't manually need to restart slurmd on the nodes if the config changes?
> 
>     Hi Steven,
> 
>     I have no idea if you want to do it every couple of minutes and what the
>     implications are of that (although I've certainly managed to restart them
>     every 5 minutes by accident with no real problems caused), but -
>     generally, restarting the daemons (slurmctld, slurmd) is a non-issue, as
>     it's a safe operation. There's no risk to running jobs or anything. I
>     have the config management restart them if any files change. It also
>     doesn't seem to matter if the restarts of the controller & the node
>     daemons are splayed a bit (i.e. don't happen at the same time), or what
>     order they happen in.
> 
>     Tina
> 
>     On 05/05/2022 13:17, Steven Varga wrote:
>      > Thank you for the quick reply! I know I am pushing my luck here: is it
>      > possible to modify slurm: src/common/[read_conf.c, node_conf.c]
>      > src/slurmctld/[read_config.c, ...] such that the state can be maintained
>      > dynamically? Or would it be cheaper to write a job manager with fewer
>      > features that supports dynamic nodes from the ground up?
>      > best wishes: steve
>      >
>      > On Thu, May 5, 2022 at 12:29 AM Christopher Samuel <chris at csamuel.org> wrote:
>      >
>      >     On 5/4/22 7:26 pm, Steven Varga wrote:
>      >
>      >      > I am wondering what is the best way to update node changes, such as
>      >      > addition and removal of nodes to SLURM. The excerpts below suggest a
>      >      > full restart, can someone confirm this?
>      >
>      >     You are correct, you need to restart slurmctld and slurmd daemons at
>      >     present.  See https://slurm.schedmd.com/faq.html#add_nodes
>      >
>      >     All the best,
>      >     Chris
>      >     --
>      >     Chris Samuel  : http://www.csamuel.org/
>      >     :  Berkeley, CA, USA
>      >
> 
>     -- 
>     Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
> 
>     Research Computing and Support Services
>     IT Services, University of Oxford
>     http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
> 

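The restart-on-change approach Tina describes (have config management restart the daemons whenever a config file changes) could be sketched roughly as below. This is only an illustration, not anyone's actual setup: the function names `config_digest` and `restart_commands` are hypothetical, and it assumes the daemons run as systemd units named `slurmctld` and `slurmd`, which may differ on your site. The sketch only builds the commands; actually running them (for example with `subprocess.run`) is left to the caller.

```python
import hashlib
from pathlib import Path


def config_digest(paths):
    """Return one SHA-256 digest covering the names and contents of the files."""
    h = hashlib.sha256()
    for p in sorted(Path(x) for x in paths):
        h.update(str(p).encode())
        h.update(b"\0")
        h.update(p.read_bytes())
    return h.hexdigest()


def restart_commands(old_digest, new_digest, daemons=("slurmctld", "slurmd")):
    """Return the restart commands to run, or an empty list if nothing changed.

    The unit names are an assumption. Per the thread, the order and exact
    timing of controller and node daemon restarts do not seem to matter.
    """
    if old_digest == new_digest:
        return []
    return [["systemctl", "restart", d] for d in daemons]
```

A caller would store the digest from the previous run, recompute it over the current config files, and run the returned commands only when the list is non-empty.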