Recommended Stable Slurm Version for >100P Scale Clusters - slurm-users

16 Nov 2025

We are currently planning to deploy a new HPC system with a total
compute capacity exceeding 100 PF. As part of our preparation, we
would like to understand which Slurm versions are considered
stable and widely used at this scale.

Could you please share your recommendations or experience on the following:

1. Which Slurm versions are currently running reliably on very
large-scale clusters (>100 PF or >10k nodes)?

2. Are there any versions we should avoid due to known issues at
large scale?

3. What best practices or configuration considerations apply to
Slurm deployments of this size?
