With the Slurm 25.11 release out, and the 1.0.0rc1 for the Slinky
Project done, we're quickly shifting into conference season.
SchedMD staff are presenting at KubeCon North America on Slinky [1] this
week. We don't have a booth here, but feel free to say hi if you see any
of us, or send me a message on the Kubernetes or CNCF Slack channels
(@wickberg) if you have something you'd like to discuss in person.
Next week we'll be manning the Slurm Booth at SC25 [2], as well as
hosting the annual Slurm Community Birds-of-a-Feather session [3] on
Thursday from 12:15-1:15pm.
This year we wanted to make the Slurm Community BoF survey
available ahead of time, and to open it to a wider audience. This'll let
us prepare some initial results ahead of the BoF (while still trying to
update live data during the BoF), and the results from this are
invaluable as we plan for future Slurm releases.
The survey is available now: https://schedmd.com/survey
For those not at SC25, we'll have a brief set of highlights from this
survey included in the Slurm 25.11 release overview video on our YouTube
channel [4] in December.
- Tim
[1] https://kccncna2025.sched.com/event/27FW5/
[2] The Slurm Booth is #1641. We have a new halo banner this year as well.
[3] https://sc25.conference-program.com/presentation/?id=bof101&sess=sess471
[4] https://www.youtube.com/SchedMDSlurm
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
We are pleased to announce the availability of Slinky version 1.0.0-rc1.
Slinky is SchedMD’s set of components to integrate Slurm in Kubernetes
environments. Slinky consists of two main projects, slurm-operator and
slurm-bridge. Our landing page is here:
https://www.slinky.ai
The slurm-operator handles cases where users wish to run Slurm jobs
within the Kubernetes cluster.
Release v1.0.0-rc1:
https://github.com/SlinkyProject/slurm-operator/tree/release-1.0
New features include hybrid support, integration points for external
tooling, and workload protection and isolation.
The full changelog may be found here:
https://github.com/SlinkyProject/slurm-operator/blob/main/CHANGELOG/CHANGEL…
The slurm-bridge handles the cases where you want Slurm to do the
scheduling on your cluster while being able to run either Kubernetes or
Slurm jobs.
Release v1.0.0-rc1:
https://github.com/SlinkyProject/slurm-bridge/tree/release-1.0
New features include support for DRA Extended Resources, support for
TaintToleration and VolumeBinding plugins, and integration of new Slurm
25.11 support for granular node resource allocation assignments with GRES.
The full changelog may be found here:
https://github.com/SlinkyProject/slurm-bridge/blob/main/CHANGELOG/CHANGELOG…
The SlinkyProject registry now has containers that support both amd64
(x86_64) and arm64 (aarch64) architectures. You may find these here:
https://github.com/orgs/SlinkyProject/packages
--
Marlow Warnicke
Principal Cloud Engineer, SchedMD LLC
Commercial Slurm - and Slinky - Development and Support
We are pleased to announce the availability of Slurm release candidate
25.11.0rc1.
To highlight some new features coming in 25.11:
* Added new "Expedited Requeue" mode for batch jobs. Batch jobs with
--requeue=expedite will automatically requeue on node failure, or if the
batch script returns a non-zero exit code and one or more Epilog scripts
fail. Expedited requeue jobs are eligible to restart immediately, are
treated as the highest priority job in the system, and their previously
allocated set of nodes will be prevented from launching other work.
* Added a new "Mode 3" of operation to Hierarchical Resources. This mode
complements the existing Mode 1 and Mode 2 by summing usage from lower
levels automatically. This can be used, e.g., to implement a
power-capping mode modeling power distribution between the datacenter,
local distribution, and individual racks.
* Added direct support for exporting OpenMetrics (Prometheus) telemetry
from slurmctld. This is accessible on SlurmctldPort on SlurmctldHost by
default, or can be disabled if desired.
* Added an experimental asynchronous-reply mode to slurmctld. If enabled
with "SlurmctldParameters=enable_async_reply", RPC responses are
offloaded to the kernel for further processing, freeing individual
worker threads for new traffic.
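As a rough illustration of the "summing usage from lower levels" idea behind Mode 3, the power-capping example can be sketched in Python. This is only a conceptual model and not how Slurm implements Hierarchical Resources; the Level class, its field names, and all wattage figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    """One level in a hypothetical power hierarchy (datacenter, PDU, rack)."""
    name: str
    cap_watts: float
    own_usage_watts: float = 0.0  # usage charged directly at this level
    children: list["Level"] = field(default_factory=list)

    def usage(self) -> float:
        # Roll-up: a level's usage is its own plus everything below it.
        return self.own_usage_watts + sum(c.usage() for c in self.children)

    def over_cap(self) -> list[str]:
        # Names of every level whose summed usage exceeds its cap.
        found = [self.name] if self.usage() > self.cap_watts else []
        for child in self.children:
            found.extend(child.over_cap())
        return found


# Made-up numbers: each rack is under its own cap, but pdu-a is not.
racks_a = [Level("rack-a1", 10_000, 8_000), Level("rack-a2", 10_000, 9_000)]
racks_b = [Level("rack-b1", 10_000, 4_000)]
pdu_a = Level("pdu-a", 15_000, children=racks_a)
pdu_b = Level("pdu-b", 15_000, children=racks_b)
datacenter = Level("datacenter", 25_000, children=[pdu_a, pdu_b])

print(datacenter.usage())
print(datacenter.over_cap())
```

In this sketch no single rack exceeds its cap, yet the summed usage under pdu-a does, which is the kind of constraint a roll-up across levels is meant to catch.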
This is the first release candidate of the upcoming 25.11 release
series, and represents the end of development for this release, and a
finalization of the RPC and state file formats.
If any issues are identified with this release candidate, please report
them through https://bugs.schedmd.com against the 25.11.x version and we
will address them before the first production 25.11.0 release is made.
Please note that the release candidates are not intended for production use.
A preview of the updated documentation can be found at
https://slurm.schedmd.com/archive/slurm-master/ .
Slurm can be downloaded from https://www.schedmd.com/download-slurm/.
The changelog for 25.11 can be found here:
https://github.com/SchedMD/slurm/blob/master/CHANGELOG/slurm-25.11.md
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
We'll have a few more details as conference season quickly approaches
this November, but SchedMD staff are presenting at KubeCon NA on Slinky
[1]. We'll be manning the Slurm Booth at SC25 [2], as well as hosting
the annual Slurm Community Birds-of-a-Feather session [3].
I'll also send a link out to the survey questions for the BoF to the
slurm-users list ahead of the conference, and we'll be going into a bit
more depth on the answers during the BoF this year.
The events page on the SchedMD website has more detail on future events
as well: https://www.schedmd.com/events/
A few folks had asked, and apparently we never mentioned this more
publicly, but: SchedMD does not plan to hold an in-person SLUG in 2025
or 2026. We are working to bring some of the same content to our YouTube
channel [4] as a way to more broadly disseminate some of that same
content, starting with the Slurm 25.11 release overview in December.
- Tim
[1] https://kccncna2025.sched.com/event/27FW5/
[2] The Slurm Booth is #1641.
[3] https://sc25.conference-program.com/presentation/?id=bof101&sess=sess471
[4] https://www.youtube.com/SchedMDSlurm
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
We are pleased to announce the availability of Slurm version 25.05.4.
This version increases the default maximum number of connections to
slurmctld from 50 to 512, fixes a regression added in 25.05.2 that broke
compatibility with PMIx v2.x through v3.1.0rc1, and fixes other minor to
moderate bugs.
The full list of changes is available in the CHANGELOG:
https://github.com/SchedMD/slurm/blob/slurm-25.05/CHANGELOG/slurm-25.05.md
Slurm can be downloaded from:
https://www.schedmd.com/download-slurm/
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support
We are pleased to announce the availability of Slurm version 25.05.3.
This version fixes an issue that prevented deleting a QOS when running
with MySQL servers (MariaDB was unaffected). Please note that the
slurmdbd will require MySQL 8.0.4+ or MariaDB 10.0.5+ to function
correctly. This version also fixes heterogeneous jobs when TLS is
enabled, a logging issue with syslog, and various mild to moderate
stability issues.
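For sites that want to script a check against the minimum database versions noted above, a minimal Python sketch follows. It assumes the version string looks like the output of SELECT VERSION() (for example "8.0.36" or "10.11.6-MariaDB"); the helper names are hypothetical and nothing here is part of Slurm.

```python
import re

# Minimum versions quoted in the announcement above.
MINIMUMS = {"mysql": (8, 0, 4), "mariadb": (10, 0, 5)}


def parse_version(text: str) -> tuple:
    """Extract the leading major.minor.patch from a server version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_text: str) -> bool:
    # Assumption: MariaDB identifies itself in the version string.
    flavor = "mariadb" if "mariadb" in version_text.lower() else "mysql"
    return parse_version(version_text) >= MINIMUMS[flavor]


for sample in ("8.0.36", "5.7.44", "10.11.6-MariaDB", "10.0.4-MariaDB"):
    print(sample, meets_minimum(sample))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "8.0.10" sorting before "8.0.4".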
The full list of changes is available in the CHANGELOG:
https://github.com/SchedMD/slurm/blob/slurm-25.05/CHANGELOG/slurm-25.05.md
Slurm can be downloaded from:
https://www.schedmd.com/download-slurm/
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support
We are pleased to announce the availability of Slurm version 25.05.2.
This version fixes a few regressions with X11 forwarding in 25.05 that
may prevent applications from launching, adds support for PMIx v6.x,
fixes a variety of stability issues, fixes a regression where
--tres-per-task was ignored, and fixes additional minor to moderate
severity issues.
The full list of changes is available in the CHANGELOG:
https://github.com/SchedMD/slurm/blob/slurm-25.05/CHANGELOG/slurm-25.05.md
Slurm can be downloaded from:
https://www.schedmd.com/download-slurm/
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support
We are pleased to announce the availability of Slurm versions 25.05.1
and 24.11.6.
Changes in 25.05 include the following:
* Fix many issues with the TLS Certificate Manager introduced in 25.05
* Optimize account deletion
* Fix a bug when reordering the association hierarchy
* Fix some issues that cause daemon crashes
* Fix a variety of memory leaks
Changes in 24.11 include the following:
* Fix some issues that cause daemons to crash
* Fix some race conditions on shutdown that cause daemons to crash or hang
The full list of changes is available in the CHANGELOG for each version:
https://github.com/SchedMD/slurm/blob/slurm-25.05/CHANGELOG/slurm-25.05.md
https://github.com/SchedMD/slurm/blob/slurm-24.11/CHANGELOG/slurm-24.11.md
Slurm can be downloaded from:
https://www.schedmd.com/download-slurm/
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support