Slurm versions 24.11.5, 24.05.8, and 23.11.11 are now available and
include a fix for a recently discovered security issue.
SchedMD customers were informed on April 23rd and provided a patch on
request; this process is documented in our security policy. [1]
A mistake with permission handling for Coordinators within Slurm's
accounting system can allow a Coordinator to promote a user to
Administrator. (CVE-2025-43904)
Thank you to Sekou Diakite (HPE) for reporting this.
Downloads are available at https://www.schedmd.com/downloads.php .
Release notes follow below.
- Tim
[1] https://www.schedmd.com/security-policy/
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
> * Changes in Slurm 24.11.5
> ==========================
> -- Return error to scontrol reboot on bad nodelists.
> -- slurmrestd - Report an error when QOS resolution fails for v0.0.40
> endpoints.
> -- slurmrestd - Report an error when QOS resolution fails for v0.0.41
> endpoints.
> -- slurmrestd - Report an error when QOS resolution fails for v0.0.42
> endpoints.
> -- data_parser/v0.0.42 - Added +inline_enums flag which modifies the
> output when generating OpenAPI specification. It causes enum arrays to not
> be defined in their own schema with references ($ref) to them. Instead they
> will be dumped inline.
> -- Fix binding error with tres-bind map/mask on partial node allocations.
> -- Fix stepmgr enabled steps being able to request features.
> -- Reject step creation if requested feature is not available in job.
> -- slurmd - Restrict listening for new incoming RPC requests further into
> startup.
> -- slurmd - Avoid auth/slurm related hangs of CLI commands during startup
> and shutdown.
> -- slurmctld - Restrict processing new incoming RPC requests further into
> startup. Stop processing requests sooner during shutdown.
> -- slurmctld - Avoid auth/slurm related hangs of CLI commands during
> startup and shutdown.
> -- slurmctld - Avoid race condition during shutdown or reconfigure that
> could result in a crash due to delayed processing of a connection while
> plugins are unloaded.
> -- Fix small memleak when getting the job list from the database.
> -- Fix incorrect printing of % escape characters when printing stdio
> fields for jobs.
> -- Fix padding parsing when printing stdio fields for jobs.
> -- Fix printing %A array job id when expanding patterns.
> -- Fix reservations causing jobs to be held for Bad Constraints.
> -- switch/hpe_slingshot - Prevent potential segfault on failed curl
> request to the fabric manager.
> -- Fix printing incorrect array job id when expanding stdio file names.
> The %A will now be substituted by the correct value.
> -- switch/hpe_slingshot - Fix vni range not updating on slurmctld restart
> or reconfigure.
> -- Fix steps not being created when using certain combinations of -c and
> -n lower than the job's requested resources, when using stepmgr and nodes
> are configured with CPUs == Sockets*CoresPerSocket.
> -- Permit configuring the number of retry attempts to destroy CXI service
> via the new destroy_retries SwitchParameter.
> -- Do not reset memory.high and memory.swap.max in slurmd startup or
> reconfigure, as slurmd never modifies these values.
> -- Fix reconfigure failure of slurmd when it has been started manually and
> the CoreSpecLimits have been removed from slurm.conf.
> -- Set or reset CoreSpec limits when slurmd is reconfigured and it was
> started with systemd.
> -- switch/hpe_slingshot - Make sure the slurmctld can free step VNIs after
> the controller restarts or reconfigures while the job is running.
> -- Fix backup slurmctld failure on 2nd takeover.
> -- Testsuite - fix python test 130_2.
> -- Fix security issue where a coordinator could add a user with elevated
> privileges. CVE-2025-43904.
> * Changes in Slurm 24.05.8
> ==========================
> -- Testsuite - fix python test 130_2.
> -- Fix security issue where a coordinator could add a user with elevated
> privileges. CVE-2025-43904.
> * Changes in Slurm 23.11.11
> ===========================
> -- Fixed a job requeuing issue that merged job entries into the same SLUID
> when all nodes in a job failed simultaneously.
> -- Add ABORT_ON_FATAL environment variable to capture a backtrace from any
> fatal() message.
> -- Testsuite - fix python test 130_2.
> -- Fix security issue where a coordinator could add a user with elevated
> privileges. CVE-2025-43904.
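
For sites that want to check whether an installed release already carries
the CVE-2025-43904 fix, the version comparison can be sketched as below.
This is a minimal sketch, not part of Slurm: the function name and the table
are hypothetical, and it assumes that later micro releases in each listed
series keep the fix. Series not named in this announcement return None,
since nothing is stated about them here.

```python
# Hypothetical helper, not part of Slurm. The table maps each release
# series named in the announcement to its first fixed micro version.
FIXED_MICRO = {
    (24, 11): 5,   # 24.11.5
    (24, 5): 8,    # 24.05.8
    (23, 11): 11,  # 23.11.11
}


def has_cve_2025_43904_fix(version):
    """Return True or False for a listed series, None for any other series."""
    parts = version.strip().split(".")
    if len(parts) < 3:
        raise ValueError(f"expected MAJOR.MINOR.MICRO, got {version!r}")
    major, minor = int(parts[0]), int(parts[1])
    # Assumption: a packaging suffix such as "5-1" may follow the micro number.
    micro = int(parts[2].split("-")[0])
    first_fixed = FIXED_MICRO.get((major, minor))
    if first_fixed is None:
        return None  # the announcement says nothing about this series
    # Assumption: releases after the first fixed one in a series keep the fix.
    return micro >= first_fixed
```

The version string could come, for example, from the output of
`sinfo --version`, with the leading "slurm " stripped before the call.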