We are pleased to announce the availability of Slurm versions 24.05.3 and 23.11.10.
Version 24.05.3 fixes a potential database problem when deleting a qos. This bug only existed in 24.05.
Both versions fix jobs potentially getting stuck when using cloud nodes while some nodes are powered down, a regression in 23.11.9 and 24.05.2 that caused sattach to crash, and some other minor issues.
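For sites deciding whether to upgrade, the version ranges named above can be checked mechanically. The following is only a minimal sketch, not part of Slurm: the helper names (parse_version, affected_issues) are hypothetical, the input is assumed to be a string containing a dotted version such as "slurm 24.05.2", and the qos problem is assumed to cover every 24.05 release before 24.05.3.

```python
import re

# Releases this announcement names as carrying the sattach crash regression.
SATTACH_REGRESSION = {(23, 11, 9), (24, 5, 2)}


def parse_version(text):
    """Extract (major, minor, micro) from text such as 'slurm 24.05.2'.

    Hypothetical helper: the input format is an assumption.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no Slurm version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def affected_issues(version_text):
    """List the headline issues from this announcement that apply."""
    version = parse_version(version_text)
    issues = []
    if version in SATTACH_REGRESSION:
        issues.append("sattach crash regression")
    # Assumption: the qos deletion problem affects 24.05.0 through 24.05.2.
    if version[:2] == (24, 5) and version[2] < 3:
        issues.append("qos deletion database problem")
    return issues


if __name__ == "__main__":
    for sample in ("slurm 23.11.9", "slurm 24.05.2", "slurm 24.05.3"):
        print(sample, "->", affected_issues(sample))
```

An empty list only means that none of the headline issues apply; the full change lists below still cover other fixes.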
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support
- Changes in Slurm 24.05.3
==========================
 -- data_parser/v0.0.40 - Added field descriptions.
 -- slurmrestd - Avoid creating new slurmdbd connection per request to
    '* /slurm/slurmctld/*/*' endpoints.
 -- Fix compilation issue with switch/hpe_slingshot plugin.
 -- Fix gres per task allocation with threads-per-core.
 -- data_parser/v0.0.41 - Added field descriptions.
 -- slurmrestd - Change back generated OpenAPI schema for
    `DELETE /slurm/v0.0.40/jobs/` to RequestBody instead of using parameters
    for request. slurmrestd will continue to accept endpoint requests via
    RequestBody or HTTP query.
 -- topology/tree - Fix issues with switch distance optimization.
 -- Fix potential segfault of secondary slurmctld when falling back to the
    primary when running with a JobComp plugin.
 -- Enable --json/--yaml=v0.0.39 options on client commands to dump data using
    data_parser/v0.0.39 instead of outputting nothing.
 -- switch/hpe_slingshot - Fix issue that could result in a 0 length state
    file.
 -- Fix unnecessary message protocol downgrade for unregistered nodes.
 -- Fix unnecessarily packing alias addrs when terminating jobs with a mix of
    non-cloud/dynamic nodes and powered down cloud/dynamic nodes.
 -- accounting_storage/mysql - Fix issue when deleting a qos that could remove
    too many commas from the qos and/or delta_qos fields of the assoc table.
 -- slurmctld - Fix memory leak when using RestrictedCoresPerGPU.
 -- Fix allowing access to reservations without MaxStartDelay set.
 -- Fix regression introduced in 24.05.0rc1 breaking srun --send-libs parsing.
 -- Fix slurmd vsize memory leak when using job submission/allocation commands
    that implicitly or explicitly use --get-user-env.
 -- slurmd - Fix node going into invalid state when using CPUSpecList and
    setting CPUs to the # of cores on a multithreaded node.
 -- Fix reboot asap nodes being considered in backfill after a restart.
 -- Fix --clusters/-M queries for clusters outside of a federation when
    fed_display is configured.
 -- Fix scontrol allowing updating job with bad cpus-per-task value.
 -- sattach - Fix regression from 24.05.2 security fix leading to crash.
 -- mpi/pmix - Fix assertion when built under --enable-debug.
- Changes in Slurm 23.11.10
===========================
 -- switch/hpe_slingshot - Fix issue that could result in a 0 length state
    file.
 -- Fix unnecessary message protocol downgrade for unregistered nodes.
 -- Fix unnecessarily packing alias addrs when terminating jobs with a mix of
    non-cloud/dynamic nodes and powered down cloud/dynamic nodes.
 -- Fix allowing access to reservations without MaxStartDelay set.
 -- Fix scontrol allowing updating job with bad cpus-per-task value.
 -- sattach - Fix regression from 23.11.9 security fix leading to crash.
slurm-announce@lists.schedmd.com