We are pleased to announce the availability of Slurm version 23.11.7.
The 23.11.7 release fixes a few potential crashes in slurmctld when using less common options on job submission, slurmrestd compatibility with auth/slurm, and some additional minor and moderate severity bugs.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
-Marshall
-- slurmrestd - Correct OpenAPI specification for 'GET /slurm/v0.0.40/jobs/state' having response as null.
-- Allow running jobs on overlapping partitions if jobs don't specify -s.
-- Fix segfault when requesting a shared gres along with an exclusive allocation.
-- Fix regression in 23.02 where afternotok and afterok dependencies were rejected for federated jobs not running on the origin cluster of the submitting job.
-- slurmctld - Disable job table locking while job state cache is active when replying to `squeue --only-job-state` or `GET /slurm/v0.0.40/jobs/state`.
-- Fix sanity check when setting tres-per-task on the job allocation as well as the step.
-- slurmrestd - Fix compatibility with auth/slurm.
-- Fix issue where TRESRunMins drifts from the correct value when using QOS UsageFactor != 1.
-- slurmrestd - Require `user` and `association_condition` fields to be populated for requests to 'POST /slurmdb/v0.0.40/users_association'.
-- Avoid a slurmctld crash with extra_constraints enabled when a job requests certain invalid --extra values.
-- `scancel --ctld` and `DELETE /slurm/v0.0.40/jobs` - Fix support for job array expressions (e.g. 1_[3-5]). Also fix signaling a single pending array task (e.g. 1_10), which previously signaled the whole array job instead.
-- Fix a possible slurmctld segfault after an earlier failure to create an external launcher step.
-- Allow the slurmctld to open a connection to the slurmdbd if the first attempt fails due to a protocol error.
-- mpi/cray_shasta - Fix launch for non-het-steps within a hetjob.
-- sacct - Fix "gpuutil" TRES usage output being incorrect when using --units.
-- Fix a rare deadlock on slurmctld shutdown or reconfigure.
-- Fix issue that left only one thread on each core available when "CPUs=" is configured to the total thread count on multi-threaded hardware and no other topology info ("Sockets=", "CoresPerSocket", etc.) is configured.
-- Fix the external launcher step not being allocated a VNI when requested.
-- jobcomp/kafka - Fix payload length when producing and sending a message.
-- scrun - Avoid a crash if RunTimeDelete is called before the container finishes.
-- Save the slurmd's cred_state while reconfiguring to prevent the loss of job credentials.
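For readers unfamiliar with the job array expressions mentioned in the `scancel --ctld` entry above, the following is a minimal illustrative sketch of how an expression such as 1_[3-5] maps to individual array tasks. It is not Slurm's parser: the function name `expand_array_expr` is hypothetical, and it handles only the bracketed range/list form and the single-task form shown in the changelog, deliberately ignoring any richer syntax Slurm may accept.

```python
import re


def expand_array_expr(expr: str) -> list[str]:
    """Expand a job array expression into individual task names.

    Hypothetical helper for illustration only; not part of Slurm.
    Supports "<job>", "<job>_<task>" and "<job>_[a-b,c,...]".
    """
    # Plain job ID or a single array task (e.g. "1_10"): nothing to expand.
    if re.fullmatch(r"\d+(_\d+)?", expr):
        return [expr]

    match = re.fullmatch(r"(\d+)_\[([\d,\-]+)\]", expr)
    if match is None:
        raise ValueError(f"unsupported array expression: {expr!r}")

    job_id, body = match.groups()
    tasks: list[str] = []
    for part in body.split(","):
        low, sep, high = part.partition("-")
        start = int(low)  # raises ValueError on an empty bound
        end = int(high) if sep else start
        if end < start:
            raise ValueError(f"descending range in {expr!r}")
        tasks.extend(f"{job_id}_{i}" for i in range(start, end + 1))
    return tasks


if __name__ == "__main__":
    print(expand_array_expr("1_[3-5]"))
    print(expand_array_expr("1_10"))
```

Under this sketch, a single task such as 1_10 expands to exactly one entry, which matches the corrected behavior described above: only that task is signaled, not the whole array job.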
slurm-announce@lists.schedmd.com