We are pleased to announce the availability of Slurm 24.05.0.
To highlight some new features in 24.05:
- Isolated Job Step management. Enabled on a job-by-job basis with the
--stepmgr option, or globally through SlurmctldParameters=enable_stepmgr.
- Federation - Allow for client command operation while SlurmDBD is
unavailable.
- New MaxTRESRunMinsPerAccount and MaxTRESRunMinsPerUser QOS limits.
- New USER_DELETE reservation flag.
- New Flags=rebootless option on Features for node_features/helpers
which indicates the given feature can be enabled without rebooting the node.
- Cloud power management options: New "max_powered_nodes=<limit>" option
in SlurmctldParameters, and new SuspendExcNodes=<nodes>:<count> syntax
allowing for <count> nodes out of a given node list to be excluded.
- StdIn/StdOut/StdErr now stored in SlurmDBD accounting records for
batch jobs.
- New switch/nvidia_imex plugin for IMEX channel management on NVIDIA
systems.
- New RestrictedCoresPerGPU option at the Node level, designed to ensure
GPU workloads always have access to a certain number of CPUs even when
nodes are running non-GPU workloads concurrently.
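As a usage sketch, step management could be enabled either per job or cluster-wide (the job script name below is hypothetical):

```shell
# Enable isolated step management for a single job (Slurm 24.05+):
sbatch --stepmgr batch_job.sh

# Or enable it for all jobs by adding this line to slurm.conf:
#   SlurmctldParameters=enable_stepmgr
```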
The Slurm documentation has also been updated to the 24.05 release.
(Older versions can be found in the archive, linked from the main
documentation page.)
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support
Slurm User Group (SLUG) 2024 is set for September 12-13 at the
University of Oslo in Oslo, Norway.
Registration information and a high-level schedule can be found
here: https://slug24.splashthat.com/
The last day to register at the early bird pricing is this Friday, May 31st.
Friday is also the deadline to submit a presentation abstract. We do
not intend to extend this deadline.
If you are interested in presenting your own usage, developments, site
report, tutorial, etc., about Slurm, please fill out the following
form: https://forms.gle/N7bFo5EzwuTuKkBN7
Notifications of accepted presentations will go out by Friday, June 14th.
--
Victoria Hobson
SchedMD LLC
Vice President of Marketing
We are pleased to announce the availability of Slurm version 23.11.7.
The 23.11.7 release fixes a few potential crashes in slurmctld when
using less common options at job submission, restores slurmrestd
compatibility with auth/slurm, and addresses several other minor and
moderate severity bugs.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
-Marshall
> -- slurmrestd - Correct OpenAPI specification for
> 'GET /slurm/v0.0.40/jobs/state' having response as null.
> -- Allow running jobs on overlapping partitions if jobs don't specify -s.
> -- Fix segfault when requesting a shared gres along with an exclusive
> allocation.
> -- Fix regression in 23.02 where afternotok and afterok dependencies were
> rejected for federated jobs not running on the origin cluster of the
> submitting job.
> -- slurmctld - Disable job table locking while job state cache is active when
> replying to `squeue --only-job-state` or `GET /slurm/v0.0.40/jobs/state`.
> -- Fix sanity check when setting tres-per-task on the job allocation as well as
> the step.
> -- slurmrestd - Fix compatibility with auth/slurm.
> -- Fix issue where TRESRunMins drifts from the correct value when using a
>    QOS UsageFactor != 1.
> -- slurmrestd - Require `user` and `association_condition` fields to be
> populated for requests to 'POST /slurmdb/v0.0.40/users_association'.
> -- Avoid a slurmctld crash with extra_constraints enabled when a job requests
> certain invalid --extra values.
> -- `scancel --ctld` and `DELETE /slurm/v0.0.40/jobs` - Fix support for job
>    array expressions (e.g. 1_[3-5]). Also fix signaling a single pending array
>    task (e.g. 1_10), which previously signaled the whole array job instead.
> -- Fix a possible slurmctld segfault when at some point we failed to create an
> external launcher step.
> -- Allow the slurmctld to open a connection to the slurmdbd if the first
> attempt fails due to a protocol error.
> -- mpi/cray_shasta - Fix launch for non-het-steps within a hetjob.
> -- sacct - Fix "gpuutil" TRES usage output being incorrect when using --units.
> -- Fix a rare deadlock on slurmctld shutdown or reconfigure.
> -- Fix issue that only left one thread on each core available when "CPUs=" is
> configured to total thread count on multi-threaded hardware and no other
> topology info ("Sockets=", "CoresPerSocket", etc.) is configured.
> -- Fix the external launcher step not being allocated a VNI when requested.
> -- jobcomp/kafka - Fix payload length when producing and sending a message.
> -- scrun - Avoid a crash if RunTimeDelete is called before the container
> finishes.
> -- Save the slurmd's cred_state while reconfiguring to prevent the loss of
>    job credentials.
Slurm User Group (SLUG) 2024 is set for September 12-13 at the
University of Oslo in Oslo, Norway.
Registration information and a high-level schedule can be found
here: https://slug24.splashthat.com/
The deadline to submit a presentation abstract is Friday, May 31st. We
do not intend to extend this deadline.
If you are interested in presenting your own usage, developments, site
report, tutorial, etc., about Slurm, please fill out the following
form: https://forms.gle/N7bFo5EzwuTuKkBN7
Notifications of accepted presentations will go out by Friday, June 14th.
--
Victoria Hobson
SchedMD LLC
Vice President of Marketing
We are pleased to announce the availability of Slurm release candidate
24.05.0rc1.
To highlight some new features coming in 24.05:
- (Optional) isolated Job Step management. Enabled on a job-by-job basis
with the --stepmgr option, or globally through
SlurmctldParameters=enable_stepmgr.
- Federation - Allow for client command operation while SlurmDBD is
unavailable.
- New MaxTRESRunMinsPerAccount and MaxTRESRunMinsPerUser QOS limits.
- New USER_DELETE reservation flag.
- New Flags=rebootless option on Features for node_features/helpers
which indicates the given feature can be enabled without rebooting the node.
- Cloud power management options: New "max_powered_nodes=<limit>" option
in SlurmctldParameters, and new SuspendExcNodes=<nodes>:<count> syntax
allowing for <count> nodes out of a given node list to be excluded.
- StdIn/StdOut/StdErr now stored in SlurmDBD accounting records for
batch jobs.
- New switch/nvidia_imex plugin for IMEX channel management on NVIDIA
systems.
- New RestrictedCoresPerGPU option at the Node level, designed to ensure
GPU workloads always have access to a certain number of CPUs even when
nodes are running non-GPU workloads concurrently.
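As an illustrative slurm.conf fragment combining the new cloud power management options (the node list and limits below are hypothetical):

```shell
# Cap the number of simultaneously powered-up cloud nodes (24.05+).
SlurmctldParameters=max_powered_nodes=50

# Keep 2 nodes from the cloud[1-10] list exempt from suspension,
# rather than exempting the entire list.
SuspendExcNodes=cloud[1-10]:2
```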
This is the first release candidate of the upcoming 24.05 release
series. It represents the end of development for this release and the
finalization of the RPC and state file formats.
If any issues are identified with this release candidate, please report
them through https://bugs.schedmd.com against the 24.05.x version and we
will address them before the first production 24.05.0 release is made.
Please note that the release candidates are not intended for production use.
A preview of the updated documentation can be found at
https://slurm.schedmd.com/archive/slurm-master/ .
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support