We are pleased to announce the availability of Slurm version 23.11.6.
The 23.11.6 release fixes two different problems with the
priority/multifactor plugin: a crash and a miscalculation of
AssocGrpCPURunMinutes after a slurmctld reconfiguration/restart.
The wsrep_on errors seen by sites running MySQL or older MariaDB should
happen much less frequently, and the log message now includes a
clarifying statement when the error is innocuous.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
-Marshall
> * Changes in Slurm 23.11.6
> ==========================
> -- Avoid limiting sockets per node to one when using gres enforce-binding.
> -- slurmrestd - Avoid permission denied errors when attempting to listen on
> the same port multiple times.
> -- Fix GRES reservations where the GRES has no topology
> (no cores= in gres.conf).
> -- Ensure that thread_id_rpc is gone before priority_g_fini().
> -- Fix scontrol reboot timeout removing drain state from nodes.
> -- squeue - Print header on empty response to `--only-job-state`.
> -- Fix slurmrestd not ending job properly when xauth is not present and an x11
> job is sent.
> -- Add experimental job state caching with
> SchedulerParameters=enable_job_state_cache to speed up querying job states
> with squeue --only-job-state.
> -- slurmrestd - Correct dumping of invalid ArrayJobIds returned from
> 'GET /slurm/v0.0.40/jobs/state'.
> -- squeue - Correct dumping of invalid ArrayJobIds returned from
> `squeue --only-job-state --{json|yaml}`.
> -- If scancel --ctld is not used with --interactive, --sibling, or specific
> step ids, then this option issues a single request to the slurmctld to
> signal all jobs matching the specified filters. This greatly improves
> the performance of slurmctld and scancel. The updated --ctld option also
> fixes issues with the --partition or --reservation scancel options for jobs
> that requested multiple partitions or reservations.
> -- slurmrestd - Give EINVAL error when failing to parse signal name to numeric
> signal.
> -- slurmrestd - Allow ContentBody for all methods per RFC7230 even if ignored.
> -- slurmrestd - Add 'DELETE /slurm/v0.0.40/jobs' endpoint to allow bulk job
> signaling via slurmctld.
> -- Fix combination of --nodelist and --exclude not always respecting the
> excluded node list.
> -- Fix jobs incorrectly allocating nodes exclusively when started on a
> partition that doesn't enforce it. This could happen if a multi-partition
> job doesn't specify --exclusive and is evaluated first on a partition
> configured with OverSubscribe=EXCLUSIVE but ends up starting in a partition
> configured with OverSubscribe!=EXCLUSIVE evaluated afterwards.
> -- Setting GLOB_SILENCE flag no longer exposes old bugged behavior.
> -- Fix associations AssocGrpCPURunMinutes being incorrectly computed for
> running jobs after a controller reconfiguration/restart.
> -- Fix scheduling jobs that request --gpus and nodes have different node
> weights and different numbers of gpus.
> -- slurmrestd - Add "NO_CRON_JOBS" as possible flag value to the following:
> 'DELETE /slurm/v0.0.40/jobs' flags field.
> 'DELETE /slurm/v0.0.40/job/{job_id}?flags=' flags query parameter.
> -- Fix scontrol segfault/assert failure if the TRESPerNode parameter is used
> when creating reservations.
> -- Avoid checking for wsrep_on when restoring streaming replication settings.
> -- Clarify in the logs that error "1193 Unknown system variable 'wsrep_on'" is
> innocuous.
> -- accounting_storage/mysql - Fix problem when loading reservations from an
> archive dump.
> -- slurmdbd - Fix minor race condition when sending updates to a shutdown
> slurmctld.
> -- slurmctld - Fix invalid refusal of a reservation update.
> -- openapi - Fix memory leak of /meta/slurm/cluster response field.
> -- Fix memory leak when using auth/slurm and AuthInfo=use_client_ids.
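
As an illustration of the scancel --ctld entry above (not part of the
release itself), here is a minimal Python sketch that assembles a bulk
cancel command that stays on the single-request path. The helper name
build_scancel_ctld and the example user/partition values are
hypothetical; only the scancel options (--ctld, --user, --partition,
--reservation) come from scancel. The sketch assumes that a job id
containing "." is a step id and rejects it, since the changelog says
step ids do not use the single request.

```python
import shlex


def build_scancel_ctld(user=None, partition=None, reservation=None, job_ids=()):
    """Return an argv list for a bulk `scancel --ctld` call.

    Hypothetical helper for illustration only; it builds the command
    but does not run it.
    """
    # Assumption: ids such as "1234.0" are step ids. Per the changelog,
    # specific step ids do not use the single slurmctld request, so this
    # sketch refuses them instead of silently changing behavior.
    for job_id in job_ids:
        if "." in str(job_id):
            raise ValueError(f"step id {job_id!r} is not handled by this sketch")

    # --interactive and --sibling are deliberately never added here,
    # because the changelog lists them as disabling the single request.
    argv = ["scancel", "--ctld"]
    if user:
        argv.append(f"--user={user}")
    if partition:
        argv.append(f"--partition={partition}")
    if reservation:
        argv.append(f"--reservation={reservation}")
    argv.extend(str(job_id) for job_id in job_ids)
    return argv


if __name__ == "__main__":
    # Placeholder values; substitute a real user and partition.
    print(shlex.join(build_scancel_ctld(user="alice", partition="debug")))
```

The resulting list can be passed to subprocess.run() on a host where
scancel is installed; whether the request is actually served in a single
RPC depends on the running slurmctld being 23.11.6 or later.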
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support
Slurm User Group (SLUG) 2024 is set for September 12-13 at the
University of Oslo in Oslo, Norway.
Registration information and a high-level schedule can be found here:
https://slug24.splashthat.com/
We invite all interested attendees to submit a presentation abstract
to be given at SLUG. Presentation content can be in the form of a
tutorial, technical presentation or site report.
SLUG 2024 is sponsored and organized by the University of Oslo and
SchedMD. This international event is open to those who want to:
- Learn more about Slurm, a highly scalable resource manager and job scheduler
- Share their knowledge and experience with other users and administrators
- Get detailed information about the latest features and developments
- Share requirements and discuss future developments
Everyone who wants to present their own usage, developments, site
report, or tutorial about Slurm is invited to submit abstract details
here: https://forms.gle/N7bFo5EzwuTuKkBN7
Abstracts are due Friday, May 31st and notifications of acceptance
will go out by Friday, June 14th.
--
Victoria Hobson
SchedMD LLC
Vice President of Marketing