[slurm-users] Slurm version 21.08.6 is now available

Tim Wickberg tim at schedmd.com
Thu Feb 24 19:30:01 UTC 2022


We are pleased to announce the availability of Slurm version 21.08.6.

This release includes a number of fixes since the last maintenance release
was made in December, including an important fix to a regression seen when
using the 'mpirun' command within a job script.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 21.08.6
> ==========================
>  -- Handle typed shared GRES better in accounting.
>  -- Fix plugin_name definitions in a number of plugins to improve logging.
>  -- Close sbcast file transfers when job is cancelled.
>  -- job_submit/lua - allow mail_type and mail_user fields to be modified.
>  -- scrontab - fix handling of --gpus and --ntasks-per-gpu options.
>  -- sched/backfill - fix job_queue_rec_t memory leak.
>  -- Fix magnetic reservation logic in both main and backfill schedulers.
>  -- job_container/tmpfs - fix memory leak when using InitScript.
>  -- slurmrestd / openapi - fix memory leaks.
>  -- Fix slurmctld segfault due to job array resv_list double free.
>  -- Fix multi-reservation job testing logic.
>  -- Fix slurmctld segfault due to insufficient job reservation parse validation.
>  -- Fix main and backfill schedulers handling for already rejected job array.
>  -- sched/backfill - restore resv_ptr after yielding locks.
>  -- acct_gather_energy/xcc - appropriately close and destroy the IPMI context.
>  -- Protect slurmstepd from making multiple calls to the cleanup logic.
>  -- Prevent slurmstepd segfault at cleanup time in mpi_fini().
>  -- Fix slurmctld sometimes hanging if shutdown while PrologSlurmctld or
>     EpilogSlurmctld were running and PrologEpilogTimeout is set in slurm.conf.
>  -- Fix affinity of the batch step if batch host is different than the first
>     node in the allocation.
>  -- slurmdbd - fix segfault after multiple failover/failback operations.
>  -- Fix jobcomp filetxt job selection condition.
>  -- Fix -f flag of sacct not being used.
>  -- Select cores for job steps according to the socket distribution. Previously,
>     sockets were always filled before selecting cores from the next socket.
>  -- Keep node in Future state if epilog completes while in Future state.
>  -- Fix erroneous --constraint behavior by preventing multiple sets of brackets.
>  -- Make ResetAccrueTime update the job's accrue_time to now.
>  -- Fix sattach initialization with configless mode.
>  -- Revert packing limit checks affecting pmi2.
>  -- sacct - fix assertion failure when using the -c option with a federation
>     display.
>  -- Fix issue that allowed steps to overallocate the job's memory.
>  -- Fix the sanity check mode of AutoDetect so that it actually works.
>  -- Fix deallocated nodes that didn't actually launch a job from waiting for
>     EpilogSlurmctld to complete before clearing completing node's state.
>  -- Job should be in a completing state if EpilogSlurmctld is running when
>     being requeued.
>  -- Fix job not being requeued properly if all node epilogs completed before
>     EpilogSlurmctld finished.
>  -- Keep job completing until EpilogSlurmctld is completed even when "downing"
>     a node.
>  -- Fix handling reboot with multiple job features.
>  -- Fix nodes getting powered down when creating new partitions.
>  -- Fix bad bit_realloc which potentially could lead to bad memory access.
>  -- slurmctld - remove limit on the number of open files.
>  -- Fix bug where a job_state file larger than 2GB was not saved, and no
>     error message was reported.
>  -- Fix various issues with no_consume gres.
>  -- Fix regression in 21.08.0rc1 where job steps failed to launch on systems
>     that reserved a CPU in a cgroup outside of Slurm (for example, on systems
>     with WekaIO).
>  -- Fix OverTimeLimit not being reset on scontrol reconfigure when it is
>     removed from slurm.conf.
>  -- serializer/yaml - use dynamic buffer to allow creation of YAML outputs
>     larger than 1MiB.
>  -- Fix minor memory leak affecting openapi users at process termination.
>  -- Fix batch jobs not resolving the username when nss_slurm is enabled.
>  -- slurmrestd - Avoid slurmrestd ignoring invalid HTTP method if the response
>     serialized without error.
>  -- openapi/dbv0.0.37 - Correct conditional that caused the diag output to
>     give an internal server error status on success.
>  -- Make --mem-bind=sort work with task_affinity.
>  -- Fix sacctmgr to set MaxJobsAccruePer{User|Account} and MinPrioThres in
>     sacctmgr add qos; modify already worked correctly.
>  -- job_container/tmpfs - avoid printing extraneous error messages in Prolog
>     and Epilog, and when the job completes.
>  -- Fix step CPU memory allocation with --threads-per-core without --exact.
>  -- Remove implicit --exact when --threads-per-core or --hint=nomultithread
>     is used.
>  -- Do not allow a step to request more threads per core than the
>     allocation did.
>  -- Remove implicit --exact when --cpus-per-task is used.
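
The step core-selection change listed above (cores are now selected according
to the socket distribution, where previously sockets were always filled before
moving to the next socket) can be pictured with a small sketch. This is only an
illustration of the two orderings, not Slurm's implementation: the function
names, the core numbering, and the two-socket node layout are all hypothetical.

```python
from itertools import zip_longest


def select_block(sockets, n):
    """Fill each socket completely before taking cores from the next one."""
    flat = [core for socket in sockets for core in socket]
    return flat[:n]


def select_cyclic(sockets, n):
    """Take one core from each socket in turn (round-robin across sockets)."""
    picked = []
    # zip_longest pads shorter sockets with None so uneven layouts still work.
    for row in zip_longest(*sockets):
        for core in row:
            if core is not None:
                picked.append(core)
    return picked[:n]


# Hypothetical node: 2 sockets with 4 cores each (core IDs are made up).
sockets = [[0, 1, 2, 3], [4, 5, 6, 7]]
print(select_block(sockets, 4))   # [0, 1, 2, 3]
print(select_cyclic(sockets, 4))  # [0, 4, 1, 5]
```

With a block ordering, a 4-core step lands entirely on the first socket; with a
cyclic ordering, the same step is spread over both sockets.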


