[slurm-users] Slurm version 21.08.7 is now available

Tim Wickberg tim at schedmd.com
Tue Apr 19 21:19:25 UTC 2022


We are pleased to announce the availability of Slurm version 21.08.7.

This includes a number of minor to moderate severity fixes that have 
accumulated since the last maintenance release was made two months ago.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 21.08.7
> ==========================
>  -- openapi/v0.0.37 - correct calculation for bf_queue_len_mean in /diag.
>  -- Optimize sending down nodes in maintenance mode to the database when
>     removing reservations.
>  -- Avoid shrinking a reservation when overlapping with downed nodes.
>  -- Fix 'planned time' in rollups for jobs that were still pending when the
>     rollup happened.
>  -- Prevent new elements from a job array from causing rerollups.
>  -- Only check TRES limits against current usage for TRES requested by the job.
>  -- Do not allocate shared gres (MPS) in whole-node allocations.
>  -- Fix minor memory leak when dealing with configless setups.
>  -- Constrain slurmstepd to job/step cgroup like in previous versions of Slurm.
>  -- Fix warnings on 32-bit compilers related to printf() formats.
>  -- Fix memory leak when freeing kill_job_msg_t.
>  -- Fix memory leak when using data_t.
>  -- Fix reconfigure issues after disabling/reenabling the GANG PreemptMode.
>  -- Fix race condition where a cgroup was being deleted while another step
>     was creating it.
>  -- Set the slurmd port correctly if multi-slurmd is enabled.
>  -- openapi/v0.0.37 - Fix misspelling of account_gather_frequency in spec.
>  -- openapi/v0.0.37 - Fix misspelling of cluster_constraint in spec.
>  -- Fix FAIL mail not being sent if a job was cancelled due to preemption.
>  -- slurmrestd - move debug logs for HTTP handling to be gated by debugflag
>     NETWORK to avoid unnecessary logging of communication contents.
>  -- Fix issue with bad memory access when shrinking running steps.
>  -- Fix various issues with internal job accounting with GRES when jobs are
>     shrunk.
>  -- Fix ipmi polling on slurmd reconfig or restart.
>  -- Fix srun crash when reserved ports are being used and het step fails
>     to launch.
>  -- openapi/dbv0.0.37 - fix DELETE execution path on /user/{user_name}.
>  -- slurmctld - Properly requeue all components of a het job if PrologSlurmctld
>     fails.
>  -- rlimits - remove the final calls that limited nofiles to 4096 and
>     instead use the max possible nofiles in slurmd and slurmdbd.
>  -- Fix slurmctld memory leak after a reconfigure with configless.
>  -- Fix slurmd memory leak when fetching configless files.
>  -- Allow the DBD agent to load large messages (up to MAX_BUF_SIZE) from state.
>  -- Fix minor memory leak with cleaning up the extern step.
>  -- Fix potential deadlock during slurmctld restart when there is a completing
>     job.
>  -- slurmstepd - reduce user requested soft rlimits when they are above max
>     hard rlimits to avoid rlimit request being completely ignored and
>     processes using default limits.
>  -- Fix memory leaks when job/step specifies a container.
>  -- Fix Slurm user commands displaying available features as active features
>     when no features were active.
>  -- Don't power down nodes that are rebooting.
>  -- Clear pending node reboot on power down request.
>  -- Ignore node registrations while node is powering down.
>  -- Don't reboot any node that is power<ing|ed> down.
>  -- Don't allow a node to reboot if it's marked for power down.
>  -- Fix issuing reboot and downing when rebooting a powering up node.
>  -- Clear DRAIN on node after failing to resume before ResumeTimeout.
>  -- Prevent repeating power down if node fails to resume before ResumeTimeout.
>  -- Fix federated cloud node communication with srun and cloud_dns.
>  -- Fix jobs being scheduled on nodes marked to be powered_down when idle.
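
To illustrate the slurmstepd rlimit entry above (soft limits requested above
the hard maximum are now reduced rather than ignored), here is a minimal
sketch of the clamping idea in Python. It is not Slurm's implementation:
the function names clamp_soft_limit and apply_requested_nofile are
hypothetical, and RLIMIT_NOFILE is used only as an example resource.

```python
import resource


def clamp_soft_limit(requested_soft, hard_max):
    """Return a soft limit that never exceeds the hard limit.

    Hypothetical helper for illustration; not part of Slurm.
    """
    inf = resource.RLIM_INFINITY
    if hard_max == inf:
        # No hard ceiling, so any requested soft value is acceptable.
        return requested_soft
    if requested_soft == inf or requested_soft > hard_max:
        # The request exceeds the ceiling: reduce it instead of dropping it.
        return hard_max
    return requested_soft


def apply_requested_nofile(requested_soft):
    """Apply a (possibly reduced) soft RLIMIT_NOFILE to this process."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    soft = clamp_soft_limit(requested_soft, hard)
    # Asking for soft > hard would make setrlimit() fail, which would leave
    # the process on its default limits; clamping first avoids that.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return soft
```

With a hard limit of 4096, a request for 8192 would be reduced to 4096 in
this sketch, while a request for 1024 would be applied unchanged.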


