[slurm-users] Slurm version 22.05.6 is now available
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Fri Nov 11 07:07:29 UTC 2022
FYI: The Slurm download page is as usual:
https://www.schedmd.com/downloads.php
/Ole
On 11/10/22 22:49, Marshall Garey wrote:
> We are pleased to announce the availability of Slurm version 22.05.6.
>
> This includes a fix to core selection for steps which could result in
> random task launch failures, alongside a number of other moderate severity
> issues.
>
> - Marshall
>
> --
> Marshall Garey
> Release Management, Support, and Development
> SchedMD LLC - Commercial Slurm Development and Support
>
>> * Changes in Slurm 22.05.6
>> ==========================
>> -- Fix a partition's DisableRootJobs=no from preventing root jobs from
>> working.
>> -- Fix the number of allocated cpus for an auto-adjustment case in which the
>> job requests --ntasks-per-node and --mem (per-node) but the limit is
>> MaxMemPerCPU.
>> -- Fix POWER_DOWN_FORCE request leaving node in completing state.
>> -- Do not count magnetic reservation queue records towards backfill
>> limits.
>> -- Clarify error message when --send-libs=yes or BcastParameters=send_libs
>> fails to identify shared library files, and avoid creating an empty
>> "<filename>_libs" directory on the target filesystem.
>> -- Fix missing CoreSpec on dynamic nodes upon slurmctld restart.
>> -- Fix node state reporting when using specialized cores.
>> -- Fix number of CPUs allocated if --cpus-per-gpu used.
>> -- Add flag ignore_prefer_validation to not validate --prefer on a job.
>> -- Fix salloc/sbatch SLURM_TASKS_PER_NODE output environment variable when the
>> number of tasks is not requested.
>> -- Permit using wildcard magic cookies with X11 forwarding.
>> -- cgroup/v2 - Add check for swap when running OOM check after task
>> termination.
>> -- Fix deadlock caused by race condition when disabling power save with a
>> reconfigure.
>> -- Fix memory leak in the dbd when container is sent to the database.
>> -- openapi/dbv0.0.38 - correct dbv0.0.38_tres_info.
>> -- Fix node SuspendTime, SuspendTimeout, ResumeTimeout being updated after
>> altering partition node lists with scontrol.
>> -- jobcomp/elasticsearch - fix data_t memory leak after serialization.
>> -- Fix issue where '*' wasn't accepted in gpu/cpu bind.
>> -- Fix SLURM_GPUS_ON_NODE for shared GPU gres (MPS, shards).
>> -- Add SLURM_SHARDS_ON_NODE environment variable for shards.
>> -- Fix srun error with overcommit.
>> -- Fix bug in core selection for the default cyclic distribution of tasks
>> across sockets, that resulted in random task launch failures.
>> -- Fix core selection for steps requesting multiple tasks per core when
>> allocation contains more cores than required for step.
>> -- gpu/nvml - Fix MIG minor number generation when GPU minor number
>> (/dev/nvidia[minor_number]) and index (as seen in nvidia-smi) do not
>> match.
>> -- Fix accrue time underflow errors after slurmctld reconfig or restart.
>> -- Suppress errant errors from prolog_complete about being unable to locate
>> "node:(null)".
>> -- Fix issue where shards were selected from multiple gpus and failed to
>> allocate.
>> -- Fix step cpu count calculation when using --ntasks-per-gpu=.
>> -- Fix overflow problems when validating array index parameters in slurmctld
>> and prevent a potential condition causing slurmctld to crash.
>> -- Remove dependency on json-c in slurmctld when running with power saving.
>> Only the new "SLURM_RESUME_FILE" support relies on this, and it will be
>> disabled if json-c support is unavailable instead.
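
Several of the fixes above touch job environment variables such as SLURM_TASKS_PER_NODE. For anyone checking that variable after upgrading, here is a minimal Python sketch that expands its compact form (for example "2(x3),1", as described in the sbatch man page) into one task count per node. The function name expand_tasks_per_node is hypothetical and not part of Slurm; the sketch assumes the value only uses the documented "count" and "count(xN)" forms.

```python
import os
import re


def expand_tasks_per_node(value):
    """Expand Slurm's compact form, e.g. "2(x3),1" -> [2, 2, 2, 1].

    Hypothetical helper for illustration; not shipped with Slurm.
    """
    counts = []
    for item in value.split(","):
        # Each item is either "N" or "N(xR)", where R is a repetition count.
        match = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", item.strip())
        if match is None:
            raise ValueError(f"unexpected SLURM_TASKS_PER_NODE item: {item!r}")
        tasks = int(match.group(1))
        repeat = int(match.group(2) or 1)
        counts.extend([tasks] * repeat)
    return counts


if __name__ == "__main__":
    # Inside a job allocation the variable is set by salloc/sbatch.
    raw = os.environ.get("SLURM_TASKS_PER_NODE")
    if raw:
        print(expand_tasks_per_node(raw))
```

Run inside an salloc or sbatch allocation, this prints one entry per allocated node, in the same order as SLURM_JOB_NODELIST.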