[slurm-users] Upgrade from 20.11.0 to Slurm version 22.05.6 ?
Michael Gutteridge
michael.gutteridge at gmail.com
Fri Nov 11 00:00:44 UTC 2022
Theoretically, I think you should be able to. Slurm supports upgrading from
the previous two major releases (see this
<https://slurm.schedmd.com/quickstart_admin.html#upgrade:~:text=Slurm%20permits%20upgrades%20to%20a%20new%20major%20release%20from%20the%20past%20two%20major%20releases%2C>),
and that should include 20.11, since the release sequence is 20.11 -> 21.08
-> 22.05. Not something I've done myself, though.
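For what it's worth, here's a rough way to sanity-check that "past two major
releases" rule before a jump. This is just a sketch: the release list and the
helper names below are mine, purely illustrative, not anything that ships
with Slurm.

    # Minimal sketch, assuming Slurm versions of the form YY.MM.patch
    # (e.g. "20.11.0") and an assumed list of major releases in order.
    SUPPORTED_MAJORS = ["20.02", "20.11", "21.08", "22.05"]

    def major(version):
        """Return the major release, e.g. '20.11' from '20.11.0'."""
        return ".".join(version.split(".")[:2])

    def direct_upgrade_ok(old, new):
        """True if 'new' is at most two major releases ahead of 'old'."""
        gap = SUPPORTED_MAJORS.index(major(new)) - SUPPORTED_MAJORS.index(major(old))
        return 0 < gap <= 2

    print(direct_upgrade_ok("20.11.0", "22.05.6"))  # True: 20.11 -> 22.05 is within two releases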
- Michael
On Thu, Nov 10, 2022 at 2:15 PM Sid Young <sid.young at gmail.com> wrote:
> Is there a direct upgrade path from 20.11.0 to 22.05.6, or does it have to
> be done in multiple steps?
>
>
> Sid Young
>
>
>
> On Fri, Nov 11, 2022 at 7:53 AM Marshall Garey <marshall at schedmd.com>
> wrote:
>
>> We are pleased to announce the availability of Slurm version 22.05.6.
>>
>> This includes a fix to core selection for steps, which could otherwise
>> result in random task launch failures, alongside a number of other
>> moderate-severity issues.
>>
>> - Marshall
>>
>> --
>> Marshall Garey
>> Release Management, Support, and Development
>> SchedMD LLC - Commercial Slurm Development and Support
>>
>> > * Changes in Slurm 22.05.6
>> > ==========================
>> > -- Fix a partition's DisableRootJobs=no preventing root jobs from working.
>> > -- Fix the number of allocated cpus for an auto-adjustment case in which
>> >    the job requests --ntasks-per-node and --mem (per-node) but the limit
>> >    is MaxMemPerCPU.
>> > -- Fix POWER_DOWN_FORCE request leaving node in completing state.
>> > -- Do not count magnetic reservation queue records towards backfill
>> >    limits.
>> > -- Clarify error message when --send-libs=yes or BcastParameters=send_libs
>> >    fails to identify shared library files, and avoid creating an empty
>> >    "<filename>_libs" directory on the target filesystem.
>> > -- Fix missing CoreSpec on dynamic nodes upon slurmctld restart.
>> > -- Fix node state reporting when using specialized cores.
>> > -- Fix number of CPUs allocated if --cpus-per-gpu used.
>> > -- Add flag ignore_prefer_validation to not validate --prefer on a job.
>> > -- Fix salloc/sbatch SLURM_TASKS_PER_NODE output environment variable
>> >    when the number of tasks is not requested.
>> > -- Permit using wildcard magic cookies with X11 forwarding.
>> > -- cgroup/v2 - Add check for swap when running OOM check after task
>> >    termination.
>> > -- Fix deadlock caused by race condition when disabling power save with
>> >    a reconfigure.
>> > -- Fix memory leak in the dbd when container is sent to the database.
>> > -- openapi/dbv0.0.38 - correct dbv0.0.38_tres_info.
>> > -- Fix node SuspendTime, SuspendTimeout, ResumeTimeout being updated
>> >    after altering partition node lists with scontrol.
>> > -- jobcomp/elasticsearch - fix data_t memory leak after serialization.
>> > -- Fix issue where '*' wasn't accepted in gpu/cpu bind.
>> > -- Fix SLURM_GPUS_ON_NODE for shared GPU gres (MPS, shards).
>> > -- Add SLURM_SHARDS_ON_NODE environment variable for shards.
>> > -- Fix srun error with overcommit.
>> > -- Fix bug in core selection for the default cyclic distribution of tasks
>> >    across sockets that resulted in random task launch failures.
>> > -- Fix core selection for steps requesting multiple tasks per core when
>> >    the allocation contains more cores than required for the step.
>> > -- gpu/nvml - Fix MIG minor number generation when GPU minor number
>> >    (/dev/nvidia[minor_number]) and index (as seen in nvidia-smi) do not
>> >    match.
>> > -- Fix accrue time underflow errors after slurmctld reconfig or restart.
>> > -- Suppress errant errors from prolog_complete about being unable to
>> >    locate "node:(null)".
>> > -- Fix issue where shards were selected from multiple gpus and failed to
>> >    allocate.
>> > -- Fix step cpu count calculation when using --ntasks-per-gpu=.
>> > -- Fix overflow problems when validating array index parameters in
>> >    slurmctld and prevent a potential condition causing slurmctld to crash.
>> > -- Remove dependency on json-c in slurmctld when running with power
>> >    saving. Only the new "SLURM_RESUME_FILE" support relies on this, and
>> >    it will be disabled if json-c support is unavailable instead.
>>
>>