[slurm-users] Upgrade from 20.11.0 to Slurm version 22.05.6 ?

Michael Gutteridge michael.gutteridge at gmail.com
Fri Nov 11 00:00:44 UTC 2022


Theoretically I think you should be able to.  Slurm permits upgrades from
the previous two major releases (see this
<https://slurm.schedmd.com/quickstart_admin.html#upgrade:~:text=Slurm%20permits%20upgrades%20to%20a%20new%20major%20release%20from%20the%20past%20two%20major%20releases%2C>),
and that window should include 20.11 (20.11 -> 21.08 -> 22.05).  Not
something I've done myself, though.
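
The "past two major releases" rule can be sketched in a few lines -- note
the release list and helper function below are purely illustrative (Slurm
ships no such function):

```python
# Slurm major release series, oldest to newest (year.month numbering).
# Illustrative only; trim or extend as needed for other versions.
MAJORS = ["19.05", "20.02", "20.11", "21.08", "22.05"]

def direct_upgrade_ok(old: str, new: str) -> bool:
    """True if `new` is at most two major releases ahead of `old`,
    which is the upgrade window SchedMD documents as supported."""
    i, j = MAJORS.index(old), MAJORS.index(new)
    return 0 < j - i <= 2

print(direct_upgrade_ok("20.11", "22.05"))  # True: two majors back
print(direct_upgrade_ok("20.02", "22.05"))  # False: three majors back
```

So by that rule a direct 20.11 -> 22.05 upgrade is inside the supported
window, while anything older would need an intermediate stop.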

 - Michael


On Thu, Nov 10, 2022 at 2:15 PM Sid Young <sid.young at gmail.com> wrote:

> Is there a direct upgrade path from  20.11.0 to 22.05.6 or is it in
> multiple steps?
>
>
> Sid Young
>
>
>
> On Fri, Nov 11, 2022 at 7:53 AM Marshall Garey <marshall at schedmd.com>
> wrote:
>
>> We are pleased to announce the availability of Slurm version 22.05.6.
>>
>> This includes a fix to core selection for steps which could result in
>> random task launch failures, alongside a number of other moderate
>> severity issues.
>>
>> - Marshall
>>
>> --
>> Marshall Garey
>> Release Management, Support, and Development
>> SchedMD LLC - Commercial Slurm Development and Support
>>
>> > * Changes in Slurm 22.05.6
>> > ==========================
>> >  -- Fix a partition's DisableRootJobs=no from preventing root jobs from
>> >     working.
>> >  -- Fix the number of allocated cpus for an auto-adjustment case in which
>> >     the job requests --ntasks-per-node and --mem (per-node) but the limit
>> >     is MaxMemPerCPU.
>> >  -- Fix POWER_DOWN_FORCE request leaving node in completing state.
>> >  -- Do not count magnetic reservation queue records towards backfill
>> >     limits.
>> >  -- Clarify error message when --send-libs=yes or BcastParameters=send_libs
>> >     fails to identify shared library files, and avoid creating an empty
>> >     "<filename>_libs" directory on the target filesystem.
>> >  -- Fix missing CoreSpec on dynamic nodes upon slurmctld restart.
>> >  -- Fix node state reporting when using specialized cores.
>> >  -- Fix number of CPUs allocated if --cpus-per-gpu used.
>> >  -- Add flag ignore_prefer_validation to not validate --prefer on a job.
>> >  -- Fix salloc/sbatch SLURM_TASKS_PER_NODE output environment variable
>> >     when the number of tasks is not requested.
>> >  -- Permit using wildcard magic cookies with X11 forwarding.
>> >  -- cgroup/v2 - Add check for swap when running OOM check after task
>> >     termination.
>> >  -- Fix deadlock caused by race condition when disabling power save with
>> >     a reconfigure.
>> >  -- Fix memory leak in the dbd when container is sent to the database.
>> >  -- openapi/dbv0.0.38 - correct dbv0.0.38_tres_info.
>> >  -- Fix node SuspendTime, SuspendTimeout, ResumeTimeout being updated
>> >     after altering partition node lists with scontrol.
>> >  -- jobcomp/elasticsearch - fix data_t memory leak after serialization.
>> >  -- Fix issue where '*' wasn't accepted in gpu/cpu bind.
>> >  -- Fix SLURM_GPUS_ON_NODE for shared GPU gres (MPS, shards).
>> >  -- Add SLURM_SHARDS_ON_NODE environment variable for shards.
>> >  -- Fix srun error with overcommit.
>> >  -- Fix bug in core selection for the default cyclic distribution of
>> >     tasks across sockets, that resulted in random task launch failures.
>> >  -- Fix core selection for steps requesting multiple tasks per core when
>> >     allocation contains more cores than required for step.
>> >  -- gpu/nvml - Fix MIG minor number generation when GPU minor number
>> >     (/dev/nvidia[minor_number]) and index (as seen in nvidia-smi) do not
>> >     match.
>> >  -- Fix accrue time underflow errors after slurmctld reconfig or restart.
>> >  -- Suppress errant errors from prolog_complete about being unable to
>> >     locate "node:(null)".
>> >  -- Fix issue where shards were selected from multiple gpus and failed
>> >     to allocate.
>> >  -- Fix step cpu count calculation when using --ntasks-per-gpu=.
>> >  -- Fix overflow problems when validating array index parameters in
>> >     slurmctld and prevent a potential condition causing slurmctld to crash.
>> >  -- Remove dependency on json-c in slurmctld when running with power
>> >     saving. Only the new "SLURM_RESUME_FILE" support relies on this, and
>> >     it will be disabled if json-c support is unavailable instead.
>>
>>

