[slurm-users] Slurm version 17.11.6 available

Tim Wickberg tim at schedmd.com
Wed May 9 14:24:11 MDT 2018

We are pleased to announce the availability of Slurm version 17.11.6.

This includes over 50 fixes made since 17.11.5 was released eight weeks
ago, among them a fix for a race condition within slurmstepd that can
lead to hung extern steps.

Slurm can be downloaded from https://www.schedmd.com/downloads.php

- Tim

> * Changes in Slurm 17.11.6
> ==========================
>  -- CRAY - Add slurmsmwd to the contribs/cray dir.
>  -- sview - fix crash when closing any search dialog.
>  -- Fix initialization of variable in stepd when using native x11.
>  -- Fix reading slurm_io_init_msg to handle partial messages.
>  -- Fix scontrol create res segfault when wrong user/account parameters given.
>  -- Fix documentation for sacct on parameter -X (--allocations)
>  -- Change TRES Weights debug messages to debug3.
>  -- FreeBSD - assorted fixes to restore build.
>  -- Fix for not tracking environment variables from unrelated jobs.
>  -- PMIx - Add direct connect authentication.
>     When upgrading, this may cause issues for jobs using PMIx that start on
>     mixed slurmstepd versions where some are older than 17.11.6.
>  -- Prevent the backup slurmctld from losing the active/available node
>     features list on takeover.
>  -- Add documentation for fixing IDLE*+POWER caused by capmc being stuck on
>     Cray systems.
>  -- Fix missing mutex unlock when prolog is failing on a node, leading to a
>     hung slurmd.
>  -- Fix locking around Cray CCM prolog/epilog.
>  -- Add missing fed_mgr read locks.
>  -- Fix issue incorrectly setting a job time_start to 0 while requeueing.
>  -- smail - remove stray '-s' from mail subject line.
>  -- srun - prevent segfault if ClusterName setting is unset but
>     SLURM_WORKING_CLUSTER environment variable is defined.
>  -- In configurator.html web pages change default configuration from
>     task/none to task/affinity plugin and from select/linear plugin to
>     select/cons_res plus CR_Core.
>  -- Allow jobs to run beyond a FLEX reservation end time.
>  -- Fix problem with job state_reason being wrongly set to Reservation.
>  -- Prevent bit_ffs() from returning a value out of bitmap range.
>  -- Improve performance of 'squeue -u' when PrivateData=jobs is enabled.
>  -- Make UnavailableNodes value in job reason be correct for each job.
>  -- Fix 'squeue -o %s' on Cray systems.
>  -- Fix incorrect error thrown when cancelling part of a job array.
>  -- Fix error code and scheduling problem for --exclusive=[user|mcs].
>  -- Fix build when lz4 is in a non-standard location.
>  -- Be able to force power_down of cloud node even if in power_save state.
>  -- Allow cloud nodes to be recognized in Slurm when booted out of band.
>  -- Fix race condition in _pack_job_gres() when it is called multiple times.
>  -- Increase duration of "sleep" command used to keep extern step alive.
>  -- Remove unsafe usage of pthread_cancel in slurmstepd that can lead to
>     deadlock in glibc.
>  -- Fix total TRES Billing on partitions.
>  -- Don't tear down a BB if a node fails and --no-kill or resize of a job
>     happens.
>  -- Remove unsafe usage of pthread_cancel in pmix plugin that can lead to
>     deadlock in glibc.
>  -- Fix fatal in controller when loading completed trigger
>  -- Ignore reservation overlap at submission time.
>  -- GRES type model and QOS limits documentation added
>  -- slurmd - fix ABRT on SIGINT after reconfigure with MemSpecLimit set.
>  -- PMIx - move two error messages on retry to debug level, and only display
>     the error after the retry count has been exceeded.
>  -- Increase number of tries when sending responses to srun.
>  -- Fix checkpointing requeued/completing jobs in a bad state which caused a
>     segfault on restart.
>  -- Fix srun on ppc64 platforms.
>  -- Prevent slurmd from starting steps if the Prolog returns an error when using
>     PrologFlags=alloc.
>  -- priority/multifactor - prevent segfault running sprio if a partition has
>     just been deleted and PriorityFlags=CALCULATE_RUNNING is turned on.
>  -- job_submit/lua - add ESLURM_INVALID_TIME_LIMIT return code value.
>  -- job_submit/lua - print an error if the script calls log.user in
>     job_modify() instead of returning it to the next submitted job erroneously.
>  -- select/linear - handle job resize correctly.
>  -- select/cons_res - improve handling of --cores-per-socket requests
