[slurm-users] Slurm version 17.11.6 available
Tim Wickberg
tim at schedmd.com
Wed May 9 14:24:11 MDT 2018
We are pleased to announce the availability of Slurm version 17.11.6.
This release contains over 50 fixes made since 17.11.5 was released eight
weeks ago, including one for a race condition within slurmstepd that can
lead to hung extern steps.
Slurm can be downloaded from https://www.schedmd.com/downloads.php
- Tim
> * Changes in Slurm 17.11.6
> ==========================
> -- CRAY - Add slurmsmwd to the contribs/cray dir.
> -- sview - fix crash when closing any search dialog.
> -- Fix initialization of variable in stepd when using native x11.
> -- Fix reading slurm_io_init_msg to handle partial messages.
> -- Fix scontrol create res segfault when wrong user/account parameters given.
> -- Fix documentation for sacct on parameter -X (--allocations).
> -- Change TRES Weights debug messages to debug3.
> -- FreeBSD - assorted fixes to restore build.
> -- Fix to avoid tracking environment variables from unrelated jobs.
> -- PMIx - add direct connect authentication.
> During an upgrade this may cause issues for jobs using pmix that start on
> mixed slurmstepd versions where some are older than 17.11.6.
> -- Prevent the backup slurmctld from losing the active/available node
> features list on takeover.
> -- Add documentation on fixing IDLE*+POWER caused by a stuck capmc on Cray
> systems.
> -- Fix missing mutex unlock when prolog is failing on a node, leading to a
> hung slurmd.
> -- Fix locking around Cray CCM prolog/epilog.
> -- Add missing fed_mgr read locks.
> -- Fix issue incorrectly setting a job time_start to 0 while requeueing.
> -- smail - remove stray '-s' from mail subject line.
> -- srun - prevent segfault if ClusterName setting is unset but
> SLURM_WORKING_CLUSTER environment variable is defined.
> -- In configurator.html web pages change default configuration from
> task/none to task/affinity plugin and from select/linear plugin to
> select/cons_res plus CR_Core.
> -- Allow jobs to run beyond a FLEX reservation end time.
> -- Fix problem with job state_reason being wrongly set to Reservation.
> -- Prevent bit_ffs() from returning value out of bitmap range.
> -- Improve performance of 'squeue -u' when PrivateData=jobs is enabled.
> -- Make UnavailableNodes value in job reason be correct for each job.
> -- Fix 'squeue -o %s' on Cray systems.
> -- Fix incorrect error thrown when cancelling part of a job array.
> -- Fix error code and scheduling problem for --exclusive=[user|mcs].
> -- Fix build when lz4 is in a non-standard location.
> -- Be able to force power_down of cloud node even if in power_save state.
> -- Allow cloud nodes to be recognized in Slurm when booted out of band.
> -- Fix race condition in _pack_job_gres() when it is called multiple times.
> -- Increase duration of "sleep" command used to keep extern step alive.
> -- Remove unsafe usage of pthread_cancel in slurmstepd that can lead to
> deadlock in glibc.
> -- Fix total TRES Billing on partitions.
> -- Don't tear down a BB if a node fails and --no-kill or resize of a job
> happens.
> -- Remove unsafe usage of pthread_cancel in pmix plugin that can lead to
> deadlock in glibc.
> -- Fix fatal in controller when loading completed trigger.
> -- Ignore reservation overlap at submission time.
> -- GRES type model and QOS limits documentation added.
> -- slurmd - fix ABRT on SIGINT after reconfigure with MemSpecLimit set.
> -- PMIx - move two error messages on retry to debug level, and only display
> the error after the retry count has been exceeded.
> -- Increase number of tries when sending responses to srun.
> -- Fix checkpointing requeued/completing jobs in a bad state which caused a
> segfault on restart.
> -- Fix srun on ppc64 platforms.
> -- Prevent slurmd from starting steps if the Prolog returns an error when using
> PrologFlags=alloc.
> -- priority/multifactor - prevent segfault running sprio if a partition has
> just been deleted and PriorityFlags=CALCULATE_RUNNING is turned on.
> -- job_submit/lua - add ESLURM_INVALID_TIME_LIMIT return code value.
> -- job_submit/lua - print an error if the script calls log.user in
> job_modify() instead of returning it to the next submitted job erroneously.
> -- select/linear - handle job resize correctly.
> -- select/cons_res - improve handling of --cores-per-socket requests.
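
The PMIx note in the changes above warns about jobs starting on a mix of
slurmstepd versions where some are older than 17.11.6. As a rough sketch of
how a site might flag such nodes during a rolling upgrade, the Python below
compares version strings against 17.11.6. The function names and the
node-to-version mapping are hypothetical and not part of Slurm; how the
versions are collected from the nodes is left outside the sketch.

```python
# Hypothetical helper, not part of Slurm: flags nodes whose reported
# version is older than the release that added PMIx direct connect
# authentication (17.11.6, per the changelog above).
PMIX_DIRECT_CONNECT_AUTH = (17, 11, 6)


def parse_version(text):
    """Turn '17.11.5' into (17, 11, 5).

    Only the leading digits of each of the first three dot-separated
    fields are used, so '17.11.4-2' parses as (17, 11, 4). Missing
    fields are treated as 0.
    """
    parts = []
    for field in text.split(".")[:3]:
        digits = ""
        for ch in field:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def nodes_below(versions, minimum=PMIX_DIRECT_CONNECT_AUTH):
    """Return the sorted names of nodes whose version is older than minimum.

    `versions` is an assumed mapping of node name -> version string.
    """
    return sorted(
        node for node, ver in versions.items() if parse_version(ver) < minimum
    )


if __name__ == "__main__":
    # Sample data for illustration only.
    sample = {"node01": "17.11.6", "node02": "17.11.5", "node03": "17.11.4-2"}
    print(nodes_below(sample))
```

If the returned list is non-empty, jobs using pmix that span those nodes and
already-upgraded ones are the ones the note above describes.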