[slurm-announce] Slurm version 17.11.1 available
tim at schedmd.com
Wed Dec 20 15:04:00 MST 2017
We are pleased to announce the availability of Slurm version 17.11.1.
This includes roughly 40 fixes made since 17.11.0 was released three
weeks ago, including one critical fix for any systems running on
Slurm can be downloaded from https://www.schedmd.com/downloads.php
> * Changes in Slurm 17.11.1
> -- Fix --with-shared-libslurm option to work correctly.
> -- Make it so only daemons log errors on configuration option duplicates.
> -- Fix for ConstrainDevices=yes to work correctly.
> -- Fix to purge old jobs using burst buffer if slurmctld daemon restarted
> after the job's burst buffer work was already completed.
> -- Set the logging prefix for slurmstepd as soon as possible.
> -- mpi/pmix: Fix the job registration for the PMIx v2.1.
> -- Fix uid check for signaling a step with anything but SIGKILL.
> -- Fix uid check when requesting a jobid from a pid.
> -- Return ESLURM_TRANSITION_STATE_NO_UPDATE instead of EAGAIN when trying to
> signal a step that is still running a prolog.
> -- Update Cray slurm_playbook.yaml with latest recommended version.
> -- Only say a prolog is done running after the extern step is launched.
> -- Wait to start a batch step until the prolog and extern step have
> fully run/launched. Only matters if running with
> -- Truncate a range for SlurmctldPort to FD_SETSIZE elements and throw an
> error, otherwise network traffic may be lost due to poll() not detecting
> -- Fix for srun --pack-group option that can reuse/corrupt memory.
> -- Fix handling ultra long hostlists in a hostfile.
> -- X11: fix xauth regex to handle '-' in hostnames again.
> -- Fix potential node reboot timeout problem for "scontrol reboot" command.
> -- Add ability for squeue to sort jobs by submit time.
> -- CRAY - Switch to standard pid files on Cray systems.
> -- Update jobcomp records on duplicate inserts.
> -- If unrecognized configuration file option found then print an appropriate
> fatal error message rather than relying upon random errno value.
> -- Initialize job_desc_msg_t's instead of just memset'ing them.
> -- Fix divide by zero when job requests no tasks and more memory than
> -- Avoid changing Slurm internal errno on syslog() failures.
> -- BB - Only launch dependent jobs after the burst buffer is staged-out
> completely instead of right after the parent job finishes.
> -- node_features/knl_generic - If the plugin cannot fully load then do not
> spawn a background pthread (which will fail with invalid memory reference).
> -- Don't set the next jobid to give out to the highest jobid in the system on
> controller startup. Just use the checkpointed next use jobid.
> -- Docs - add Slurm/PMIx and OpenMPI build notes to the mpi_guide page.
> -- Add lustre_no_flush option to LaunchParameters for Native Cray systems.
> -- Fix rpmbuild issue with rpm 4.13+ / Fedora 25+.
> -- sacct - fix the display for the NNodes field when using the --units option.
> -- Prevent possible double-xfree on a buffer in stepd_completion.
> -- Fix recording of job state on successful allocation but failed reply
> message.
> -- Fill in the user_name field for batch jobs if not sent by the slurmctld.
> (Which is the default behavior if PrologFlags=send_gids is not enabled.)
> This prevents job launch problems for sites using UsePAM=1.
> -- Handle syncing federated jobs that ran on non-origin clusters and were
> cancelled while the origin cluster was down.
> -- Fix accessing variable outside of lock.
> -- slurm.spec: move libpmi to a separate package to solve a conflict with the
> version provided by PMIx. This will require a separate change to PMIx as
> -- X11 forwarding: change xauth handling to use hostname/unix:display format,
> rather than localhost:display.
> -- mpi/pmix - Fix warning if not compiling with debug.
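
As an illustration of the SlurmctldPort entry above (truncating a port range
to FD_SETSIZE elements and reporting an error), here is a minimal sketch in
Python. It is not Slurm's implementation: the function name clamp_port_range,
its return shape, and the value 1024 for FD_SETSIZE (a common default on
Linux) are assumptions made for this example only.

```python
# Hypothetical sketch, not taken from the Slurm source tree.
FD_SETSIZE = 1024  # assumption: common Linux default; check <sys/select.h>


def clamp_port_range(first, last, limit=FD_SETSIZE):
    """Return (first, last, truncated) with at most `limit` ports in range.

    `truncated` is True when the requested range had to be shortened, so the
    caller can log an error as the changelog entry describes.
    """
    if last < first:
        raise ValueError("port range end is below its start")
    count = last - first + 1
    if count > limit:
        # Keep the first `limit` ports and flag the truncation.
        return first, first + limit - 1, True
    return first, last, False


if __name__ == "__main__":
    lo, hi, truncated = clamp_port_range(10000, 12999)
    if truncated:
        print(f"error: port range truncated to {lo}-{hi}")
```

A range that already fits within the limit is returned unchanged with the
flag set to False, so a caller only needs to report an error when it is True.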