[slurm-announce] Slurm versions 17.02.10 and 17.11.5 are now available (CVE-2018-7033)

Tim Wickberg tim at schedmd.com
Thu Mar 15 13:12:20 MDT 2018

Slurm versions 17.02.10 and 17.11.5 are now available, and include a 
series of recent bug fixes, as well as a fix for a recently discovered 
security vulnerability (CVE-2018-7033).

Downloads are available at https://www.schedmd.com/downloads.php .

Several issues were discovered with incomplete sanitization of 
user-provided text strings, which could potentially lead to SQL 
injection attacks against SlurmDBD itself. Such exploits could lead to a 
loss of accounting data, or escalation of user privileges on the cluster.

We believe that variations on these vulnerabilities exist in all past 
SlurmDBD implementations back to Slurm 1.3 when the SlurmDBD was 
introduced, continuing through the current supported stable releases 
(17.02 and 17.11).

SchedMD customers were informed on March 1st and provided a patch on 
request. This is in keeping with our responsible disclosure process [1].

The only safe mitigation, aside from installing these updated versions, 
is to disable slurmdbd on your system.

One additional note: some sites have reported issues when upgrading to 
the Slurm 17.11 release series while using MySQL 5.1 (the default in 
RHEL 6) or older. SchedMD customers are encouraged to contact support 
before upgrading such systems, and/or to upgrade their MySQL installation 
ahead of a SlurmDBD upgrade to 17.11.

Release notes follow below.

- Tim

[1] https://www.schedmd.com/security.php

Tim Wickberg
Director of Support, SchedMD, LLC
Commercial Slurm Development and Support

> * Changes in Slurm 17.11.5
> ==========================
>  -- Fix cloud nodes getting stuck in DOWN+POWER_UP+NO_RESPOND state after not
>     responding by ResumeTimeout.
>  -- Add job's array_task_cnt and user_name along with partitions
>     [max|def]_mem_per_[cpu|node], max_cpus_per_node, and max_share with the
>     SHARED_FORCE definition to the job_submit/lua plugin.
>  -- srun - fix for SLURM_JOB_NUM_NODES env variable assignment.
>  -- sacctmgr - fix runaway jobs identification.
>  -- Fix to always set the correct status on job update in MySQL.
>  -- Fix issue where QOS usage information could be lost when running with an
>     association manager cache (slurmdbd was down when slurmctld was started).
>  -- CRAY - Fix spec file to work correctly.
>  -- Set scontrol exit code to 1 if attempting to update a node state to DRAIN
>     or DOWN without specifying a reason.
>  -- Fix race condition when running with an association manager cache
>     (slurmdbd was down when slurmctld was started).
>  -- Print out missing SLURM_PERSIST_INIT slurmdbd message type.
>  -- Fix two build errors related to use of the O_CLOEXEC flag with older glibc.
>  -- Add Google Cloud Platform integration scripts into contribs directory.
>  -- Fix minor potential memory leak in backfill plugin.
>  -- Add missing node flags (maint/power/etc) to node states.
>  -- Fix issue where job time limits may end up at 1 minute when using the
>     NoReserve flag on their QOS.
>  -- Fix security issue in accounting_storage/mysql plugin by always escaping
>     strings within the slurmdbd. CVE-2018-7033.

> * Changes in Slurm 17.02.10
> ==========================
>  -- Fix updating of requested TRES memory.
>  -- Cray modulefile: avoid removing /usr/bin from path on module unload.
>  -- Fix issue when resetting the partition pointers on nodes.
>  -- Show reason field in 'sinfo -R' when a node is marked as failed.
>  -- Fix potential of slurmstepd segfaulting when the extern step fails to start.
>  -- Allow node state to be updated between FAIL and DRAIN.
>  -- Avoid registering a job's credential multiple times.
>  -- Fix sbatch --wait to stop waiting after job is gone from memory.
>  -- Fix memory leak of MailDomain configuration string when slurmctld daemon is
>     reconfigured.
>  -- Fix to properly remove extern steps from the starting_steps list.
>  -- Fix Slurm to work correctly with HDF5 1.10+.
>  -- Add support in salloc/srun --bb option for "access_mode" in addition to
>     "access" for consistency with DW options.
>  -- Fix potential deadlock in _run_prog() in power save code.
>  -- MYSQL - Add dynamic_offset in the database to force range for auto
>     increment ids for the tres_table.
>  -- Avoid leaving a node in COMPLETING state indefinitely if the job initiating
>     the node reboot is cancelled while the reboot is in progress.
>  -- node_feature/knl_cray - Fix memory leaks that occur when slurmctld is
>     reconfigured.
>  -- node_feature/knl_cray - Fix memory leak that can occur during normal
>     operation.
>  -- Fix job array dependency with the "aftercorr" option when some tasks in
>     the first job array fail. This fix lets all task array elements that can
>     run proceed rather than stopping all subsequent task array elements.
>  -- Fix whole node allocation cpu counts when --hint=nomultithread.
>  -- NRT - Fix issue when running on an HFI (p775) system with multiple protocols.
>  -- Fix uninitialized variables when unpacking slurmdb_archive_cond_t.
>  -- Fix security issue in accounting_storage/mysql plugin by always escaping
>     strings within the slurmdbd. CVE-2018-7033.
