From tim at schedmd.com  Tue Jan 19 22:14:22 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Tue, 19 Jan 2021 15:14:22 -0700
Subject: [slurm-announce] Slurm version 20.11.3 is now available; reverts to older step launch semantics
Message-ID: <0c9ef234-5a2c-87b5-8298-3282e4a69466@schedmd.com>

We are pleased to announce the availability of Slurm version 20.11.3.

This does include a major functional change to how job step launch is
handled compared to the previous 20.11 releases. This affects srun as
well as MPI stacks - such as Open MPI - which may use srun internally as
part of the process launch.

One of the changes made in the Slurm 20.11 release was to the semantics
for job steps launched through the 'srun' command. This also
inadvertently impacts many MPI releases that use srun underneath their
own mpiexec/mpirun command.

For the 20.11.{0,1,2} releases, the default behavior for srun was
changed such that each step was allocated exactly what was requested by
the options given to srun, and did not have access to all resources
assigned to the job on the node by default. This change was equivalent
to Slurm setting the --exclusive option by default on all job steps.
Job steps desiring all resources on the node needed to explicitly
request them through the new '--whole' option.

In the 20.11.3 release, we have reverted to the 20.02 and older behavior
of assigning all resources on a node to the job step by default.

This reversion is a major behavioral change which we would not generally
make in a maintenance release, but is being done in the interest of
restoring compatibility with the large number of existing Open MPI (and
other MPI flavor) installations and job scripts in production, and to
remove what has proven to be a significant hurdle in moving to the new
release.

Please note that one change to step launch remains - by default, in
20.11 steps are no longer permitted to overlap on the resources they
have been assigned.
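To make the contrast concrete, here is a hedged sketch using srun's own
options ('--exclusive' and '--whole' are the real flags named above; the
CPU counts and program name are illustrative only, not taken from the
announcement):

```
# Assume a job allocation holding 8 CPUs on one node, e.g.:
#   salloc -N1 -n1 -c8

# 20.11.{0,1,2} default: a step received exactly what it requested,
# as though --exclusive were implied; this step would see only 2 CPUs.
srun -n1 -c2 ./my_app

# 20.11.{0,1,2}: using all of the job's resources on the node required
# the then-new --whole flag.
srun --whole -n1 ./my_app

# 20.11.3 restores the 20.02-and-older default: a plain step is again
# given all resources assigned to the job on the node.
srun -n1 ./my_app
```

These commands require a running Slurm cluster, so treat them as a usage
sketch rather than something to paste verbatim.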
Steps that do need to overlap must explicitly opt in through the newly
added '--overlap' option.

Further details and a full explanation of the issue can be found at:
https://bugs.schedmd.com/show_bug.cgi?id=10383#c63

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 20.11.3
> ==========================
> -- Fix segfault when parsing bad "#SBATCH hetjob" directive.
> -- Allow countless gpu: node GRES specifications in slurm.conf.
> -- PMIx - Don't set UCX_MEM_MMAP_RELOC for older versions of UCX (pre 1.5).
> -- Don't green-light any GPU validation when core conversion fails.
> -- Allow updates to a reservation in the database that starts in the future.
> -- Better check/handling of primary key collision in reservation table.
> -- Improve reported error and logging in _build_node_list().
> -- Fix uninitialized variable in _rpc_file_bcast() which could lead to an
>    incorrect error return from sbcast / srun --bcast.
> -- mpi/cray_shasta - fix use-after-free on error in _multi_prog_parse().
> -- Cray - Handle setting correct prefix for cpuset cgroup with respect to
>    expected_usage_in_bytes. This fixes Cray's OOM killer.
> -- mpi/pmix: Fix PMIx_Abort support.
> -- Don't reject jobs allocating more cores than tasks with MaxMemPerCPU.
> -- Fix false error message complaining about oversubscribe in cons_tres.
> -- scrontab - fix parsing of empty lines.
> -- Fix regression causing spank_process_option errors to be ignored.
> -- Avoid making multiple interactive steps.
> -- Fix corner case issues where step creation should fail.
> -- Fix job rejection when --gres is less than --gpus.
> -- Fix regression causing spank prolog/epilog not to be called unless the
>    spank plugin was loaded in slurmd context.
> -- Fix regression preventing SLURM_HINT=nomultithread from being used
>    to set defaults for the salloc->srun and sbatch->srun sequences.
> -- Reject job credential if a non-superuser sets the LAUNCH_NO_ALLOC flag.
> -- Make it so srun --no-allocate works again.
> -- jobacct_gather/linux - Don't count memory on tasks that have already
>    finished.
> -- Fix 19.05/20.02 batch steps talking with a 20.11 slurmctld.
> -- jobacct_gather/common - Do not process jobaccts with the same taskid when
>    calling prec_extra.
> -- Clean up all tracked jobacct tasks when the extern step child process
>    finishes.
> -- slurmrestd/dbv0.0.36 - Correct structure of dbv0.0.36_tres_list.
> -- Fix regression causing task/affinity and task/cgroup to be out of sync
>    when the configured ThreadsPerCore differs from the physical threads
>    per core.
> -- Fix situation when --gpus is given but not max nodes (-N1-1) in a job
>    allocation.
> -- Interactive step - ignore cpu bind and mem bind options, and do not set
>    the associated environment variables, which led to unexpected behavior
>    from srun commands launched within the interactive step.
> -- Handle exit code from pipe when using UCX with PMIx.

From tim at schedmd.com  Thu Feb 18 22:38:04 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Thu, 18 Feb 2021 15:38:04 -0700
Subject: [slurm-announce] Slurm version 20.11.4 is now available
Message-ID: <6beb719d-2526-e88f-c211-86de33db60f0@schedmd.com>

We are pleased to announce the availability of Slurm version 20.11.4.

This includes a workaround for a broken glibc version that erroneously
prints a long-double value of 0 as "nan", which can corrupt Slurm's
association state files.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 20.11.4
> ==========================
> -- Fix node selection for advanced reservations with features.
> -- mpi/pmix: Handle pipe failure better when using UCX.
> -- mpi/pmix: Include PMIX_NODEID for each process entry.
> -- Fix job getting rejected after being requeued on the same node that died.
> -- job_submit/lua - add "network" field.
> -- Fix situations where a recurring reservation could erroneously skip a
>    period.
> -- Ensure that a reservation's [pro|epi]log scripts are run on recurring
>    reservations.
> -- Fix threads-per-core memory allocation issue when using CR_CPU_MEMORY.
> -- Fix scheduling issue with --gpus.
> -- Fix GPU allocations that request --cpus-per-task.
> -- mpi/pmix: Fix print messages for all PMIXP_* macros.
> -- Add mapping for XCPU to --signal option.
> -- Fix regression in 20.11 that prevented a full pass of the main scheduler
>    from ever executing.
> -- Work around a glibc bug in which "0" is incorrectly printed as "nan",
>    which will result in corrupted association state on restart.
> -- Fix regression in 20.11 which made slurmd incorrectly attempt to find the
>    parent slurmd address when not applicable and send incorrect reverse-tree
>    info to the slurmstepd.
> -- Fix cgroup ns detection when using containers (e.g. LXC or Docker).
> -- scrontab - change temporary file handling to work with emacs.

From tim at schedmd.com  Tue Mar 16 22:16:56 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Tue, 16 Mar 2021 16:16:56 -0600
Subject: [slurm-announce] Slurm version 20.11.5 is now available
Message-ID: <64175c45-26c3-aa6d-4285-dec1fd6e04ca@schedmd.com>

We are pleased to announce the availability of Slurm version 20.11.5.

This includes a number of moderate severity bug fixes, alongside a new
job_container/tmpfs plugin developed by NERSC that can be used to create
per-job filesystem namespaces.

Initial documentation for this plugin is available at:
https://slurm.schedmd.com/job_containe.conf.html

Slurm can be downloaded from https://www.schedmd.com/downloads.php .
- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 20.11.5
> ==========================
> -- Fix main scheduler bug where bf_hetjob_prio truncates SchedulerParameters.
> -- Fix sacct not displaying UserCPU, SystemCPU and TotalCPU for large times.
> -- scrontab - fix to return the correct index for a bad #SCRON option.
> -- scrontab - fix memory leak when invalid option found in #SCRON line.
> -- Add errno for when a user requests multiple partitions and they are using
>    partition based associations.
> -- Fix issue where a job could run in a wrong partition when using
>    EnforcePartLimits=any and partition based associations.
> -- Remove possible deadlock when adding associations/wckeys in multiple
>    threads.
> -- When using PrologFlags=alloc make sure the correct Slurm version is set
>    in the credential.
> -- When sending a job a warning signal make sure we always send SIGCONT
>    beforehand.
> -- Fix issue where a batch job would continue running if a prolog failed on
>    a node that wasn't the batch host and requeuing was disabled.
> -- Fix issue where sometimes salloc/srun wouldn't get a message about a
>    prolog failure in the job's stdout.
> -- Requeue or kill job on a prolog failure when PrologFlags is not set.
> -- Fix race condition causing node reboots to get requeued before
>    ResumeTimeout expires.
> -- Preserve node boot_req_time on reconfigure.
> -- Preserve node power_save_req_time on reconfigure.
> -- Fix node reboots being queued and issued multiple times and preventing
>    the reboot from timing out.
> -- Fix debug message related to GrpTRESRunMin (AssocGrpCPURunMinutesLimit).
> -- Fix run_command to exit correctly if track_script kills the calling
>    thread.
> -- Only requeue a job when the PrologSlurmctld returns nonzero.
> -- When a job is signaled with SIGKILL make sure we flush all
>    prologs/setup scripts.
> -- Handle burst buffer scripts if the job is canceled while stage_in is
>    happening.
> -- When shutting down the slurmctld make note to ignore error messages when
>    we have to kill a prolog/setup script we are tracking.
> -- scrontab - add support for the --open-mode option.
> -- acct_gather_profile/influxdb - avoid segfault on plugin shutdown if setup
>    has not completed successfully.
> -- Reduce delay in starting salloc allocations when running with prologs.
> -- Fix issue passing open fd's with [send|recv]msg.
> -- Alter AllocNodes check to work if the allocating node's domain doesn't
>    match the slurmctld's. This restores the pre-20.11 behavior.
> -- Fix slurmctld segfault if jobs from a prior version had the now-removed
>    INVALID_DEPEND state flag set and were allowed to run in 20.11.
> -- Add job_container/tmpfs plugin to give a method to provide a private /tmp
>    per job.
> -- Set the correct core affinity when using AutoDetect.
> -- Start relying on the conf again in xcpuinfo_mac_to_abs().
> -- Fix global_last_rollup assignment on job resizing.
> -- slurmrestd - hand over connection context on _on_message_complete().
> -- slurmrestd - mark "environment" as required for job submissions in schema.
> -- slurmrestd - Disable credential reuse on the same TCP connection.
>    Pipelined HTTP connections will have to provide authentication with
>    every request.
> -- Avoid data conversion error on NULL strings in data_get_string_converted().
> -- Handle situation where slurmctld is too slow processing
>    REQUEST_COMPLETE_BATCH_SCRIPT and it gets resent from the slurmstepd.
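One of the entries above adds the job_container/tmpfs plugin, which is
configured through a new job_container.conf file alongside slurm.conf. A
minimal sketch follows; the BasePath location is a placeholder, and the
exact parameter set (including whether PrologFlags=contain is required)
should be verified against the job_container.conf man page linked in the
announcement:

```
# slurm.conf additions (sketch - enables the plugin)
JobContainerType=job_container/tmpfs
PrologFlags=contain

# job_container.conf (sketch - where the per-job namespaces live)
AutoBasePath=true
BasePath=/var/spool/slurm/containers
```

With something like this in place, each job should see a private /tmp
backed by a per-job filesystem namespace mounted under BasePath.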
From tim at schedmd.com  Tue Mar 16 22:26:14 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Tue, 16 Mar 2021 16:26:14 -0600
Subject: [slurm-announce] Slurm version 20.11.5 is now available
In-Reply-To: <64175c45-26c3-aa6d-4285-dec1fd6e04ca@schedmd.com>
References: <64175c45-26c3-aa6d-4285-dec1fd6e04ca@schedmd.com>
Message-ID: <75a35719-d4e0-deae-edb2-96f6180287ae@schedmd.com>

One errant backspace snuck into that announcement: the job_container.conf
man page (with an 'r') serves as the initial documentation for this new
job_container/tmpfs plugin. The link to the HTML version of the man page
has been corrected in the text below:

On 3/16/21 4:16 PM, Tim Wickberg wrote:
> We are pleased to announce the availability of Slurm version 20.11.5.
>
> This includes a number of moderate severity bug fixes, alongside a new
> job_container/tmpfs plugin developed by NERSC that can be used to create
> per-job filesystem namespaces.
>
> Initial documentation for this plugin is available at:
> https://slurm.schedmd.com/job_container.conf.html
>
> Slurm can be downloaded from https://www.schedmd.com/downloads.php .
>
> - Tim

From tim at schedmd.com  Tue Apr 27 20:36:38 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Tue, 27 Apr 2021 14:36:38 -0600
Subject: [slurm-announce] Slurm version 20.11.6 is now available
Message-ID: <41b9ca15-89e3-675f-d8e6-95f2d72e0bb0@schedmd.com>

We are pleased to announce the availability of Slurm version 20.11.6.

This includes a number of minor-to-moderate severity fixes, as well as
improvements to the recently added job_container/tmpfs plugin.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 20.11.6
> ==========================
> -- Fix sacct assert with the --qos option.
> -- Use pkg-config --atleast-version instead of --modversion for systemd.
> -- common/fd - fix getsockopt() call in fd_get_socket_error().
> -- Properly handle the return from fd_get_socket_error() in _conn_readable().
> -- cons_res - Fix issue where running jobs were not taken into consideration
>    when creating a reservation.
> -- Avoid a deadlock between job_list for_each and assoc QOS_LOCK.
> -- Fix TRESRunMins usage for partition qos on restart/reconfig.
> -- Fix printing of number of tasks on a completed job that didn't request
>    tasks.
> -- Fix updating GrpTRESRunMins when the decremented job time exceeds it.
> -- Make it so we handle multithreaded allocations correctly when doing
>    --exclusive or --core-spec allocations.
> -- Fix incorrect round-up division in _pick_step_cores.
> -- Use appropriate math to adjust cpu counts when --ntasks-per-core=1.
> -- cons_tres - Fix consideration of powered-down nodes.
> -- cons_tres - Fix DefCpuPerGPU; increase cpus-per-task to match
>    gpus-per-task * cpus-per-gpu.
> -- Fix under-cpu memory auto-adjustment when MaxMemPerCPU is set.
> -- Make it possible to override CR_CORE_DEFAULT_DIST_BLOCK.
> -- Perl API - fix retrieving/storing of slurm_step_id_t in job_step_info_t.
> -- Recover state of burst buffers when slurmctld is restarted to avoid
>    skipping burst buffer stages.
> -- Fix race condition in burst buffer plugin which caused a burst buffer
>    in stage-in to not get state saved if slurmctld stopped.
> -- auth/jwt - print an error if jwt_file= has not been set in slurmdbd.
> -- Fix RESV_DEL_HOLD not being a valid state when using squeue --states.
> -- Add missing squeue selectable states in valid states error message.
> -- Fix scheduling last array task multiple times on error, causing segfault.
> -- Fix issue where a step could be allocated more memory than the job when
>    dealing with --mem-per-cpu and --threads-per-core.
> -- Fix removing a QOS from an association with -=, which could leave the
>    association with no QOS.
> -- auth/jwt - fix segfault on invalid credential in slurmdbd due to
>    missing validate_slurm_user() function in context.
> -- Fix single Port= not being applied to a range of nodes in slurm.conf.
> -- Fix jobs that do not request a TRES not starting because of that TRES's
>    limit.
> -- acct_gather_energy/rapl - fix AveWatts calculation.
> -- job_container/tmpfs - Fix issues with cleanup and slurmd restarting on
>    running jobs.

From tim at schedmd.com  Wed May 12 20:42:30 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Wed, 12 May 2021 14:42:30 -0600
Subject: [slurm-announce] Slurm versions 20.11.7 and 20.02.7 are now available (CVE-2021-31215)
Message-ID:

Slurm versions 20.11.7 and 20.02.7 are now available, and include a
series of recent bug fixes, as well as a critical security fix. SchedMD
customers were informed of this issue on April 28th and provided a fix
on request; this process is documented in our security policy. [1]

CVE-2021-31215: An issue was identified with environment handling within
Slurm that can allow any user to run arbitrary commands as SlurmUser if
the installation uses a PrologSlurmctld and/or EpilogSlurmctld script.

Downloads are available at https://www.schedmd.com/downloads.php .

Release notes follow below.

- Tim

[1] https://www.schedmd.com/security.php

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 20.11.7
> ==========================
> -- slurmd - handle configless failures gracefully instead of hanging
>    indefinitely.
> -- select/cons_tres - fix Dragonfly topology not selecting nodes in the same
>    leaf switch when it should, as well as requests with the --switches
>    option.
> -- Fix issue where certain step requests wouldn't run if the first node in
>    the job allocation was full and there were idle resources on other nodes
>    in the job allocation.
> -- Fix deadlock issue with slurmctld.
> -- torque/qstat - fix printf error message in output.
> -- When adding associations or wckeys avoid checking a user or cluster name
>    multiple times.
> -- Fix wrong jobacctgather information on a step on multiple nodes
>    due to timeouts when sending the information gathered on its node.
> -- Fix missing xstrdup which could result in slurmctld segfault on array
>    jobs.
> -- Fix security issue in PrologSlurmctld and EpilogSlurmctld by always
>    prepending SPANK_ to all user-set environment variables. CVE-2021-31215.

> * Changes in Slurm 20.02.7
> ==========================
> -- cons_tres - Fix DefCpuPerGPU.
> -- select/cray_aries - Correctly remove jobs/steps from blades using NPC.
> -- Fix false positive oom-kill events on extern step termination when
>    jobacct_gather/cgroup is configured.
> -- Ensure SPANK prolog and epilog run without an explicit PlugStackConfig.
> -- Fix missing xstrdup which could result in slurmctld segfault on array
>    jobs.
> -- Fix security issue in PrologSlurmctld and EpilogSlurmctld by always
>    prepending SPANK_ to all user-set environment variables. CVE-2021-31215.

From tim at schedmd.com  Thu Jul 1 23:00:17 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Thu, 1 Jul 2021 17:00:17 -0600
Subject: [slurm-announce] Slurm version 20.11.8 is now available
Message-ID: <65d068d0-bd67-65dd-4a6d-30bde6c90376@schedmd.com>

We are pleased to announce the availability of Slurm version 20.11.8.
This includes a number of minor-to-moderate severity bug fixes.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 20.11.8
> ==========================
> -- slurmctld - fix erroneous "StepId=CORRUPT" messages in error logs.
> -- Correct the error given when auth plugin fails to pack a credential.
> -- Fix unused-variable compiler warning on FreeBSD in fd_resolve_path().
> -- acct_gather_filesystem/lustre - only emit collection error once per step.
> -- srun - leave SLURM_DIST_UNKNOWN as default for --interactive.
> -- Add GRES environment variables (e.g., CUDA_VISIBLE_DEVICES) into the
>    interactive step, the same as is done for the batch step.
> -- Fix various potential deadlocks when altering objects in the database
>    dealing with every cluster in the database.
> -- slurmrestd - handle slurmdbd connection failures without segfaulting.
> -- slurmrestd - fix segfault for searches in slurmdb/v0.0.36/jobs.
> -- slurmrestd - remove (non-functioning) users query parameter for
>    slurmdb/v0.0.36/jobs from openapi.json.
> -- slurmrestd - fix segfault in slurmrestd db/jobs with numeric queries.
> -- slurmrestd - add argv handling for job/submit endpoint.
> -- srun - fix broken node step allocation in a heterogeneous allocation.
> -- Fail step creation if -n is not a multiple of --ntasks-per-gpu.
> -- job_container/tmpfs - Fix slowdown on teardown.
> -- Fix problem with SlurmctldProlog where requeued jobs would never launch.
> -- job_container/tmpfs - Fix issue when restarting slurmd where the namespace
>    mount points could disappear.
> -- sacct - avoid truncating JobId at 34 characters.
> -- scancel - fix segfault when --wckey filtering option is used.
> -- select/cons_tres - Fix memory leak.
> -- Prevent file descriptor leak in job_container/tmpfs on slurmd restart.
> -- slurmrestd/dbv0.0.36 - Fix values dumped in job state/current and
>    job step state.
> -- slurmrestd/dbv0.0.36 - Correct description for previous state property.
> -- perlapi/libslurmdb - expose tres_req_str to job hash.
> -- scrontab - close and reopen temporary crontab file to deal with editors
>    that do not change the original file, but instead write out then rename
>    a new file.
> -- sstat - fix linking so that it will work when --without-shared-libslurm
>    was used to build Slurm.
> -- Clear allocated cpus for running steps in a job before handling requested
>    nodes on new step.
> -- Don't reject a step if not enough nodes are available.
>    Instead, defer the step until enough nodes are available to satisfy the
>    request.
> -- Don't reject a step if it requests at least one specific node that is
>    already allocated to another step. Instead, defer the step until the
>    requested node(s) become available.
> -- slurmrestd - add description for slurmdb/job endpoint.
> -- Better handling of --mem=0.
> -- Ignore DefCpuPerGpu when --cpus-per-task given.
> -- sacct - fix segfault when printing StepId (or when using --long).

From tim at schedmd.com  Thu Jul 29 21:15:23 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Thu, 29 Jul 2021 15:15:23 -0600
Subject: [slurm-announce] Slurm release candidate version 21.08.0rc1 available for testing
Message-ID:

We are pleased to announce the availability of Slurm release candidate
version 21.08.0rc1.

This is the first release candidate version of the upcoming 21.08
release series, and represents the end of development for the release
cycle, and a finalization of the RPC and state file formats.

If any issues are identified with this release candidate, please report
them through https://bugs.schedmd.com against the 21.08.x version and we
will address them before the first production 21.08.0 release is made.

Please note that the release candidates are not intended for production
use.

Barring any late-discovered issues, the state file formats should not
change between now and 21.08.0 and are considered frozen at this time
for the 21.08 release.

A preview of the updated documentation can be found at
https://slurm.schedmd.com/archive/slurm-master/ .

Slurm can be downloaded from https://www.schedmd.com/downloads.php .
- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

From tim at schedmd.com  Thu Aug 12 21:24:25 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Thu, 12 Aug 2021 15:24:25 -0600
Subject: [slurm-announce] Slurm release candidate version 21.08.0rc2 available for testing
Message-ID: <9e248b73-7ee7-0fb9-f722-7488c3afeaf6@schedmd.com>

We are pleased to announce the availability of Slurm release candidate
version 21.08.0rc2.

This is the second release candidate version of the upcoming 21.08
release series, and corrects a number of issues identified with rc1.

If any issues are identified with this release candidate, please report
them through https://bugs.schedmd.com against the 21.08.x version and we
will address them before the first production 21.08.0 release is made.

Please note that the release candidates are not intended for production
use.

Barring any late-discovered issues, the state file formats should not
change between now and 21.08.0 and are considered frozen at this time
for the 21.08 release.

A preview of the updated documentation can be found at
https://slurm.schedmd.com/archive/slurm-master/ .

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

From tim at schedmd.com  Thu Aug 26 20:40:45 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Thu, 26 Aug 2021 14:40:45 -0600
Subject: [slurm-announce] Slurm version 21.08 is now available
Message-ID: <52c333f6-0ca5-8a3c-6ec7-7d4ee8e0160c@schedmd.com>

After 9 months of development and testing we are pleased to announce the
availability of Slurm version 21.08!

Slurm 21.08 includes a number of new features including:

- A new "AccountingStoreFlags=job_script" option to store the job
  scripts directly in SlurmDBD.
- Added "sacct -o SubmitLine" format option to get the submit line of a
  job/step.
- Changes to the node state management so that nodes are marked as
  PLANNED instead of IDLE if the scheduler is still accumulating
  resources while waiting to launch a job on them.
- RS256 token support in auth/jwt.
- Overhaul of the cgroup subsystems to simplify operation, mitigate a
  number of inherent race conditions, and prepare for future cgroup v2
  support.
- Further improvements to cloud node power state management.
- A new child process of the Slurm controller called 'slurmscriptd'
  responsible for executing PrologSlurmctld and EpilogSlurmctld scripts,
  which significantly reduces performance issues associated with
  enabling those options.
- A new burst_buffer/lua plugin allowing for site-specific asynchronous
  job data management.
- Fixes to the job_container/tmpfs plugin to allow the slurmd process to
  be restarted while the job is running without issue.
- Added json/yaml output to sacct, squeue, and sinfo commands.
- Added a new node_features/helpers plugin to provide a generic way to
  change settings on a compute node across a reboot.
- Added support for automatically detecting and broadcasting shared
  libraries for an executable launched with 'srun --bcast'.
- Added initial OCI container execution support with a new --container
  option to sbatch and srun.
- Improved job step launch throughput.
- Improved "configless" support by allowing multiple control servers to
  be specified through the slurmd --conf-server option, and sending
  additional configuration files at startup including cli_filter.lua.

Please see the RELEASE_NOTES distributed alongside the source for
further details.

Thank you to all customers, partners, and community members who
contributed to this release.

As with past releases, the documentation available at
https://slurm.schedmd.com has been updated to the 21.08 release. Past
versions are available in the archive.

This release also marks the end of support for the 20.02 release.
The 20.11 release will remain supported up until the 22.05 release next
May, but will not see as frequent updates, and bug-fixes will be
targeted for the 21.08 maintenance releases going forward.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

From tim at schedmd.com  Tue Aug 31 20:26:33 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Tue, 31 Aug 2021 14:26:33 -0600
Subject: [slurm-announce] Slurm User Group Meeting (SLUG'21) will be held on YouTube on September 21st
Message-ID: <9097a264-0c8c-b33b-da52-55d17c10fc74@schedmd.com>

The Slurm User Group Meeting (SLUG'21) this fall will be online once
again. In lieu of an in-person meeting, SchedMD will broadcast a set of
five presentations on Tuesday, September 21st, 2021, from 9am to noon
(MDT) on our YouTube channel:
https://www.youtube.com/c/schedmd-slurm

There is no cost to attend, and there is no registration required.

Topics include: "Field Notes" (best practices / tips + tricks),
Containers and updates to the REST API, the new burst_buffer/lua plugin
and slurmscriptd, Slurm on Cloud, and an overview of the 20.11 and 21.08
releases as well as the future roadmap.

I'll also be sending a few reminders out as we get closer to the event.

Hope to (virtually) see you there!

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

From tim at schedmd.com  Wed Sep 15 20:50:04 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Wed, 15 Sep 2021 14:50:04 -0600
Subject: [slurm-announce] Slurm User Group Meeting (SLUG'21) will be held on YouTube on September 21st
In-Reply-To: <9097a264-0c8c-b33b-da52-55d17c10fc74@schedmd.com>
References: <9097a264-0c8c-b33b-da52-55d17c10fc74@schedmd.com>
Message-ID: <197cb947-d009-9960-6b1b-36f058e2bb37@schedmd.com>

One more reminder that the Slurm User Group Meeting (SLUG'21) will be
held on Tuesday, streaming through YouTube Live.
The agenda's been updated with the titles for each of the five sessions,
and links have been added to the individual streams:

https://slurm.schedmd.com/slurm_ug_agenda.html

- Tim

On 8/31/21 2:26 PM, Tim Wickberg wrote:
> The Slurm User Group Meeting (SLUG'21) this fall will be online once
> again. In lieu of an in-person meeting, SchedMD will broadcast a set of
> five presentations on Tuesday, September 21st, 2021, from 9am to noon
> (MDT) on our YouTube channel:
> https://www.youtube.com/c/schedmd-slurm
>
> There is no cost to attend, and there is no registration required.
>
> Topics include: "Field Notes" (best practices / tips + tricks),
> Containers and updates to the REST API, the new burst_buffer/lua plugin
> and slurmscript, Slurm on Cloud, and an overview of the 20.11 and 21.08
> release as well as future roadmap.
>
> I'll also be sending a few reminders out as we get closer to the event.
>
> Hope to (virtually) see you there!
> - Tim

From tim at schedmd.com  Thu Sep 16 21:45:09 2021
From: tim at schedmd.com (Tim Wickberg)
Date: Thu, 16 Sep 2021 15:45:09 -0600
Subject: [slurm-announce] Slurm version 21.08.1 is now available
Message-ID:

We are pleased to announce the availability of Slurm version 21.08.1.

For sites using scrontab, there is a critical fix included to ensure
that the cron jobs continue to repeat indefinitely into the future.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 21.08.1
> ==========================
> -- Fix potential memory leak if a problem happens while allocating GRES for
>    a job.
> -- If an overallocation of GRES happens terminate the creation of a job.
> -- AutoDetect=nvml: Fatal if no devices found in MIG mode.
> -- slurm.spec - fix querying for PMIx and UCX version.
> -- Print federation and cluster sacctmgr error messages to stderr.
> -- Fix off by one error in --gpu-bind=mask_gpu.
> -- Fix statement condition in http_parser autoconf macro.
> -- Fix statement condition in netloc autoconf macro.
> -- Add --gpu-bind=none to disable gpu binding when using --gpus-per-task.
> -- Handle the burst buffer state "alloc-revoke" which previously would not
>    display in the job correctly.
> -- Fix issue in the slurmstepd SPANK prolog/epilog handler where
>    configuration values were used before being initialized.
> -- Restore a step's ability to utilize all of an allocation's memory if
>    --mem=0.
> -- Fix --cpu-bind=verbose garbage taskid.
> -- Fix cgroup task affinity issues from garbage taskid info.
> -- Make gres_job_state_validate() client logging behavior as before
>    44466a4641.
> -- Fix steps with --hint overriding an allocation with --threads-per-core.
> -- Require requesting a GPU if --mem-per-gpu is requested.
> -- Return error early if a job is requesting --ntasks-per-gpu and no gpus
>    or task count.
> -- Properly clear out a pending step if it is unable to run with the
>    available resources.
> -- Kill all processes spawned by burst_buffer.lua, including descendants.
> -- openapi/v0.0.{35,36,37} - Avoid setting default values of min_cpus,
>    job name, cwd, mail_type, and contiguous on job update.
> -- openapi/v0.0.{35,36,37} - Clear user hold on job update if hold=false.
> -- Prevent CRON_JOB flag from being cleared when loading job state.
> -- sacctmgr - Fix deleting WCKeys when not specifying a cluster.
> -- Fix getting memory for a step when the first node in the step isn't the
>    first node in the allocation.
> -- Make SelectTypeParameters=CR_Core_Memory default for cons_tres and
>    cons_res.
> -- Correctly handle mutex unlocks in the gres code if failures happen.
> -- Give better error message if -m plane is given with no size.
> -- Fix --distribution=arbitrary for salloc.
> -- Fix jobcomp/script regression introduced in 21.08.0rc1 0c75b9ac9d.
> -- Only send the batch node in the step_hostlist in the job credential.
> -- When setting affinity for the batch step don't assume the batch host is node > 0. > -- In task/affinity better checking for node existence when laying out > affinity. > -- slurmrestd - fix job submission with auth/jwt. From tim at schedmd.com Mon Sep 20 23:30:30 2021 From: tim at schedmd.com (Tim Wickberg) Date: Mon, 20 Sep 2021 17:30:30 -0600 Subject: [slurm-announce] Slurm User Group Meeting (SLUG'21) will be held on YouTube on September 21st In-Reply-To: <197cb947-d009-9960-6b1b-36f058e2bb37@schedmd.com> References: <9097a264-0c8c-b33b-da52-55d17c10fc74@schedmd.com> <197cb947-d009-9960-6b1b-36f058e2bb37@schedmd.com> Message-ID: <14e86d27-0232-ce8b-6ea0-9ed856253728@schedmd.com> One last reminder: the Slurm User Group Meeting will be starting at 9am (Mountain) on Tuesday. Hope to (virtually) see you there! - Tim On 9/15/21 2:50 PM, Tim Wickberg wrote: > One more reminder that the Slurm User Group Meeting (SLUG'21) will be > held on Tuesday, streaming through YouTube Live. > > The agenda's been updated with the titles for each of the five sessions, > and links have been added to the individual streams: > > https://slurm.schedmd.com/slurm_ug_agenda.html > > - Tim > > On 8/31/21 2:26 PM, Tim Wickberg wrote: >> The Slurm User Group Meeting (SLUG'21) this fall will be online once >> again. In lieu of an in-person meeting, SchedMD will broadcast a set >> of five presentations on Tuesday, September 21st, 2021, from 9am to >> noon (MDT) on our YouTube channel: >> https://www.youtube.com/c/schedmd-slurm >> >> There is no cost to attend, and there is no registration required. >> >> Topics include: "Field Notes" (best practices / tips + tricks), >> Containers and updates to the REST API, the new burst_buffer/lua >> plugin and slurmscript, Slurm on Cloud, and an overview of the 20.11 >> and 21.08 release as well as future roadmap. >> >> I'll also be sending a few reminders out as we get closer to the event. 
From tim at schedmd.com Wed Sep 22 19:10:33 2021 From: tim at schedmd.com (Tim Wickberg) Date: Wed, 22 Sep 2021 13:10:33 -0600 Subject: [slurm-announce] Slides and video from the SLUG'21 presentations are online In-Reply-To: <197cb947-d009-9960-6b1b-36f058e2bb37@schedmd.com> References: <9097a264-0c8c-b33b-da52-55d17c10fc74@schedmd.com> <197cb947-d009-9960-6b1b-36f058e2bb37@schedmd.com> Message-ID: <62554a31-553f-3391-28c3-71bd5711736b@schedmd.com> The slides from SLUG'21 have now been uploaded to the Slurm Publication Archive: https://slurm.schedmd.com/publications.html The video recordings will remain[0] on our YouTube Channel for at least the next two weeks: https://www.youtube.com/c/schedmd-slurm As mentioned at the end of my presentation, we will have the Slurm Community booth at SC'21 in St. Louis, although SchedMD will only have a skeleton crew on site. The Slurm Birds-of-a-Feather session was approved for SC'21. We've requested this be held fully virtual but are waiting for confirmation, and will send further details on that as we get closer to the event. And thank you to everyone who showed up during the live presentations yesterday. The presenters always appreciate the feedback and questions, even if it's not quite as interactive as our in-person meetings. cheers, - Tim [0] Apologies to anyone who went looking for them yesterday - the live streams apparently take about 12 hours to be listed in the 'Uploads' section, which is the default view for the channel. The videos were available, but you needed to know to look for them on the SLUG'21 playlist or use the direct links from the agenda to get to them. 
-- Tim Wickberg Chief Technology Officer, SchedMD LLC Commercial Slurm Development and Support From tim at schedmd.com Tue Oct 5 22:56:39 2021 From: tim at schedmd.com (Tim Wickberg) Date: Tue, 5 Oct 2021 16:56:39 -0600 Subject: [slurm-announce] Slurm version 21.08.2 is now available Message-ID: <820a983e-98aa-4f7c-5306-c849c06fab29@schedmd.com> We are pleased to announce the availability of Slurm version 21.08.2. There is one significant change included in this maintenance release: the removal of support for the long-misunderstood TaskAffinity=yes option in cgroup.conf. Please consider using "TaskPlugin=task/cgroup,task/affinity" in slurm.conf as an alternative. Unfortunately, a number of issues were identified where the processor affinity settings from this now-unsupported approach would be calculated incorrectly, leading to potential performance issues. SchedMD had previously been planning to remove this support in the next 22.05 release, but a number of issues reported after the cgroup code refactoring have led us to remove it now, rather than try to correct issues with what has not been a recommended configuration for some time. Slurm can be downloaded from https://www.schedmd.com/downloads.php . - Tim -- Tim Wickberg Chief Technology Officer, SchedMD LLC Commercial Slurm Development and Support > * Changes in Slurm 21.08.2 > ========================== > -- slurmctld - fix how the max number of cores on a node in a partition is > calculated when the partition contains multi-socket nodes. This in turn > corrects certain jobs' node count estimations displayed client-side. > -- job_submit/cray_aries - fix "craynetwork" GRES specification after changes > introduced in 21.08.0rc1 that made TRES always have a type prefix. > -- Ignore nonsensical check in the slurmd for [Pro|Epi]logSlurmctld. > -- Fix writing to stderr/syslog when systemd runs slurmctld in the foreground. > -- Fix locking around log level setting routines. > -- Fix issue with updating job started with node range. 
> -- Fix issue with nodes not clearing state in the database when the slurmctld > is started with clean-start. > -- Fix hetjob components > 1 timing out due to InactiveLimit. > -- Fix sprio printing -nan for normalized association priority if > PriorityWeightAssoc was not defined. > -- Disallow FirstJobId=0. > -- Preserve job start info in the database for a requeued job that hadn't > registered the first time in the database yet. > -- Only send one message on prolog failure from the slurmd. > -- Remove support for TaskAffinity=yes in cgroup.conf. > -- accounting_storage/mysql - fix issue where querying jobs via sacct > --whole-hetjob=yes or slurmrestd (which automatically includes this flag) > could in some cases return more records than expected. > -- Fix issue for preemption of job array task that makes afterok dependency > fail. Additionally, send emails when requeueing happens due to preemption. > -- Fix sending requeue mail type. > -- Properly resize a job's GRES bitmaps and counts when resizing the job. > -- Fix node being able to transition to CLOUD state from non-cloud state. > -- Fix regression introduced in 21.08.0rc1 which broke a step's ability to > inherit GRES from the job when the step didn't request GRES but the job did. > -- Fix errors in logic when picking nodes based on bracketed anded constraints. > This also enforces the requirement to have a count when using such > constraints. > -- Handle job resize better in the database. > -- Exclude currently running, resized jobs from the runaway jobs list. > -- Make it possible to shrink a job more than once. From tim at schedmd.com Tue Nov 2 20:50:29 2021 From: tim at schedmd.com (Tim Wickberg) Date: Tue, 2 Nov 2021 13:50:29 -0700 Subject: [slurm-announce] Slurm version 21.08.3 is now available Message-ID: We are pleased to announce the availability of Slurm version 21.08.3. 
This includes a number of fixes since the last release a month ago, including one critical fix to prevent a communication issue between slurmctld and slurmdbd for sites that have started using the new AccountingStoreFlags=job_script functionality. Slurm can be downloaded from https://www.schedmd.com/downloads.php . - Tim -- Tim Wickberg Chief Technology Officer, SchedMD LLC Commercial Slurm Development and Support > * Changes in Slurm 21.08.3 > ========================== > -- Return error to sacctmgr when running 'sacctmgr archive load' and the load > fails due to an invalid or corrupted file. > -- slurmctld/gres_ctld - fix deallocation of typed GRES without device. > -- scrontab - fix capturing the cronspec request in the job script. > -- openapi/dbv0.0.37 - Add missing method POST for /associations/. > -- If ALTER TABLE was already run, continue with database upgrade. > -- slurmstepd - Gracefully handle RunTimeQuery returning no output. > -- srun - automatically handle issues with races to listen() on an ephemeral > socket, and suppress otherwise needless error messages. > -- Schedule sooner after Epilog completion with SchedulerParameters=defer. > -- Improve performance for AccountingStoreFlags=job_env. > -- Expose missing SLURMD_NODENAME and SLURM_NODEID to TaskEpilog environment. > -- Bring slurm_completion.sh up to date with changes to commands. > -- Fix issue where burst buffer stage-in could only start for one job in a job > array per scheduling cycle instead of bb_array_stage_cnt jobs per scheduling > cycle. > -- Fix checking if the dependency is the same job for array jobs. > -- Fix checking for circular dependencies with job arrays. > -- Restore dependent job pointers on slurmctld startup to avoid race. > -- openapi/v0.0.37 - Allow strings for JobIds instead of only numerical JobIds > for GET, DELETE, and POST job methods. > -- openapi/dbv0.0.36 - Gracefully handle missing associations. 
> -- openapi/dbv0.0.36 - Avoid restricting job association lookups to only > default associations. > -- openapi/dbv0.0.37 - Gracefully handle missing associations. > -- openapi/dbv0.0.37 - Avoid restricting job association lookups to only > default associations. > -- Fix error in GPU frequency validation logic. > -- Fix regression in 21.08.1 that broke federated jobs. > -- Correctly handle requested GRES when used in job arrays. > -- Fix error in pmix logic dealing with the incorrect size of buffer. > -- Fix handling of no_consume GRES, add it to allocated job allocated TRES. > -- Fix issue with typed GRES without Files= (bitmap). > -- Fix job_submit/lua support for 'gres' which is now stored as a 'tres' > when requesting jobs so needs a 'gres' prefix. > -- Fix regression where MPS would not deallocate from the node properly. > -- Fix --gpu-bind=verbose to work correctly. > -- Do not deny --constraint with special operators "[]()|*" when no changeable > features are requested, but continue to deny --constraint with special > operators when changeable features are requested. > -- openapi/v0.0.{35,36,37} - prevent merging the slurmrestd environment > alongside a new job submission. > -- openapi/dbv0.0.36 - Correct tree position of dbv0.0.36_job_step. > -- openapi/dbv0.0.37 - Correct tree position of dbv0.0.37_job_step. > -- openapi/v0.0.37 - enable job priority field for job submissions and updates. > -- openapi/v0.0.37 - request node states query includes MIXED state instead of > only allocated. > -- mpi/pmix - avoid job hanging until the time limit on PMIx agent failures. > -- Correct inverted logic where reduced version matching applied to non-SPANK > plugins where it should have only applied to SPANK plugins. > -- Fix issues where prologs would run in serial without PrologFlags=serial. > -- Make sure a job coming in is initially considered for magnetic reservations. > -- PMIx v1.1.4 and below are no longer supported. 
> -- Add comment to service files about disabling logging through journald. > -- Add SLURM_NODE_ALIASES env to RPC Prolog (PrologFlags=alloc) environment. > -- Limit max_script_size to 512 MB. > -- Fix shutdown of slurmdbd plugin to correctly notice when the agent thread > finishes. > -- slurmdbd - fix issue with larger batch script files being sent to SlurmDBD > with AccountingStoreFlags=job_script that can lead to accounting data loss > as the resulting RPC generated can exceed internal limits and won't be > sent, preventing further communication with SlurmDBD. > This issue is indicated by "error: Invalid msg_size" in your log files. > -- Fix compile issue with --without-shared-libslurm. From tim at schedmd.com Fri Nov 12 22:00:26 2021 From: tim at schedmd.com (Tim Wickberg) Date: Fri, 12 Nov 2021 15:00:26 -0700 Subject: [slurm-announce] Slurm BoF and booth at SC21 Message-ID: The Slurm Birds-of-a-Feather session will be held virtually on Thursday, November at 12:15 - 1:15pm (Central). This is conducted through the SC21 HUBB platform, and you will need to have registered in some capacity through the conference to be able to participate live. We'll be reviewing the Slurm 21.08 release, as well as taking a look at the roadmap for Slurm 22.05 and beyond. The remainder of the time will be reserved for live Q+A, as we've traditionally done. One note: SC21 has told us that they will not be recording any of the BoFs this year, and they will only be available live through their platform. However, SchedMD will be posting a recording of the Slurm BoF on our YouTube channel at a later point to ensure the broader community has access to it. In addition to the BoF, there will be presentations in the Slurm booth - #1807 - over the course of the week. 
The tentative schedule is:

Tuesday:
  11am - Introduction to Slurm
  1pm - REST API
  3pm - Google Cloud
  5pm - Introduction to Slurm

Wednesday:
  11am - Slurm in the Clouds
  1pm - Introduction to Slurm
  3pm - REST API
  5pm - Introduction to Slurm

Thursday:
  11am - Introduction to Slurm
  1pm - Introduction to Slurm

-- Tim Wickberg Chief Technology Officer, SchedMD LLC Commercial Slurm Development and Support From tim at schedmd.com Tue Nov 16 22:06:22 2021 From: tim at schedmd.com (Tim Wickberg) Date: Tue, 16 Nov 2021 15:06:22 -0700 Subject: [slurm-announce] Slurm version 21.08.4 is now available (CVE-2021-43337) Message-ID: <279b31c2-4ec9-5e4d-0622-1b2a5039de23@schedmd.com> Slurm version 21.08.4 is now available, and includes a series of recent bug fixes, as well as a moderate security fix. Note that this security issue is only present in the 21.08 release series. Slurm 20.11 and older releases are unaffected. SchedMD customers were informed of this issue on November 2nd and provided a fix on request; this process is documented in our security policy. [1] CVE-2021-43337: For sites using the new AccountingStoreFlags=job_script and/or job_env options, an issue was reported with the access control rules in SlurmDBD that permitted users to request job scripts and environment files that they should not have access to. (Scripts/environments are meant to be accessible only by user accounts with administrator privileges, by account coordinators for jobs submitted under their account, and by the user themselves.) Downloads are available at https://www.schedmd.com/downloads.php . Release notes follow below. - Tim [1] https://www.schedmd.com/security.php -- Tim Wickberg Chief Technology Officer, SchedMD LLC Commercial Slurm Development and Support > * Changes in Slurm 21.08.4 > ========================== > -- Fix potential deadlock when using PMI v1. > -- Fix tight loop sending DBD_SEND_MULT_JOB_START when the slurmctld has an > issue talking correctly to the DBD. 
> -- Fix memory leak in step creation. > -- Fix potential deadlock when shutting down slurmctld. > -- Fix regression in 21.08 where multi-node steps that requested MemPerCPU > were not counted against the job's memory allocation on some nodes. > -- Fix issue with select/cons_tres and the partition limit MaxCpusPerNode where > the limit was enforced for one less CPU than the configured value. > -- jobacct_gather/common - compare Pss to Rss after scaling Pss to Rss units. > -- Fix SLURM_NODE_ALIASES in RPC Prolog for batch jobs. > -- Fix regression in 21.08 where slurmd and slurmstepd were not constrained > with CpuSpecList or CoreSpecCount. > -- Fix cloud jobs running without powering up nodes after a reconfig/restart. > -- CVE-2021-43337 - Fix security issue with new AccountingStoreFlags=job_script > and job_env options where users could request scripts and environments they > should not have been permitted to access. From tim at schedmd.com Tue Dec 21 21:00:17 2021 From: tim at schedmd.com (Tim Wickberg) Date: Tue, 21 Dec 2021 14:00:17 -0700 Subject: [slurm-announce] Slurm version 21.08.5 is now available Message-ID: We are pleased to announce the availability of Slurm version 21.08.5. This includes a number of moderate severity fixes since the last maintenance release a month ago. And, as it appears to be _en vogue_ to discuss log4j issues, I'll take a moment to state that Slurm is unaffected by the recent log4j disclosures. Slurm is written in C, does not use log4j, and Slurm's logging subsystems are not vulnerable to the class of issues that have led to those exploits. Slurm can be downloaded from https://www.schedmd.com/downloads.php . - Tim -- Tim Wickberg Chief Technology Officer, SchedMD LLC Commercial Slurm Development and Support > * Changes in Slurm 21.08.5 > ========================== > -- Fix issue where typeless GRES node updates were not immediately reflected. 
> -- Fix setting the default scrontab job working directory so that it's the home > of the different user (-u <user>) and not that of root or SlurmUser editor. > -- Fix stepd not respecting SlurmdSyslogDebug. > -- Fix concurrency issue with squeue. > -- Fix job start time not being reset after launch when job is packed onto > already booting node. > -- Fix updating SLURM_NODE_ALIASES for jobs packed onto powering up nodes. > -- Cray - Fix issues with starting hetjobs. > -- auth/jwks - Print fatal() message when jwks is configured but file could > not be opened. > -- If sacctmgr has an association with an unknown qos as the default qos > print 'UNKN-###' instead of leaving a blank name. > -- Correctly determine task count when giving --cpus-per-gpu, --gpus and > --ntasks-per-node without task count. > -- slurmctld - Fix places where the global last_job_update was not being set > to the time of update when a job's reason and description were updated. > -- slurmctld - Fix case where a job submitted with more than one partition > would not have its reason updated while waiting to start. > -- Fix memory leak in node feature rebooting. > -- Fix time limit permanently set to 1 minute by backfill for job array tasks > higher than the first with QOS NoReserve flag and PreemptMode configured. > -- Fix sacct -N to show jobs that started in the current second. > -- Fix issue on running steps where both SLURM_NTASKS_PER_TRES and > SLURM_NTASKS_PER_GPU are set. > -- Handle oversubscription request correctly when also requesting > --ntasks-per-tres. > -- Correctly detect when a step requests bad gres inside an allocation. > -- slurmstepd - Correct possible deadlock when UnkillableStepTimeout triggers. > -- srun - use maximum number of open files while handling job I/O. > -- Fix writing to Xauthority files on root_squash NFS exports, which was > preventing X11 forwarding from completing setup. > -- Fix regression in 21.08.0rc1 that broke --gres=none. 
> -- Fix srun --cpus-per-task and --threads-per-core not implicitly setting > --exact. It was meant to work this way in 21.08. > -- Fix regression in 21.08.0 that broke dynamic future nodes. > -- Fix dynamic future nodes remembering active state on restart. > -- Fix powered down nodes getting stuck in COMPLETING+POWERED_DOWN when job is > cancelled before nodes are powering up.
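The --exact behavior referenced in the 21.08.5 changelog above can be sketched as a batch script. This is a minimal, hedged illustration only: the script names (./step_a, ./step_b) and resource sizes are hypothetical, and actual step placement depends on site configuration.

```sh
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2

# In 21.08, passing --cpus-per-task to srun is intended to imply --exact,
# confining each step to only the CPUs it requested rather than the whole
# allocation. That allows the two steps below to run concurrently within
# the 8-CPU job allocation instead of serializing:
srun --ntasks=2 --cpus-per-task=2 ./step_a &   # hypothetical binary
srun --ntasks=2 --cpus-per-task=2 ./step_b &   # hypothetical binary
wait
```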