[slurm-users] Upcoming Slurm 20.11.3 release will revert to older step launch semantics
Tim Wickberg
tim at schedmd.com
Fri Jan 8 19:12:52 UTC 2021
Hey folks -
As some of you have observed, one of the changes made in the Slurm 20.11
release was to the semantics for job steps launched through the 'srun'
command. This change also inadvertently affected many MPI implementations
that use srun underneath their own mpiexec/mpirun commands.
In the 20.11.{0,1,2} releases, the default behavior of srun was changed to
limit each step to exactly the resources requested through the options
given to srun. This was equivalent to Slurm setting the --exclusive
option by default on all job steps. Job steps that needed all resources
on the node had to request them explicitly through the new '--whole'
option.
In the upcoming 20.11.3 release, we will be reverting to the 20.02 and
older behavior of assigning all resources on a node to the job step by
default.
This is a major behavioral change, and not one we're making lightly, but
we are making it to restore compatibility with the large number of Open
MPI (and other MPI flavor) installations and job scripts in production,
and to remove what has proven to be a significant hurdle in moving to
the new release.
Please note that one change to step launch remains: by default in 20.11,
steps are no longer permitted to overlap on the resources they have been
assigned. If overlapping is desired, all steps must explicitly opt in
through the newly added '--overlap' option.
Further details and a full explanation of the issue can be found at:
https://bugs.schedmd.com/show_bug.cgi?id=10383#c63
- Tim
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support