[slurm-users] Upcoming Slurm 20.11.3 release will revert to older step launch semantics

Tim Wickberg tim at schedmd.com
Fri Jan 8 19:12:52 UTC 2021


Hey folks -

As some of you have observed, one of the changes made in the Slurm 20.11
release was to the semantics for job steps launched through the 'srun'
command. This change also inadvertently affects many MPI implementations
that use srun underneath their own mpiexec/mpirun commands.

For the 20.11.{0,1,2} releases, the default behavior for srun was changed
to limit each step to exactly the resources requested by the options
given to srun. This change was equivalent to Slurm setting the
--exclusive option by default on all job steps. Job steps needing all
resources on the node had to request them explicitly through the new
'--whole' option.

In the upcoming 20.11.3 release, we will be reverting to the 20.02 and 
older behavior of assigning all resources on a node to the job step by 
default.
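To make the difference between the two defaults concrete, here is a toy
Python model of how many of the job's CPUs on one node a step receives.
It is a sketch under stated assumptions, not Slurm's allocator: the
`StepRequest` class and `step_cpus` function are hypothetical, the srun
flags are reduced to booleans, and it assumes (as in 20.02) that
--exclusive still limits a step to its request once the whole-node
default returns.

```python
from dataclasses import dataclass

# Releases whose default matched srun --exclusive, per this announcement.
EXCLUSIVE_DEFAULT_RELEASES = {"20.11.0", "20.11.1", "20.11.2"}


@dataclass
class StepRequest:
    """Hypothetical stand-in for the options of one srun invocation."""
    cpus: int                # CPUs requested through srun options
    exclusive: bool = False  # models passing --exclusive
    whole: bool = False      # models passing --whole (new in 20.11)


def step_cpus(req: StepRequest, node_cpus: int, release: str) -> int:
    """Return the CPUs a step gets, where node_cpus is what the job holds on the node."""
    if release in EXCLUSIVE_DEFAULT_RELEASES:
        # 20.11.0-2: only what was requested, unless --whole asks for everything.
        return node_cpus if req.whole else req.cpus
    # 20.02 and older, and 20.11.3: all of the job's CPUs on the node by default.
    # Assumption: --exclusive still narrows the step to what it requested.
    return req.cpus if req.exclusive else node_cpus


if __name__ == "__main__":
    req = StepRequest(cpus=4)
    for rel in ("20.02", "20.11.2", "20.11.3"):
        print(rel, step_cpus(req, node_cpus=32, release=rel))
```

With a job holding 32 CPUs on the node, this prints 32 for 20.02, 4 for
20.11.2, and 32 again for 20.11.3, which is the reversion described above.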

This is a major behavioral change, and not one we're making lightly. We
are making it to restore compatibility with the large number of Open MPI
(and other MPI) releases and job scripts already in production, and to
remove what has proven to be a significant hurdle in moving to the new
release.

Please note that one change to step launch remains: by default, 20.11
no longer permits steps to overlap on the resources they have been
assigned. If that behavior is desired, all steps must explicitly opt in
through the newly added '--overlap' option.
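The overlap rule can be sketched the same way. In this hypothetical
Python model (the `RunningStep` class and the `free_cpus` and `can_start`
functions are illustrations, not Slurm code), a new step may use CPUs
held by a running step only when both steps opted in with --overlap;
CPUs held by a step that did not opt in stay off-limits.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass
class RunningStep:
    """Hypothetical record of a step already running on the node."""
    cpu_ids: FrozenSet[int]  # CPUs assigned to the step
    overlap: bool            # whether it was launched with --overlap


def free_cpus(node_cpu_ids: Iterable[int], running: List[RunningStep],
              new_overlap: bool) -> Set[int]:
    """CPUs a new step could use under the toy 'all steps must opt in' rule."""
    usable = set(node_cpu_ids)
    for step in running:
        # Sharing is allowed only if BOTH the new and the running step opted in.
        if not (new_overlap and step.overlap):
            usable -= step.cpu_ids
    return usable


def can_start(new_cpus: int, new_overlap: bool,
              running: List[RunningStep], node_cpu_ids: Iterable[int]) -> bool:
    """Toy check: can a new step get new_cpus CPUs on this node right now?"""
    return len(free_cpus(node_cpu_ids, running, new_overlap)) >= new_cpus
```

In this model, if a running step holds every CPU on the node (as the
whole-node default would give it), a second concurrent step can start
only when both steps were launched with --overlap; opting in on the new
step alone is not enough.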

Further details and a full explanation of the issue can be found at:
https://bugs.schedmd.com/show_bug.cgi?id=10383#c63

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
