[slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

Levi Morrison levi_morrison at byu.edu
Thu Jun 6 17:21:14 UTC 2019


Slurm 19.05 removed support for `--cpu_bind`, the option that all released
versions of OpenMPI pass when they call `srun`. The issue was fixed 24 days
ago in [OpenMPI's git repo][1], but that fix is not in any release yet.

This means all OpenMPI programs that end up calling `srun` on Slurm 
19.05 will fail.
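
As a stopgap, the failing option could be rewritten before it reaches the real `srun`. The sketch below is only an illustration, not anything OpenMPI or Slurm ships: it assumes the hyphenated spelling `--cpu-bind` is the accepted replacement, and the function name `rewrite_srun_args` is made up for this example.

```python
# Hypothetical helper: rewrite the option spelling that Slurm 19.05 removed.
# Assumption: "--cpu-bind" is the spelling that 19.05 still accepts.
OLD_FLAG = "--cpu_bind"
NEW_FLAG = "--cpu-bind"


def rewrite_srun_args(args):
    """Return a copy of args with the old flag spelling replaced.

    Handles both "--cpu_bind=VALUE" and a bare "--cpu_bind" followed by
    its value as a separate argument. Everything else passes through.
    """
    rewritten = []
    for arg in args:
        if arg == OLD_FLAG:
            rewritten.append(NEW_FLAG)
        elif arg.startswith(OLD_FLAG + "="):
            # Keep the "=VALUE" part, swap only the option name.
            rewritten.append(NEW_FLAG + arg[len(OLD_FLAG):])
        else:
            rewritten.append(arg)
    return rewritten
```

A wrapper script placed ahead of the real `srun` in `PATH` could apply this to its arguments and then exec the real binary. Note that this sketch rewrites every matching argument, including any that belong to the launched program rather than to `srun` itself, so treat it as a temporary measure only.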

This enormous amount of breakage for such a minor "gain" seems unwise. I 
think this [change][2] should be backed out and converted to a warning 
message to allow time for the OpenMPI changes to be backported, 
released, and adopted. In theory they have had since the 17.11 release
(I think?) to adapt, but the fix has only just landed.

Levi Morrison
Brigham Young University

   [1]: https://github.com/open-mpi/ompi/commit/7dad74032e30259506da7fa582dd8c4351e6e0a1
   [2]: https://github.com/SchedMD/slurm/commit/d78af893e4a60e933a2319b0c36a0e40c7dd1b02


