[slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05
Levi Morrison
levi_morrison at byu.edu
Thu Jun 6 17:21:14 UTC 2019
Slurm 19.05 removed support for `--cpu_bind`, which is the option all
released versions of OpenMPI pass when they call into `srun`. The fix
landed in [OpenMPI's git repo][1] only 24 days ago.
This means all OpenMPI programs that end up calling `srun` on Slurm
19.05 will fail.
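
As a stopgap until a fixed OpenMPI is released, a site could in principle put a small shim in front of `srun` that rewrites the removed spelling. The sketch below is only an illustration, not something either project ships: it assumes the hyphenated `--cpu-bind` spelling is what 19.05 accepts in place of `--cpu_bind`, and the `REAL_SRUN` variable, the `/usr/bin/srun` default, and the function names are all hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical srun shim: rewrite --cpu_bind to --cpu-bind (a sketch)."""
import os
import sys

# Assumption: location of the real srun binary; adjust for the local install.
REAL_SRUN = os.environ.get("REAL_SRUN", "/usr/bin/srun")

OLD = "--cpu_bind"   # spelling removed in Slurm 19.05
NEW = "--cpu-bind"   # assumed replacement spelling


def rewrite_args(args):
    """Return a copy of args with the old option spelling replaced."""
    out = []
    for arg in args:
        if arg == OLD:
            # Form: --cpu_bind <value>
            out.append(NEW)
        elif arg.startswith(OLD + "="):
            # Form: --cpu_bind=<value>
            out.append(NEW + arg[len(OLD):])
        else:
            out.append(arg)
    return out


def main(argv):
    """Replace this process with the real srun, using rewritten arguments."""
    os.execv(REAL_SRUN, [REAL_SRUN] + rewrite_args(argv[1:]))


# Demonstration only; a real shim would call main(sys.argv) here instead.
print(rewrite_args(["--cpu_bind=none", "-n", "4", "orted"]))
```

Note that this rewrites every matching argument, including any that belong to the launched program rather than to `srun`, so it is a blunt instrument at best.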
This enormous amount of breakage for such a minor "gain" seems unwise. I
think this [change][2] should be backed out and converted to a warning
message to allow time for the OpenMPI changes to be backported,
released, and adopted. Theoretically OpenMPI was given time starting with
the 17.11 release (I think?), but since the fix has only just landed, no
released version carries it yet.
Levi Morrison
Brigham Young University
[1]: https://github.com/open-mpi/ompi/commit/7dad74032e30259506da7fa582dd8c4351e6e0a1
[2]: https://github.com/SchedMD/slurm/commit/d78af893e4a60e933a2319b0c36a0e40c7dd1b02