[slurm-users] Multi-node job failure
Chris Samuel
chris at csamuel.org
Fri Dec 13 06:15:10 UTC 2019
On 11/12/19 8:05 am, Chris Woelkers - NOAA Federal wrote:
> Partial progress. The scientist who developed the model took a look at
> the output and found that instead of one model run being run in parallel
> srun had run multiple instances of the model, one per thread, which for
> this test was 110 threads.
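This sounds like MVAPICH wasn't built with Slurm support, so each task
that srun starts runs as its own independent copy of the model instead
of joining a single MPI job. According to the Slurm MPI guide, you need
to configure it with the following to enable Slurm support (and of
course add any other options you were already using):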
./configure --with-pmi=pmi2 --with-pm=slurm
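If you want to check an existing build before rebuilding, something
along these lines might help. It is only a rough sketch: has_slurm_pmi
is a hypothetical helper name of my own, and it assumes that your
MVAPICH install provides the mpiname utility and that "mpiname -a"
prints the configure options the library was built with.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given string of configure
# options contains both flags from the Slurm MPI guide.
has_slurm_pmi() {
    case "$1" in
        *--with-pm=slurm*) ;;
        *) return 1 ;;
    esac
    case "$1" in
        *--with-pmi=pmi2*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: mpiname is on PATH and "mpiname -a" lists configure options.
# The check is skipped quietly if mpiname is not available.
if command -v mpiname >/dev/null 2>&1; then
    if has_slurm_pmi "$(mpiname -a 2>/dev/null)"; then
        echo "MVAPICH reports the Slurm PMI2 configure options"
    else
        echo "Slurm PMI2 configure options not found; a rebuild may be needed"
    fi
fi
```

After a rebuild you would probably also want to launch with
"srun --mpi=pmi2" (or set MpiDefault=pmi2 in slurm.conf) so that srun
and the library use the same PMI interface; "srun --mpi=list" shows
which MPI plugin types your Slurm install offers.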
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA