I think it's related to the job step launch semantic change introduced in 20.11.0, which has been reverted as of 20.11.3; see https://www.schedmd.com/news.php for details.

Cheers,
Angelos
(Sent from mobile, please pardon me for typos and cursoriness.)

On 26/2/2021, at 9:07, Volker Blum <volker.blum@duke.edu> wrote:

Hi,

I am testing Slurm 20.11.2 on a local cluster together with Intel MPI 2018.4.274.

1) On a single node (20 physical cores), executed manually (no Slurm), a particular application runs fine using Intel's mpirun; execution time for this example: 8.505 s (wall clock).

(This is a straight MPI application, no complications.)

2) Using Slurm and Intel's mpirun through a queue / batch script,

#SBATCH --ntasks-per-node=20
...
mpirun -n 20 $bin > file.out

the same job runs correctly but takes 121.735 s (wall clock!).

3) After some considerable searching, a partial fix,

#SBATCH --ntasks-per-node=20
...
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0
srun --cpu-bind=cores -n 20 $bin > file.out

brings the execution time down to 13.482 s.

4) After changing the script to

#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=2
...
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0
srun --cpu-bind=cores -n 20 $bin > file.out

the time is finally 8.480 s.

This timing is as it should be, but at the price of pretending that the application is multithreaded when it is, in fact, not multithreaded.
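For reference, the complete batch script for case 4 looks roughly like the sketch below; the value of $bin is a placeholder for the actual application, and the --nodes=1 line is only implied by the single-node test, not taken from the snippets above.

#!/bin/bash
# Single node with 20 physical cores; --nodes=1 reflects the single-node test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=2

# Point Intel MPI's PMI client at Slurm's PMI library so srun can launch the ranks
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0

# Placeholder for the actual MPI application binary
bin=/path/to/application

# Bind ranks to cores; 20 MPI ranks, as in the manual mpirun case
srun --cpu-bind=cores -n 20 $bin > file.out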

***

Is it possible to just keep Intel MPI defaults intact when using its mpirun in a Slurm batch script?

Best wishes
Volker

Volker Blum
Associate Professor
Ab Initio Materials Simulations
Thomas Lord Department of Mechanical Engineering and Materials Science
Duke University
https://aims.pratt.duke.edu

volker.blum@duke.edu
Twitter: Aimsduke

Office: 4308 Chesterfield Building