<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>Well, does it also crash when you run it with two nodes in a normal way (not using heterogeneous jobs) ?</div><div><br></div><div><span class="gmail-im"><div>#!/bin/bash<br>#SBATCH --job-name=myQE_2Nx2MPI<br>#SBATCH --output=big-mem</div><div><span class="gmail-im">#SBATCH --nodes=2<br></span></div><div><span class="gmail-im">#SBATCH --ntasks-per-node=2<br></span></div><div>#SBATCH --mem-per-cpu=16g</div><div>#SBATCH --partition=QUARTZ<br>#SBATCH --account=z5<br></div></span><span class="gmail-im"></span><div>#<br></div><div>srun pw.x -i <a href="http://mos2.rlx.in" target="_blank">mos2.rlx.in</a></div><div><br></div></div><br></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Le jeu. 28 mars 2019 à 16:57, Mahmood Naderan <<a href="mailto:mahmood.nt@gmail.com">mahmood.nt@gmail.com</a>> a écrit :<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif">BTW, when I manually run on a node, e.g. compute-0-2, I get this output</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">]$ mpirun -np 4 pw.x -i <a href="http://mos2.rlx.in" target="_blank">mos2.rlx.in</a><br><br> Program PWSCF v.6.2 starts on 28Mar2019 at 11:40:36<br><br> This program is part of the open-source Quantum ESPRESSO suite<br> for quantum simulation of materials; please cite<br> "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);<br> "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);<br> URL <a href="http://www.quantum-espresso.org" target="_blank">http://www.quantum-espresso.org</a>",<br> in publications or presentations arising from this work. 
On Thu, Mar 28, 2019 at 16:57, Mahmood Naderan <mahmood.nt@gmail.com> wrote:

BTW, when I run it manually on a node, e.g. compute-0-2, I get this output:

]$ mpirun -np 4 pw.x -i mos2.rlx.in

 Program PWSCF v.6.2 starts on 28Mar2019 at 11:40:36

 This program is part of the open-source Quantum ESPRESSO suite
 for quantum simulation of materials; please cite
 "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
 "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
 URL http://www.quantum-espresso.org",
 in publications or presentations arising from this work. More details at
 http://www.quantum-espresso.org/quote

 Parallel version (MPI), running on 4 processors

 MPI processes distributed on 1 nodes
 R & G space division: proc/nbgrp/npool/nimage = 4
 Reading input from mos2.rlx.in
Warning: card &CELL ignored
Warning: card CELL_DYNAMICS = "BFGS" ignored
Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
Warning: card / ignored

 Current dimensions of program PWSCF are:
 Max number of different atomic species (ntypx) = 10
 Max number of k-points (npk) = 40000
 Max angular momentum in pseudopotentials (lmaxx) = 3
 file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s) 4S renormalized

 Subspace diagonalization in iterative solution of the eigenvalue problem:
 a serial algorithm will be used

 Found symmetry operation: I + ( 0.0000 0.1667 0.0000)
...
...
...

Regards,
Mahmood

On Thu, Mar 28, 2019 at 8:23 PM Mahmood Naderan <mahmood.nt@gmail.com> wrote:

The run is not consistent. I have manually tested "mpirun -np 4 pw.x -i mos2.rlx.in" on the compute-0-2 and rocks7 nodes and it works fine.
However, with the script line "srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in" I see some errors in the output file, and the job aborts after about 60 seconds.

The errors are about some files not being found.
Although the input file uses absolute paths for the intermediate files and those files do exist, so the errors look bizarre.

compute-0-2
  PID USER    PR NI    VIRT    RES  SHR S  %CPU %MEM   TIME+ COMMAND
 3387 ghatee  20  0 1930488 129684 8336 R 100.0  0.2 0:09.71 pw.x
 3388 ghatee  20  0 1930476 129700 8336 R  99.7  0.2 0:09.68 pw.x

rocks7
  PID USER    PR NI    VIRT    RES  SHR S  %CPU %MEM   TIME+ COMMAND
 5592 ghatee  20  0 1930568 127764 8336 R 100.0  0.2 0:17.29 pw.x
  549 ghatee  20  0  116844   3652 1804 S   0.0  0.0 0:00.14 bash

As you can see, the two tasks on compute-0-2 are fine, but there should be four tasks on rocks7.
The input file contains:
 outdir = "/home/ghatee/job/2h-unitcell" ,
 pseudo_dir = "/home/ghatee/q-e-qe-5.4/pseudo/" ,
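(A quick sanity check, only a sketch built from the paths above: list outdir and pseudo_dir from each pack group of the same allocation, to confirm that every allocated node actually sees those directories.)

# check that both pack groups see the directories named in the input file
srun --pack-group=0 ls -ld /home/ghatee/job/2h-unitcell /home/ghatee/q-e-qe-5.4/pseudo : \
     --pack-group=1 ls -ld /home/ghatee/job/2h-unitcell /home/ghatee/q-e-qe-5.4/pseudo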
The output file says:

 Program PWSCF v.6.2 starts on 28Mar2019 at 11:43:58

 This program is part of the open-source Quantum ESPRESSO suite
 for quantum simulation of materials; please cite
 "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
 "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
 URL http://www.quantum-espresso.org",
 in publications or presentations arising from this work. More details at
 http://www.quantum-espresso.org/quote

 Parallel version (MPI), running on 1 processors

 MPI processes distributed on 1 nodes
 Reading input from mos2.rlx.in
Warning: card &CELL ignored
Warning: card CELL_DYNAMICS = "BFGS" ignored
Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
Warning: card / ignored

 Program PWSCF v.6.2 starts on 28Mar2019 at 11:43:58

 This program is part of the open-source Quantum ESPRESSO suite
 for quantum simulation of materials; please cite
 "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
 "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
 URL http://www.quantum-espresso.org",
 in publications or presentations arising from this work. More details at
 http://www.quantum-espresso.org/quote

 Parallel version (MPI), running on 1 processors

 MPI processes distributed on 1 nodes
 Reading input from mos2.rlx.in
Warning: card &CELL ignored
Warning: card CELL_DYNAMICS = "BFGS" ignored
Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
Warning: card / ignored

 Current dimensions of program PWSCF are:
 Max number of different atomic species (ntypx) = 10
 Max number of k-points (npk) = 40000
 Max angular momentum in pseudopotentials (lmaxx) = 3

 Current dimensions of program PWSCF are:
 Max number of different atomic species (ntypx) = 10
 Max number of k-points (npk) = 40000
 Max angular momentum in pseudopotentials (lmaxx) = 3

 Program PWSCF v.6.2 starts on 28Mar2019 at 20:13:58

 This program is part of the open-source Quantum ESPRESSO suite
 for quantum simulation of materials; please cite
 "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
 "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
 URL http://www.quantum-espresso.org",
 in publications or presentations arising from this work. More details at
 http://www.quantum-espresso.org/quote

 Parallel version (MPI), running on 1 processors

 MPI processes distributed on 1 nodes
 Reading input from mos2.rlx.in
Warning: card &CELL ignored
Warning: card CELL_DYNAMICS = "BFGS" ignored
Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
Warning: card / ignored

 Current dimensions of program PWSCF are:
 Max number of different atomic species (ntypx) = 10
 Max number of k-points (npk) = 40000
 Max angular momentum in pseudopotentials (lmaxx) = 3

 Program PWSCF v.6.2 starts on 28Mar2019 at 20:13:58

 This program is part of the open-source Quantum ESPRESSO suite
 for quantum simulation of materials; please cite
 "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
 "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
 URL http://www.quantum-espresso.org",
 in publications or presentations arising from this work. More details at
 http://www.quantum-espresso.org/quote

 Parallel version (MPI), running on 1 processors

 MPI processes distributed on 1 nodes
 Reading input from mos2.rlx.in
Warning: card &CELL ignored
Warning: card CELL_DYNAMICS = "BFGS" ignored
Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
Warning: card / ignored

 Current dimensions of program PWSCF are:
 Max number of different atomic species (ntypx) = 10
 Max number of k-points (npk) = 40000
 Max angular momentum in pseudopotentials (lmaxx) = 3

 Program PWSCF v.6.2 starts on 28Mar2019 at 20:13:58

 This program is part of the open-source Quantum ESPRESSO suite
 for quantum simulation of materials; please cite
 "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
 "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
 URL http://www.quantum-espresso.org",
 in publications or presentations arising from this work. More details at
 http://www.quantum-espresso.org/quote

 Program PWSCF v.6.2 starts on 28Mar2019 at 20:13:58

 This program is part of the open-source Quantum ESPRESSO suite
 for quantum simulation of materials; please cite
 "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
 "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
 URL http://www.quantum-espresso.org",
 in publications or presentations arising from this work. More details at
 http://www.quantum-espresso.org/quote

 Parallel version (MPI), running on 1 processors

 MPI processes distributed on 1 nodes

 Parallel version (MPI), running on 1 processors

 MPI processes distributed on 1 nodes
 Reading input from mos2.rlx.in
 Reading input from mos2.rlx.in
Warning: card &CELL ignored
Warning: card CELL_DYNAMICS = "BFGS" ignored
Warning: card &CELL ignored
Warning: card CELL_DYNAMICS = "BFGS" ignored
Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
Warning: card / ignored
Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
Warning: card / ignored

 Current dimensions of program PWSCF are:
 Max number of different atomic species (ntypx) = 10
 Max number of k-points (npk) = 40000
 Max angular momentum in pseudopotentials (lmaxx) = 3

 Current dimensions of program PWSCF are:
 Max number of different atomic species (ntypx) = 10
 Max number of k-points (npk) = 40000
 Max angular momentum in pseudopotentials (lmaxx) = 3
 file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s) 4S renormalized
 file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s) 4S renormalized
 file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s) 4S renormalized
 file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s) 4S renormalized
 file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s) 4S renormalized
 file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s) 4S renormalized
ERROR(FoX)
Cannot open file
ERROR(FoX)
Cannot open file

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Error in routine read_ncpp (2):
 pseudo file is empty or wrong
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 stopping ...
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
...
...
...
Counting the output, there are six "Parallel version (MPI), running on 1 processors" lines, so the number of started processes matches what I specified in the Slurm script. However, I suspect that this is NOT one multi-core MPI job: it looks like six instances of a serial run, and there may be races between them during the run.
Any thoughts?
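A check that might narrow this down (only a sketch; it assumes Slurm was built with PMI-2 support and that the MPI library behind pw.x can use it): when srun and the MPI library do not share a process-management interface, every launched task initializes its own MPI_COMM_WORLD of size 1, which is exactly what the repeated "running on 1 processors" banners look like.

# list the MPI/PMI plugin types this srun installation supports
srun --mpi=list

# if pmi2 is listed (and pw.x's MPI was built against it), request it explicitly
srun --mpi=pmi2 --pack-group=0 --ntasks=2 pw.x -i mos2.rlx.in : \
     --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in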
Regards,
Mahmood

On Thu, Mar 28, 2019 at 3:59 PM Frava <fravadona@gmail.com> wrote:

I didn't receive the last mail from Mahmood, but Marcus is right: Mahmood's heterogeneous job submission seems to be working now.
Well, separating each pack in the srun command and asking for the correct number of tasks to be launched for each pack is how I figured heterogeneous jobs worked with SLURM v18.08.0 (I didn't test it with more recent SLURM versions).