[slurm-users] Multinode MPI job
Mahmood Naderan
mahmood.nt at gmail.com
Thu Mar 28 15:55:19 UTC 2019
BTW, when I manually run on a node, e.g. compute-0-2, I get this output:
$ mpirun -np 4 pw.x -i mos2.rlx.in
Program PWSCF v.6.2 starts on 28Mar2019 at 11:40:36
This program is part of the open-source Quantum ESPRESSO suite
for quantum simulation of materials; please cite
"P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
"P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
URL http://www.quantum-espresso.org",
in publications or presentations arising from this work. More details
at
http://www.quantum-espresso.org/quote
Parallel version (MPI), running on 4 processors
MPI processes distributed on 1 nodes
R & G space division: proc/nbgrp/npool/nimage = 4
Reading input from mos2.rlx.in
Warning: card &CELL ignored
Warning: card CELL_DYNAMICS = "BFGS" ignored
Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
Warning: card / ignored
Current dimensions of program PWSCF are:
Max number of different atomic species (ntypx) = 10
Max number of k-points (npk) = 40000
Max angular momentum in pseudopotentials (lmaxx) = 3
file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s) 4S
renormalized
Subspace diagonalization in iterative solution of the eigenvalue
problem:
a serial algorithm will be used
Found symmetry operation: I + ( 0.0000 0.1667 0.0000)
...
...
...
Regards,
Mahmood
On Thu, Mar 28, 2019 at 8:23 PM Mahmood Naderan <mahmood.nt at gmail.com>
wrote:
> The run is not consistent. I have manually tested "mpirun -np 4 pw.x -i
> mos2.rlx.in" on the compute-0-2 and rocks7 nodes and it works fine.
> However, with the script line "srun --pack-group=0 --ntasks=2 :
> --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in" I see some errors in the
> output file, and the job aborts after about 60 seconds.
>
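> For reference, the batch script has roughly this shape (a sketch: the
> #SBATCH resource options are placeholders, only the srun line is copied
> verbatim from my script):
>
>   #!/bin/bash
>   #SBATCH --job-name=qe-hetero
>   #SBATCH --nodelist=compute-0-2
>   #SBATCH --ntasks=2
>   #SBATCH packjob
>   #SBATCH --nodelist=rocks7
>   #SBATCH --ntasks=4
>
>   # one heterogeneous job step that should span both pack groups
>   srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
>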
> The errors are about some files not being found. This seems bizarre,
> because the input file uses absolute paths for the intermediate files and
> those files do exist.
>
>
> compute-0-2
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3387 ghatee 20 0 1930488 129684 8336 R 100.0 0.2 0:09.71 pw.x
> 3388 ghatee 20 0 1930476 129700 8336 R 99.7 0.2 0:09.68 pw.x
>
>
>
> rocks7
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 5592 ghatee 20 0 1930568 127764 8336 R 100.0 0.2 0:17.29 pw.x
> 549 ghatee 20 0 116844 3652 1804 S 0.0 0.0 0:00.14 bash
>
>
>
> As you can see, the 2 tasks on compute-0-2 are running fine, but there
> should be 4 tasks on rocks7 and only one pw.x process is there.
> The input file contains
> outdir = "/home/ghatee/job/2h-unitcell" ,
> pseudo_dir = "/home/ghatee/q-e-qe-5.4/pseudo/" ,
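>
> To rule out a path problem, I can list the pseudo directory from each node
> in the allocation, e.g. (a sketch using the two node names above; on this
> cluster the nodes are reachable over ssh):
>
>   for host in compute-0-2 rocks7; do
>       ssh "$host" ls -l /home/ghatee/q-e-qe-5.4/pseudo/ | head -5
>   done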
>
>
> The output file says
>
> Program PWSCF v.6.2 starts on 28Mar2019 at 11:43:58
>
> This program is part of the open-source Quantum ESPRESSO suite
> for quantum simulation of materials; please cite
> "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
> "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
> URL http://www.quantum-espresso.org",
> in publications or presentations arising from this work. More details
> at
> http://www.quantum-espresso.org/quote
>
> Parallel version (MPI), running on 1 processors
>
> MPI processes distributed on 1 nodes
> Reading input from mos2.rlx.in
> Warning: card &CELL ignored
> Warning: card CELL_DYNAMICS = "BFGS" ignored
> Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
> Warning: card / ignored
>
> Program PWSCF v.6.2 starts on 28Mar2019 at 11:43:58
>
> This program is part of the open-source Quantum ESPRESSO suite
> for quantum simulation of materials; please cite
> "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
> "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
> URL http://www.quantum-espresso.org",
> in publications or presentations arising from this work. More details
> at
> http://www.quantum-espresso.org/quote
>
> Parallel version (MPI), running on 1 processors
>
> MPI processes distributed on 1 nodes
> Reading input from mos2.rlx.in
> Warning: card &CELL ignored
> Warning: card CELL_DYNAMICS = "BFGS" ignored
> Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
> Warning: card / ignored
>
> Current dimensions of program PWSCF are:
> Max number of different atomic species (ntypx) = 10
> Max number of k-points (npk) = 40000
> Max angular momentum in pseudopotentials (lmaxx) = 3
>
> Current dimensions of program PWSCF are:
> Max number of different atomic species (ntypx) = 10
> Max number of k-points (npk) = 40000
> Max angular momentum in pseudopotentials (lmaxx) = 3
>
> Program PWSCF v.6.2 starts on 28Mar2019 at 20:13:58
>
> This program is part of the open-source Quantum ESPRESSO suite
> for quantum simulation of materials; please cite
> "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
> "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
> URL http://www.quantum-espresso.org",
> in publications or presentations arising from this work. More details
> at
> http://www.quantum-espresso.org/quote
>
> Parallel version (MPI), running on 1 processors
>
> MPI processes distributed on 1 nodes
> Reading input from mos2.rlx.in
> Warning: card &CELL ignored
> Warning: card CELL_DYNAMICS = "BFGS" ignored
> Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
> Warning: card / ignored
>
> Current dimensions of program PWSCF are:
> Max number of different atomic species (ntypx) = 10
> Max number of k-points (npk) = 40000
> Max angular momentum in pseudopotentials (lmaxx) = 3
>
> Program PWSCF v.6.2 starts on 28Mar2019 at 20:13:58
>
> This program is part of the open-source Quantum ESPRESSO suite
> for quantum simulation of materials; please cite
> "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
> "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
> URL http://www.quantum-espresso.org",
> in publications or presentations arising from this work. More details
> at
> http://www.quantum-espresso.org/quote
>
> Parallel version (MPI), running on 1 processors
>
> MPI processes distributed on 1 nodes
> Reading input from mos2.rlx.in
> Warning: card &CELL ignored
> Warning: card CELL_DYNAMICS = "BFGS" ignored
> Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
> Warning: card / ignored
>
> Current dimensions of program PWSCF are:
> Max number of different atomic species (ntypx) = 10
> Max number of k-points (npk) = 40000
> Max angular momentum in pseudopotentials (lmaxx) = 3
>
> Program PWSCF v.6.2 starts on 28Mar2019 at 20:13:58
>
> This program is part of the open-source Quantum ESPRESSO suite
> for quantum simulation of materials; please cite
> "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
> "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
> URL http://www.quantum-espresso.org",
> in publications or presentations arising from this work. More details
> at
> http://www.quantum-espresso.org/quote
>
> Program PWSCF v.6.2 starts on 28Mar2019 at 20:13:58
>
> This program is part of the open-source Quantum ESPRESSO suite
> for quantum simulation of materials; please cite
> "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
> "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
> URL http://www.quantum-espresso.org",
> in publications or presentations arising from this work. More details
> at
> http://www.quantum-espresso.org/quote
>
> Parallel version (MPI), running on 1 processors
>
> MPI processes distributed on 1 nodes
>
> Parallel version (MPI), running on 1 processors
>
> MPI processes distributed on 1 nodes
> Reading input from mos2.rlx.in
> Reading input from mos2.rlx.in
> Warning: card &CELL ignored
> Warning: card CELL_DYNAMICS = "BFGS" ignored
> Warning: card &CELL ignored
> Warning: card CELL_DYNAMICS = "BFGS" ignored
> Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
> Warning: card / ignored
> Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
> Warning: card / ignored
>
> Current dimensions of program PWSCF are:
> Max number of different atomic species (ntypx) = 10
> Max number of k-points (npk) = 40000
> Max angular momentum in pseudopotentials (lmaxx) = 3
>
> Current dimensions of program PWSCF are:
> Max number of different atomic species (ntypx) = 10
> Max number of k-points (npk) = 40000
> Max angular momentum in pseudopotentials (lmaxx) = 3
> file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s)
> 4S renormalized
> file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s)
> 4S renormalized
> file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s)
> 4S renormalized
> file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s)
> 4S renormalized
> file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s)
> 4S renormalized
> file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s)
> 4S renormalized
> ERROR(FoX)
> Cannot open file
> ERROR(FoX)
> Cannot open file
>
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Error in routine read_ncpp (2):
> pseudo file is empty or wrong
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> stopping ...
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
> ...
> ...
> ...
>
>
>
>
>
> Counting the lines, there are six "Parallel version (MPI), running on 1
> processors" messages, so the six tasks do start as I specified in the
> Slurm script. However, I suspect this is NOT one multi-rank MPI job: it
> looks like six independent serial instances of pw.x, which may race with
> each other during the run.
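>
> A rough way to check this (a sketch; <output-file> is whatever file the
> job writes to) would be to count the start-up banners and to see which PMI
> types this srun offers, since the MPI library pw.x was built with has to
> match one of them:
>
>   grep -c "Parallel version (MPI)" <output-file>
>   srun --mpi=list
>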
> Any thoughts?
>
> Regards,
> Mahmood
>
>
>
>
> On Thu, Mar 28, 2019 at 3:59 PM Frava <fravadona at gmail.com> wrote:
>
>> I didn't receive the last mail from Mahmood but Marcus is right,
>> Mahmood's heterogeneous job submission seems to be working now.
>> Well, separating each pack group in the srun command and requesting the
>> correct number of tasks for each pack is how I found heterogeneous jobs
>> to work with Slurm 18.08.0 (I have not tested it with more recent Slurm
>> versions).
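>>
>> That is, something along these lines (just a sketch of the form, with the
>> task counts taken from Mahmood's script):
>>
>>   srun --pack-group=0 --ntasks=2 pw.x -i mos2.rlx.in : \
>>        --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in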
>>
>>