[slurm-users] Multinode MPI job

Frava fravadona at gmail.com
Thu Mar 28 10:52:40 UTC 2019


I didn't receive Mahmood's last mail, but Marcus is right: Mahmood's
heterogeneous job submission seems to be working now.
Separating the pack groups in the srun command and requesting the correct
number of tasks for each group is how I understood heterogeneous jobs to
work with SLURM v18.08.0 (I haven't tested more recent SLURM versions).
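For reference, a heterogeneous submission along these lines would look roughly like the sketch below (Slurm 18.08 syntax; the partition name and task counts are taken from the thread, but the script itself is an illustration, not Mahmood's actual script):

```shell
#!/bin/bash
# Hypothetical batch script illustrating the pack-group layout discussed
# in this thread. In Slurm 18.08 the "#SBATCH packjob" line separates the
# components of a heterogeneous job.
#SBATCH --job-name=myQE
#SBATCH --partition=QUARTZ
#SBATCH --ntasks=2          # first pack component (ran on compute-0-2)
#SBATCH packjob
#SBATCH --ntasks=4          # second pack component (ran on rocks7)

# A single srun launches both components; ":" separates the pack groups,
# and each group requests its own task count.
srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
```

Note that in Slurm 19.05 and later this syntax was renamed (packjob became hetjob, --pack-group became --het-group), so scripts like this need adjusting on newer versions.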

On Thu, Mar 28, 2019 at 08:23, Marcus Wagner <wagner at itc.rwth-aachen.de>
wrote:

> Hi Mahmood,
>
> On 3/28/19 7:33 AM, Mahmood Naderan wrote:
>
> >srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i
> mos2.rlx.in
>
> Still only one node is running the processes
>
> No, the processes are running exactly as requested.
>
>
> $ squeue
>              JOBID PARTITION     NAME     USER ST       TIME  NODES
> NODELIST(REASON)
>              755+1    QUARTZ     myQE   ghatee  R       0:47      1 rocks7
>              755+0    QUARTZ     myQE   ghatee  R       0:47      1
> compute-0-2
>
>
> compute-0-2 holds the first pack component (755+0); it should run 2 tasks:
>
> $ rocks run host compute-0-2  "ps aux | grep pw.x"
> ghatee     541  0.1  0.0 582048  7604 ?        Sl   02:29   0:00 srun
> --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
> ghatee     542  0.0  0.0  46452   748 ?        S    02:29   0:00 srun
> --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
> ghatee     559 99.6  0.1 1930560 129728 ?      Rl   02:29   0:52
> /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee     560 99.7  0.1 1930560 129720 ?      Rl   02:29   0:52
> /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee     590  0.0  0.0 113132  1588 ?        Ss   02:30   0:00 bash -c
> ps aux | grep pw.x
> ghatee     629  0.0  0.0 112668   960 ?        S    02:30   0:00 grep pw.x
>
>
> process ids 559 and 560
>
>
> rocks7 holds the second pack component (755+1); it should run 4 tasks:
>
> $ rocks run host rocks7  "ps aux | grep pw.x"
> ghatee   16219 99.0  0.1 1930484 127764 ?      Rl   10:59   1:00
> /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee   16220 99.1  0.1 1930524 127764 ?      Rl   10:59   1:00
> /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee   16221 99.0  0.1 1930484 127760 ?      Rl   10:59   1:00
> /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee   16222 99.1  0.1 1930496 127760 ?      Rl   10:59   1:00
> /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee   16391  0.0  0.0 316388 26652 pts/16   Sl+  11:00   0:00
> /opt/rocks/bin/python /opt/rocks/bin/rocks run host rocks7 ps aux | grep
> pw.x
> ghatee   16394  0.0  0.0 113132  1368 pts/16   S+   11:00   0:00 bash -c
> ps aux | grep pw.x
> ghatee   16396  0.0  0.0 112664   952 pts/16   S+   11:00   0:00 grep pw.x
>
>
> process ids 16219, 16220, 16221 and 16222
>
> Or did I miss something?
>
>
> Best
> Marcus
>
>
> Regards,
> Mahmood
>
>
>
>
>
> --
> Marcus Wagner, Dipl.-Inf.
>
> IT Center
> Abteilung: Systeme und Betrieb
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-24383
> Fax: +49 241 80-624383
> wagner at itc.rwth-aachen.de
> www.itc.rwth-aachen.de
>
>