[slurm-users] Multinode MPI job
Frava
fravadona at gmail.com
Thu Mar 28 10:52:40 UTC 2019
I didn't receive Mahmood's last mail, but Marcus is right: Mahmood's
heterogeneous job submission seems to be working now.
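The check Marcus did by eye in the quoted message below can be sketched as a small filter. This is only an illustration, assuming the usual `ps aux` column layout where the command starts in the 11th column; `count_pw` is a name I made up, not a Slurm or Rocks tool.

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux` output on stdin and count the
# processes whose executable (11th column) is pw.x, with or without a path.
# The srun, bash and grep lines that merely mention pw.x in their
# arguments are not counted, because their 11th column is something else.
count_pw() {
    awk '$11 == "pw.x" || $11 ~ /\/pw\.x$/ { n++ } END { print n + 0 }'
}

# Possible usage on the cluster (not run here):
#   rocks run host compute-0-2 "ps aux" | count_pw
#   rocks run host rocks7 "ps aux" | count_pw
```

Each count should match the --ntasks value requested for the pack running on that node.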
Separating the packs in the srun command and requesting the correct
number of tasks for each pack is how I understood heterogeneous jobs
to work with SLURM v18.08.0 (I haven't tested this with more recent
SLURM versions).
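A minimal sketch of that pattern, assuming a POSIX shell. `build_pack_args` is a hypothetical helper I am introducing for illustration, not something Slurm provides; it only assembles the colon-separated options seen in Mahmood's command. As far as I know, Slurm 20.02 and later renamed --pack-group to --het-group, so the option name would need adjusting there.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): print the colon-separated
# srun options for a heterogeneous job, one component per pack.
# Arguments: the task count of each pack, in pack order.
build_pack_args() {
    pack_args=""
    pack_idx=0
    for ntasks in "$@"; do
        # Components after the first are separated by " : ".
        if [ "$pack_idx" -gt 0 ]; then
            pack_args="$pack_args : "
        fi
        pack_args="${pack_args}--pack-group=${pack_idx} --ntasks=${ntasks}"
        pack_idx=$((pack_idx + 1))
    done
    printf '%s\n' "$pack_args"
}

# Two tasks in pack 0 and four in pack 1, as in Mahmood's job:
build_pack_args 2 4
# prints: --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4

# Inside the job allocation this could be used as (word splitting intended):
#   srun $(build_pack_args 2 4) pw.x -i mos2.rlx.in
```

The task count given for each pack on the srun line should match what was requested for that pack at submission time.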
On Thu, 28 Mar 2019 at 08:23, Marcus Wagner <wagner at itc.rwth-aachen.de>
wrote:
> Hi Mahmood,
>
> On 3/28/19 7:33 AM, Mahmood Naderan wrote:
>
> >srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
>
> >Still only one node is running the processes
>
> No, the processes are running as requested.
>
>
> $ squeue
>   JOBID PARTITION  NAME    USER ST  TIME NODES NODELIST(REASON)
>   755+1    QUARTZ  myQE  ghatee  R  0:47     1 rocks7
>   755+0    QUARTZ  myQE  ghatee  R  0:47     1 compute-0-2
>
>
> compute-0-2 is the first pack (755+0); it should run 2 tasks.
>
> $ rocks run host compute-0-2 "ps aux | grep pw.x"
> ghatee 541 0.1 0.0 582048 7604 ? Sl 02:29 0:00 srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
> ghatee 542 0.0 0.0 46452 748 ? S 02:29 0:00 srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
> ghatee 559 99.6 0.1 1930560 129728 ? Rl 02:29 0:52 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee 560 99.7 0.1 1930560 129720 ? Rl 02:29 0:52 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee 590 0.0 0.0 113132 1588 ? Ss 02:30 0:00 bash -c ps aux | grep pw.x
> ghatee 629 0.0 0.0 112668 960 ? S 02:30 0:00 grep pw.x
>
>
> The two pw.x tasks are process IDs 559 and 560.
>
>
> rocks7 is the second pack (755+1); it should run 4 tasks.
>
> $ rocks run host rocks7 "ps aux | grep pw.x"
> ghatee 16219 99.0 0.1 1930484 127764 ? Rl 10:59 1:00 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee 16220 99.1 0.1 1930524 127764 ? Rl 10:59 1:00 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee 16221 99.0 0.1 1930484 127760 ? Rl 10:59 1:00 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee 16222 99.1 0.1 1930496 127760 ? Rl 10:59 1:00 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee 16391 0.0 0.0 316388 26652 pts/16 Sl+ 11:00 0:00 /opt/rocks/bin/python /opt/rocks/bin/rocks run host rocks7 ps aux | grep pw.x
> ghatee 16394 0.0 0.0 113132 1368 pts/16 S+ 11:00 0:00 bash -c ps aux | grep pw.x
> ghatee 16396 0.0 0.0 112664 952 pts/16 S+ 11:00 0:00 grep pw.x
>
>
> The four pw.x tasks are process IDs 16219, 16220, 16221 and 16222.
>
> Or did I miss something?
>
>
> Best
> Marcus
>
>
> Regards,
> Mahmood
>
>
>
>
>
> --
> Marcus Wagner, Dipl.-Inf.
>
> IT Center
> Abteilung: Systeme und Betrieb
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-24383
> Fax: +49 241 80-624383
> wagner at itc.rwth-aachen.de
> www.itc.rwth-aachen.de
>
>