[slurm-users] Multinode MPI job
Mahmood Naderan
mahmood.nt at gmail.com
Wed Mar 27 19:10:28 UTC 2019
OK. The two different partitions I saw were due to not specifying the partition
name for the first set (before packjob). Here is a better script:
#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#SBATCH --mem-per-cpu=16g --ntasks=2
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#SBATCH packjob
#SBATCH --mem-per-cpu=10g --ntasks=4
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
srun pw.x -i mos2.rlx.in
One node should run 2 processes (32 GB total) and another node should run
4 processes (40 GB total).
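A quick way to double-check that each pack component really received the
intended tasks and memory is to inspect the job with scontrol once it is
running (a sketch, assuming the job ID 747 that shows up in the queue below;
scontrol reports each pack component as its own record):

$ scontrol show job 747      # shows both components, 747+0 and 747+1
$ scontrol show job 747+1    # or just the second component
# The NumNodes, NumTasks and MinMemoryCPU fields of each record should
# read 1 / 2 / 16G for the first set and 1 / 4 / 10G for the second set.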
The queue looks like
$ squeue
 JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
 747+1    QUARTZ  myQE ghatee  R  0:02     1 rocks7
 747+0    QUARTZ  myQE ghatee  R  0:02     1 compute-0-2
When I check the nodes, only the 2 processes from the first set (before
packjob) are running on compute-0-2, but there are no processes at all on rocks7.
$ rocks run host compute-0-2 "ps aux | grep pw.x"
ghatee 30234 0.0 0.0 251208 4996 ? Sl 15:04 0:00 srun pw.x -i mos2.rlx.in
ghatee 30235 0.0 0.0 46452 748 ? S 15:04 0:00 srun pw.x -i mos2.rlx.in
ghatee 30247 99.8 0.1 1930484 129696 ? Rl 15:04 4:31 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
ghatee 30248 99.8 0.1 1930488 129704 ? Rl 15:04 4:31 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
ghatee 30352 0.0 0.0 113132 1592 ? Ss 15:09 0:00 bash -c ps aux | grep pw.x
ghatee 30381 0.0 0.0 112664 960 ? S 15:09 0:00 grep pw.x
$ rocks run host rocks7 "ps aux | grep pw.x"
ghatee 17141 0.0 0.0 316476 26632 pts/21 Sl+ 23:39 0:00 /opt/rocks/bin/python /opt/rocks/bin/rocks run host rocks7 ps aux | grep pw.x
ghatee 17143 0.0 0.0 113132 1364 pts/21 S+ 23:39 0:00 bash -c ps aux | grep pw.x
ghatee 17145 0.0 0.0 112664 960 pts/21 R+ 23:39 0:00 grep pw.x
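The same thing can also be checked from the Slurm side instead of ps (a
sketch, assuming accounting is enabled and using job ID 747 from the queue
above):

$ sacct -j 747 --format=JobID,JobName,NodeList,NTasks,State
# Each pack component (747+0, 747+1) and its steps should be listed with
# the nodes they actually ran on; if no step appears under 747+1, nothing
# was ever launched on rocks7.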
Any idea?
It seems that the mpirun I have is not compatible with the heterogeneous
configuration, since the SBATCH parameters themselves are straightforward.
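One thing that may be worth ruling out before blaming the MPI library (this
is an assumption on my part, not something shown above): in the Slurm
releases that use the packjob keyword, an srun executed inside a
heterogeneous allocation launches the step only on the first pack group
unless told otherwise, which would explain processes on compute-0-2 but none
on rocks7. Launching across both groups would look roughly like this;
--pack-group is the option name in those releases (newer Slurm calls it
--het-group), and --mpi=pmi2 is only an example of a PMI plugin that the
pw.x MPI stack may or may not need:

srun --pack-group=0,1 pw.x -i mos2.rlx.in
# or, if the MPI library needs an explicit PMI plugin:
srun --mpi=pmi2 --pack-group=0,1 pw.x -i mos2.rlx.in
# srun --mpi=list shows which PMI plugins this Slurm build provides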
Regards,
Mahmood
On Wed, Mar 27, 2019 at 11:03 PM Christopher Samuel <chris at csamuel.org>
wrote:
> On 3/27/19 11:29 AM, Mahmood Naderan wrote:
>
> > Thank you very much. you are right. I got it.
>
> Cool, good to hear.
>
> I'd love to hear whether you get heterogenous MPI jobs working too!
>
> All the best,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
>
>