[slurm-users] siesta jobs with slurm, an issue
Bill Barth
bbarth at tacc.utexas.edu
Sun Jul 22 10:14:49 MDT 2018
That doesn't necessarily look like a Slurm problem to me. It looks like SIESTA quit of its own volition (hence the call to MPI_ABORT()). I suggest you ask your local site support to take a look or go to the SIESTA developers; I doubt you'll find any SIESTA experts here to help you.
All I can suggest is to check that all the paths you have provided to SIESTA are correct (the path to the executable is clearly fine because SIESTA starts, but can it find prime.fdf?). Otherwise, start with your local support team.
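For what it's worth, a lightly modified version of your batch script with a couple of sanity checks might help narrow it down. Treat this as an untested sketch: the echo/ls lines are just diagnostics I'd add, and the stdin redirection at the end is only a guess based on the "reinit: Reading from standard input" line and the empty input-file dump in your siesta.out.

#!/bin/bash
#SBATCH --output=siesta.out
#SBATCH --job-name=siesta
#SBATCH --ntasks=8
#SBATCH --mem=4G
#SBATCH --account=z3
#SBATCH --partition=EMERALD

# Show where the job actually runs and whether the input file is visible there.
echo "Host: $(hostname), working directory: $(pwd)"
ls -l prime.fdf || echo "prime.fdf not found in $(pwd)"

# If your SIESTA build expects the FDF on standard input (older versions do),
# passing prime.fdf as a plain argument leaves stdin empty, which would match
# the empty input dump in your output. Redirecting may be what is missing:
mpirun /share/apps/chem/siesta-4.0.2/spar/siesta < prime.fdf > prime.out

If prime.fdf turns out not to be visible from the job's working directory, that points back at paths rather than at SIESTA or Slurm.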
Best,
Bill.
--
Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445
On 7/22/18, 11:08 AM, "slurm-users on behalf of Mahmood Naderan" <slurm-users-bounces at lists.schedmd.com on behalf of mahmood.nt at gmail.com> wrote:
Hi,
I don't know why SIESTA jobs are aborted by Slurm.
[mahmood at rocks7 sie]$ cat slurm_script.sh
#!/bin/bash
#SBATCH --output=siesta.out
#SBATCH --job-name=siesta
#SBATCH --ntasks=8
#SBATCH --mem=4G
#SBATCH --account=z3
#SBATCH --partition=EMERALD
mpirun /share/apps/chem/siesta-4.0.2/spar/siesta prime.fdf prime.out
[mahmood at rocks7 sie]$ sbatch slurm_script.sh
Submitted batch job 783
[mahmood at rocks7 sie]$ squeue --job 783
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[mahmood at rocks7 sie]$ cat siesta.out
Siesta Version : v4.0.2
Architecture : x86_64-unknown-linux-gnu--unknown
Compiler version: GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
Compiler flags : mpifort -g -O2
PP flags : -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
PARALLEL version
* Running on 8 nodes in parallel
>> Start of run: 22-JUL-2018 20:33:36
***********************
* WELCOME TO SIESTA *
***********************
reinit: Reading from standard input
************************** Dump of input data file ****************************
************************** End of input data file *****************************
reinit: -----------------------------------------------------------------------
reinit: System Name:
reinit: -----------------------------------------------------------------------
reinit: System Label: siesta
reinit: -----------------------------------------------------------------------
No species found!!!
Stopping Program from Node: 0
initatom: Reading input for the pseudopotentials and atomic orbitals ----------
No species found!!!
Stopping Program from Node: 0
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[mahmood at rocks7 sie]$
However, I am able to run that command with "-np 4" on the head node, so I don't know whether there is a problem with the compute node or something else.
Any idea?
Regards,
Mahmood
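P.S. If you want to rule out the compute node itself, you could also try reproducing your head-node test through Slurm interactively before going back to batch. A rough sketch, assuming your site allows interactive allocations on EMERALD and adjusting the directory to wherever prime.fdf lives:

salloc --ntasks=8 --mem=4G --account=z3 --partition=EMERALD
# inside the allocation, from the directory containing prime.fdf,
# run the same command you used in the batch script:
mpirun /share/apps/chem/siesta-4.0.2/spar/siesta prime.fdf prime.out

If that works on a compute node but the batch job still fails, the difference is most likely in the batch environment (working directory, modules, paths) rather than the node itself.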