[slurm-users] siesta jobs with slurm, an issue

Bill Barth bbarth at tacc.utexas.edu
Sun Jul 22 10:14:49 MDT 2018


That doesn't necessarily look like a slurm problem to me. It looks like SIESTA quit of its own volition (hence the call to MPI_ABORT()). I suggest you ask your local site support to take a look, or go to the SIESTA developers. I doubt you'll find any SIESTA experts here to help you.

All I can suggest is to check that all the paths you have provided to SIESTA are correct (the path to the executable is clearly fine b/c SIESTA starts, but can it find prime.fdf?). Otherwise, start with your local support team.
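
One thing that does jump out of your log: "reinit: Reading from standard input" is immediately followed by an empty input dump, which suggests SIESTA never saw the contents of prime.fdf. If I remember right, SIESTA of that vintage reads its input from stdin rather than taking the .fdf name as a command-line argument, so a variant of your script along these lines (untested, using your paths) might be worth a try:

    #!/bin/bash
    #SBATCH --output=siesta.out
    #SBATCH --job-name=siesta
    #SBATCH --ntasks=8
    #SBATCH --mem=4G
    #SBATCH --account=z3
    #SBATCH --partition=EMERALD
    # Feed the .fdf to SIESTA on stdin instead of passing it as an argument
    mpirun /share/apps/chem/siesta-4.0.2/spar/siesta < prime.fdf > prime.out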

Best,
Bill.

-- 
Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435            |   Fax:   (512) 475-9445
 
 

On 7/22/18, 11:08 AM, "slurm-users on behalf of Mahmood Naderan" <slurm-users-bounces at lists.schedmd.com on behalf of mahmood.nt at gmail.com> wrote:

    Hi,
    I don't know why siesta jobs are aborted by slurm.
    
    
    [mahmood at rocks7 sie]$ cat slurm_script.sh
    #!/bin/bash
    #SBATCH --output=siesta.out
    #SBATCH --job-name=siesta
    #SBATCH --ntasks=8
    #SBATCH --mem=4G
    #SBATCH --account=z3
    #SBATCH --partition=EMERALD
    mpirun /share/apps/chem/siesta-4.0.2/spar/siesta prime.fdf prime.out
    [mahmood at rocks7 sie]$ sbatch slurm_script.sh
    Submitted batch job 783
    [mahmood at rocks7 sie]$ squeue --job 783
                 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    [mahmood at rocks7 sie]$ cat siesta.out
    Siesta Version  : v4.0.2
    Architecture    : x86_64-unknown-linux-gnu--unknown
    Compiler version: GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
    Compiler flags  : mpifort -g -O2
    PP flags        : -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
    PARALLEL version
    
    * Running on    8 nodes in parallel
    >> Start of run:  22-JUL-2018  20:33:36
    
                               ***********************
                               *  WELCOME TO SIESTA  *
                               ***********************
    
    reinit: Reading from standard input
    ************************** Dump of input data file ****************************
    ************************** End of input data file *****************************
    
    reinit: -----------------------------------------------------------------------
    reinit: System Name:
    reinit: -----------------------------------------------------------------------
    reinit: System Label: siesta
    reinit: -----------------------------------------------------------------------
    No species found!!!
    Stopping Program from Node:    0
    
    initatom: Reading input for the pseudopotentials and atomic orbitals ----------
    No species found!!!
    Stopping Program from Node:    0
    --------------------------------------------------------------------------
    MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
    with errorcode 1.
    
    NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
    You may or may not see output from other processes, depending on
    exactly when Open MPI kills them.
    --------------------------------------------------------------------------
    [mahmood at rocks7 sie]$

    However, I am able to run that command with "-np 4" on the head node. So, I don't know whether there is a problem with the compute node or something else.
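
    To rule out a missing or unreadable input file on the compute node, maybe I should add a couple of sanity checks before the mpirun line (just a guess at where the problem is):

        # hypothetical checks added to slurm_script.sh
        echo "running on $(hostname) in $(pwd)"
        ls -l prime.fdf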
    
    
    Any idea?
    
    
    Regards,
    Mahmood


