[slurm-users] sbatch : job fail without any output or indication

Adrian Sevcenco Adrian.Sevcenco at spacescience.ro
Tue Feb 4 10:17:06 UTC 2020


Hi! How can I debug a job that fails without any output or indication?
My job, which I submit with sbatch, has the following form:

#!/bin/bash
#SBATCH --job-name QCUT_SEV
#SBATCH -p CLUSTER                      # Partition to submit to
#SBATCH --output=%x_%j.out      # File to which STDOUT will be written
#SBATCH --error=%x_%j.err      # File to which STDERR will be written
#module purge all

# Define and create a unique scratch directory for this job
SCRATCH_DIRECTORY=/scratch/workdir_${USER}/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}

# You can copy everything you need to the scratch directory
# ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
cp ${SLURM_SUBMIT_DIR}/* ${SCRATCH_DIRECTORY}/

# This is where the actual work is done.
export VER="vAN-20200115_ROOT6-1"
eval $(/cvmfs/alice.cern.ch/bin/alienv printenv VO_ALICE@AliPhysics::${VER})
./run_analysis

# After the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp -ru ${SCRATCH_DIRECTORY}/* ${SLURM_SUBMIT_DIR}/

# After everything is saved to the home directory, delete the work
# directory to save space on /scratch/workdir
cd /tmp
rm -rf ${SCRATCH_DIRECTORY}

echo "end of ${SLURM_JOB_NAME}"

Running the actual work part by hand (from the comment marking where the
work is done, up to and including ./run_analysis) works locally without
problems.

Any idea how I can find the reason for this result:
sacct -X
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
24500          QCUT_SEV    CLUSTER      local          1     FAILED      1:0
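
For reference, sacct reports ExitCode as exit status and signal, so 1:0
means exit status 1 with no signal. It does not say which step failed. A
minimal sketch of one way to narrow that down, assuming plain bash: wrap
each step in a small helper that logs the step name and its exit status to
stderr, so the trace lands in the --error file. The name run_step is
hypothetical (it is neither part of Slurm nor of the script above), and the
commands passed to it here are placeholders for the mkdir, cp, and
./run_analysis steps.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original script):
# run one step, log its label and exit status to stderr, and return
# that status to the caller.
run_step() {
    local label=$1
    shift
    echo "[$(date +%T)] start: ${label}" >&2
    "$@"
    local status=$?
    echo "[$(date +%T)] done:  ${label} (exit ${status})" >&2
    return "${status}"
}

# Placeholder commands standing in for the real job steps.
run_step "trivial success" true
run_step "deliberate failure" false || echo "step failed, a job script could exit here" >&2
```

In the job script, each step could then be written as, for example,
run_step "analysis" ./run_analysis || exit 1, so that the job stops at the
first failing step and the last "done" line in the error file names it.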

Thank you!!
Adrian