[slurm-users] sbatch : job fail without any output or indication
Adrian Sevcenco
Adrian.Sevcenco at spacescience.ro
Tue Feb 4 10:17:06 UTC 2020
Hi! How can I debug a job that fails without any output or indication?
My job, which I submit with sbatch, has the following form:
#!/bin/bash
#SBATCH --job-name QCUT_SEV
#SBATCH -p CLUSTER # Partition to submit to
#SBATCH --output=%x_%j.out # File to which STDOUT will be written
#SBATCH --error=%x_%j.err # File to which STDERR will be written
#module purge all
# Define and create a unique scratch directory for this job
SCRATCH_DIRECTORY=/scratch/workdir_${USER}/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}
# You can copy everything you need to the scratch directory
# ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
cp ${SLURM_SUBMIT_DIR}/* ${SCRATCH_DIRECTORY}/
# This is where the actual work is done.
export VER="vAN-20200115_ROOT6-1"
eval $(/cvmfs/alice.cern.ch/bin/alienv printenv VO_ALICE@AliPhysics::${VER})
./run_analysis
# After the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp -ru ${SCRATCH_DIRECTORY}/* ${SLURM_SUBMIT_DIR}/
# After everything is saved to the home directory, delete the work directory
# to save space on /scratch/workdir
cd /tmp
rm -rf ${SCRATCH_DIRECTORY}
echo "end of ${SLURM_JOB_NAME}"
Running the work part of the script by hand, from the "This is where the
actual work is done" comment up to and including ./run_analysis, works
locally without problems.
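Since the work part runs interactively, the Slurm-specific steps around it (mkdir, cd, cp) may be worth checking as well. A minimal sketch, assuming a POSIX shell, of a wrapper that logs any failing step to stderr, which a batch job sends to the --error file. Note that run_step is a hypothetical helper name, not a Slurm or bash feature:

```shell
#!/bin/bash
# run_step is a hypothetical helper (an assumption for illustration).
# It runs the given command and, if the command fails, writes the exit
# status and the command line to stderr, then returns that same status.
run_step() {
    "$@"
    run_step_status=$?
    if [ "$run_step_status" -ne 0 ]; then
        echo "FAILED (exit ${run_step_status}): $*" >&2
    fi
    return "$run_step_status"
}

# Intended use inside the job script (shown as comments because they
# need the Slurm environment):
#   run_step mkdir -p "${SCRATCH_DIRECTORY}"
#   run_step cd "${SCRATCH_DIRECTORY}"
#   run_step ./run_analysis

# Harmless demonstration: a succeeding command prints nothing.
run_step true
```

Wrapping each step this way keeps the script running after a failure, as the original script does, while leaving a record of which command returned a nonzero status.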
Any idea how I can find the reason for this:
sacct -X
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
24500          QCUT_SEV    CLUSTER      local          1     FAILED      1:0
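For reference, sacct prints ExitCode as two numbers separated by a colon: the exit status returned by the batch script, then the signal that terminated it, if any. A minimal sketch of reading that field, where describe_exitcode is a hypothetical helper name and not a Slurm command:

```shell
#!/bin/bash
# describe_exitcode is a hypothetical helper (an assumption for illustration).
# It takes an sacct ExitCode value of the form "status:signal".
describe_exitcode() {
    code=${1%%:*}     # part before the colon: exit status
    signal=${1##*:}   # part after the colon: signal number (0 means none)
    if [ "$signal" -ne 0 ]; then
        echo "terminated by signal ${signal}"
    elif [ "$code" -ne 0 ]; then
        echo "exited with status ${code}, no signal"
    else
        echo "completed successfully"
    fi
}

describe_exitcode "1:0"   # prints: exited with status 1, no signal
```

Read this way, the 1:0 above means the job ended with exit status 1 and was not killed by a signal.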
Thank you!!
Adrian