[slurm-users] job_container/tmpfs and srun.

Phill Harvey-Smith p.harvey-smith at warwick.ac.uk
Tue Jan 9 11:30:35 UTC 2024


Hi all,

On our setup we are using job_container/tmpfs to give each job its own 
temp space. Our compute nodes have reasonably sized local disks, so for 
tasks that do a lot of disk I/O on users' data we have asked users to 
copy their data to the local disk at the beginning of the task and (if 
needed) copy it back at the end. This avoids a lot of NFS thrashing, 
which slows down both the task and the NFS servers.

However, some of our users are having problems with this. Their initial 
sbatch script creates a temp directory in their private /tmp, copies 
their data to it and then tries to srun a program. The srun falls over, 
as it doesn't seem to have access to the copied data. I suspect 
this is because the srun task is getting its own private /tmp.
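
One way to check that suspicion is a small marker-file test, sketched 
below. The function name check_shared_tmp and the marker file name are 
made up for illustration, and it only assumes that mktemp and test are 
available on the node.

```shell
# check_shared_tmp LAUNCHER [ARGS...]
# Hypothetical helper: create a marker file under /tmp in the current
# shell, then ask a process started via LAUNCHER whether it can see it.
# Returns 0 if the marker is visible, 1 if it is not (or if the launcher
# itself failed), 2 if the marker could not be created.
check_shared_tmp() {
    _marker=$(mktemp /tmp/tmpcheck.XXXXXX) || return 2
    if "$@" test -e "$_marker"; then
        echo "visible: launched process sees the same /tmp"
        _status=0
    else
        echo "missing: launched process does not see the marker in /tmp"
        _status=1
    fi
    rm -f "$_marker"
    return "$_status"
}
```

Called as "check_shared_tmp srun --ntasks=1" from inside the batch 
script, this should report whether the step launched by srun can see a 
file created in the batch script's /tmp. A "missing" result can also 
mean that the launcher itself failed to start, so the srun error output 
is worth checking as well.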

So my question is, is there a way to have the srun task inherit the /tmp 
of the initial sbatch?

I'll include a sample of the script our user is using below.

If any further information is required please feel free to ask.

Cheers.

Phill.


#!/usr/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:00:10
#SBATCH --mem-per-cpu=3999
#SBATCH --output=script_out.log
#SBATCH --error=script_error.log

# The above options put the STDOUT and STDERR of the batch script in
# log files prefixed with 'script_'.

# Create a randomly-named directory under /tmp
jobtmpdir=$(mktemp -d)

# Register a function to try and cleanup in case of job failure
cleanup_handler()
{
     echo "Cleaning up ${jobtmpdir}"
     rm -rf "${jobtmpdir}"
}
trap 'cleanup_handler' SIGTERM EXIT

# Change working directory to this directory
cd "${jobtmpdir}" || exit 1

# Copy the executable and input files from
# where the job was submitted to the temporary directory.
cp "${SLURM_SUBMIT_DIR}/a.out" .
cp "${SLURM_SUBMIT_DIR}/input.txt" .

# Run the executable, handling the collection of stdout
# and stderr ourselves by redirecting to file
srun ./a.out 2> task_error.log > task_out.log

# Copy output data back to the submit directory.
cp output.txt "${SLURM_SUBMIT_DIR}"
cp task_out.log "${SLURM_SUBMIT_DIR}"
cp task_error.log "${SLURM_SUBMIT_DIR}"

# Cleanup
cd "${SLURM_SUBMIT_DIR}"
cleanup_handler


