[slurm-users] $TMPDIR does not honor "TmpFS"

Chris Samuel chris at csamuel.org
Thu Nov 22 04:22:14 MST 2018


On Thursday, 22 November 2018 9:26:13 PM AEDT Christoph Brüning wrote:

> Hi Chris,

Hi Christoph!

[...]
> I was wondering if constantly making and deleting XFS projects has a
> considerable impact on performance and stability. So I'd be glad if you
> could share some of your experience with that setup.

It's been pretty transparent to the users. The local disks on the nodes are 
only used for local scratch (the root filesystem is mounted from Lustre with 
some neat hacks to OneSIS and the kernel from our Lustre guru), so there's 
very little competition for the SSDs.
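
One caveat for anyone copying this approach: XFS project quotas only work if 
the scratch filesystem is mounted with project quota accounting enabled. A 
minimal sketch of the fstab entry, assuming the /jobfs/local mount point used 
in the scripts below and a placeholder device name:

  /dev/nvme0n1  /jobfs/local  xfs  defaults,prjquota  0 0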

> Also, would you mind providing access to your prolog and epilog scripts?

Attached!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
-------------- next part --------------
#!/bin/bash

if [ "${SLURM_RESTART_COUNT}" == "" ]; then
       SLURM_RESTART_COUNT=0
fi

JOBSCRATCH=/jobfs/local/slurm/${SLURM_JOB_ID}.${SLURM_RESTART_COUNT}

# Create a temporary directory and set an XFS quota on it to match the requested --tmp (or 100MB if not set)
if [ -d ${JOBSCRATCH} ]; then
	exec > >(tee "/tmp/quota.log") 2>&1
	set -x
	# MinTmpDiskNode is the last key=value pair on its scontrol line, so take the final '='-separated field
	QUOTA=$(/apps/slurm/latest/bin/scontrol show JobId=${SLURM_JOB_ID} | egrep 'MinTmpDiskNode=[0-9]' | awk -F= '{print $NF}')
	if [ "${QUOTA}" == "0" ]; then
		QUOTA=100M
	fi
	/usr/sbin/xfs_quota -x -c "project -s -p ${JOBSCRATCH} ${SLURM_JOB_ID}" /jobfs/local
	/usr/sbin/xfs_quota -x -c "limit -p bhard=${QUOTA} ${SLURM_JOB_ID}" /jobfs/local

	# Set up a directory to be used as ${JOBFS}
	/bin/mkdir ${JOBSCRATCH}/var_tmp/jobfs
	/bin/chown --reference=${JOBSCRATCH}/var_tmp/ ${JOBSCRATCH}/var_tmp/jobfs -v
	set +x
else
	echo "$(date): TMPDIR ${JOBSCRATCH} not there" >> /jobfs/local/slurm/slurmdprologfail.txt
fi
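
# Not part of the original prolog: a quick way to check that the limit took
# effect on a node, assuming the same /jobfs/local filesystem, is
#
#	/usr/sbin/xfs_quota -x -c "report -p" /jobfs/local
#
# which lists per-project block usage against the bhard limits set above.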


exit 0
-------------- next part --------------
#!/bin/bash
#
# Remove job's scratch directory

if [ "${SLURM_RESTART_COUNT}" == "" ]; then
       SLURM_RESTART_COUNT=0
fi

JOBSCRATCH=/jobfs/local/slurm/${SLURM_JOB_ID}.${SLURM_RESTART_COUNT}
SHMSCRATCH=/dev/shm/slurm/${SLURM_JOB_ID}.${SLURM_RESTART_COUNT}

# Delete the scratch directory for the job (as long as it exists)
test -d "${JOBSCRATCH}" && rm -rf "${JOBSCRATCH}"
test -d "${SHMSCRATCH}" && rm -rf "${SHMSCRATCH}"

# Exit OK here to prevent the node getting marked down.

exit 0
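
# Not from the original posting -- just a sketch of how prolog/epilog scripts
# like these are typically wired up in slurm.conf on the compute nodes (the
# paths and TmpFS value here are assumptions, not the poster's actual config):
#
#	TmpFS=/jobfs/local
#	Prolog=/etc/slurm/prolog.sh
#	Epilog=/etc/slurm/epilog.sh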

