[slurm-users] Problems with srun and TaskProlog

Putnam, Harry Harry.Putnam at ucsf.edu
Fri Feb 11 19:30:33 UTC 2022


Thanks for your reply, Bjorn-Helge.

This cleared things up for me. I had not understood that the TMPDIR directory needs to be handled in the Prolog and Epilog, because that guarantees it is created at the very beginning of the job and deleted at the very end. Everything now works as expected. Thanks so much for your help.

-Harry

On 2/11/22, 1:19 AM, "slurm-users" <slurm-users-bounces at lists.schedmd.com> wrote:
"Putnam, Harry" <Harry.Putnam at ucsf.edu> writes:

> /opt/slurm/task_epilog
>
> #!/bin/bash
> mytmpdir=/scratch/$SLURM_JOB_USER/$SLURM_JOB_ID
> rm -Rf $mytmpdir
> exit;

This might not be the reason for what you observe, but I believe
deleting the scratch dir in the task epilog is not a good idea.  The
task epilog is run after every "srun" or "mpirun" inside a job, which
means that the scratch dir will be created and deleted for each job
step.  On our systems, we create the scratch dir in the (slurmd) Prolog,
set the environment variable in the TaskProlog, and delete the dir in
the (slurmd) Epilog.  That way the dir is just created and deleted once.
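
A minimal sketch of that arrangement, not the actual scripts from any site. Each function stands in for the body of a separate script. The function names and the SCRATCH_BASE variable are hypothetical; the directory layout is taken from the task_epilog quoted above, and it assumes SLURM_JOB_USER and SLURM_JOB_ID are set wherever each piece runs.

```shell
#!/bin/bash
# Sketch only. SCRATCH_BASE and all function names are hypothetical.
SCRATCH_BASE=${SCRATCH_BASE:-/scratch}

# Same layout as the quoted task_epilog: <base>/<user>/<jobid>
job_scratch_dir() {
    echo "$SCRATCH_BASE/$SLURM_JOB_USER/$SLURM_JOB_ID"
}

# Body of the (slurmd) Prolog: create the dir once, before the job starts.
prolog_body() {
    local dir
    dir=$(job_scratch_dir)
    mkdir -p "$dir"
    # The Prolog normally runs as root, so hand the dir to the job owner.
    if [ "$(id -u)" -eq 0 ]; then
        chown "$SLURM_JOB_USER" "$dir"
    fi
    chmod 700 "$dir"
}

# Body of the TaskProlog: lines of the form "export NAME=value" printed
# on stdout are applied to the environment of the task being started.
task_prolog_body() {
    echo "export TMPDIR=$(job_scratch_dir)"
}

# Body of the (slurmd) Epilog: delete the dir once, after the job ends.
epilog_body() {
    local dir
    dir=$(job_scratch_dir)
    # Refuse to delete anything if the job variables are missing.
    if [ -n "$SLURM_JOB_USER" ] && [ -n "$SLURM_JOB_ID" ]; then
        rm -rf -- "$dir"
    fi
}
```

Each body would go in its own script, referenced from slurm.conf through the Prolog, TaskProlog and Epilog parameters; the script paths are site-specific. Because only the TaskProlog part runs per job step, and it only prints a variable, repeated srun calls inside a job no longer create or delete anything.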

> I am not sure I understand what constitutes a job step.

In practice, every run of srun or mpirun creates a job step, and the job
script itself counts as a job step.

--
B/H
