[slurm-users] Problems with sun and TaskProlog

Bjørn-Helge Mevik b.h.mevik at usit.uio.no
Fri Feb 11 09:15:33 UTC 2022


"Putnam, Harry" <Harry.Putnam at ucsf.edu> writes:

> /opt/slurm/task_epilog
>
> #!/bin/bash
> mytmpdir=/scratch/$SLURM_JOB_USER/$SLURM_JOB_ID
> rm -Rf $mytmpdir
> exit;

This might not be the reason for what you observe, but I believe
deleting the scratch dir in the task epilog is not a good idea.  The
task epilog is run after every "srun" or "mpirun" inside a job, which
means that the scratch dir will be created and deleted for each job
step.  On our systems, we create the scratch dir in the (slurmd) Prolog,
set the environment variable pointing to it in the TaskProlog, and
delete the dir in the (slurmd) Epilog.  That way the dir is created and
deleted only once per job.
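Here is a minimal sketch of that split, written as three shell functions in one file so it is easier to read. Some parts are assumptions and do not come from the post. The scratch root defaults to /scratch but can be overridden with a hypothetical SCRATCH_ROOT variable. The per-job path is exported to tasks under the hypothetical name SCRATCH_DIR. In a real installation each function body would be its own script, named by the Prolog, TaskProlog and Epilog parameters in slurm.conf. The TaskProlog part relies on slurmd adding any "export NAME=value" line the TaskProlog prints on stdout to the task's environment.

```shell
#!/bin/bash
# Sketch only. Assumptions: SCRATCH_ROOT (hypothetical, default /scratch)
# and the exported variable name SCRATCH_DIR (hypothetical).

# Build the per-job scratch path, refusing to do so if either Slurm
# variable is empty (an empty SLURM_JOB_ID would otherwise point the
# path, and a later rm -rf, at the user's whole scratch area).
scratch_dir() {
    if [ -z "${SLURM_JOB_USER:-}" ] || [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "scratch_dir: SLURM_JOB_USER and SLURM_JOB_ID must be set" >&2
        return 1
    fi
    printf '%s/%s/%s\n' "${SCRATCH_ROOT:-/scratch}" "$SLURM_JOB_USER" "$SLURM_JOB_ID"
}

# slurmd Prolog body: create the directory once per job on the node.
job_prolog() {
    local dir
    dir=$(scratch_dir) || return 1
    mkdir -p "$dir" || return 1
    # The Prolog runs as root; only then can ownership be handed to the user.
    if [ "$(id -u)" -eq 0 ]; then
        chown "$SLURM_JOB_USER" "$dir"
    fi
}

# TaskProlog body: runs before every task, but only prints the
# "export" line; it never creates or deletes anything.
task_prolog() {
    local dir
    dir=$(scratch_dir) || return 1
    echo "export SCRATCH_DIR=$dir"
}

# slurmd Epilog body: remove the directory once, after the job ends.
job_epilog() {
    local dir
    dir=$(scratch_dir) || return 1
    rm -rf -- "$dir"
}
```

Keeping creation and deletion in the Prolog/Epilog pair means a job with many srun or mpirun steps sees the same directory throughout. The TaskProlog only advertises the path.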

> I am not sure I understand what constitutes a job step.

In practice, every run of srun or mpirun creates a job step, and the job
script itself counts as a job step.

-- 
B/H