[slurm-users] Job Step Output Delay

Maria Semple maria at rstudio.com
Thu Feb 11 00:14:19 UTC 2021


The larger cluster is using NFS. I can see how that could be related to the
difference in behaviour between the two clusters.

 The buffering behaviour is the same if I tail the file from the node
running the job. The only thing that seems to change the behaviour is
whether I use srun to create a job step or not.
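
If the delay comes from srun buffering the step's stdout rather than from
the file system, one thing that might be worth trying (just a guess on my
part, not something confirmed here) is srun's --unbuffered flag, or forcing
line-buffered output with stdbuf:

withsteps.sh (unbuffered variant):
#!/bin/bash
# --unbuffered (-u) asks srun not to buffer the task's stdout
srun --unbuffered ./loop.sh
# alternative: force line-buffered stdout for loop.sh itself
# srun stdbuf -oL ./loop.sh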

On Wed, Feb 10, 2021 at 4:09 PM Aaron Jackson <
Aaron.Jackson at nottingham.ac.uk> wrote:

> Is it being written to NFS? You say your local dev cluster is a single
> node. Is that node the login node as well as the compute node? In that
> case I guess there is no NFS. The larger cluster will be using some sort
> of shared storage, so whichever shared file system you are using likely
> has caching.
>
> If you are able to connect directly to the node which is running the
> job, you can try tailing from there. It'll likely update immediately if
> what I said above is the case.
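>
> For example, something like this (assuming you can ssh straight to the
> compute nodes; <job #> and <node> are placeholders):
>
> squeue -j <job #> -o %N     # shows which node the job is running on
> ssh <node> tail -f /path/to/slurm-<job #>.out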
>
> Cheers,
> Aaron
>
>
> On  9 February 2021 at 23:47 GMT, Maria Semple wrote:
>
> > Hello all,
> >
> > I've noticed an odd behaviour with job steps in some Slurm environments.
> > When a script is launched directly as a job, the output is written to
> > file immediately. When the script is launched as a step in a job, output
> > is written in ~30 second chunks. This doesn't happen in all Slurm
> > environments, but if it happens in one, it seems to happen consistently.
> > For example, on my local development cluster, which is a single node
> > running Ubuntu 18, I don't experience this. On a large CentOS 7 based
> > cluster, I do.
> >
> > Below is a simple reproducible example:
> >
> > loop.sh:
> > #!/bin/bash
> > for i in {1..100}
> > do
> >    echo $i
> >    sleep 1
> > done
> >
> > withsteps.sh:
> > #!/bin/bash
> > srun ./loop.sh
> >
> > Then, from the command line, running sbatch loop.sh followed by tail -f
> > slurm-<job #>.out prints the job output in smaller chunks, which appears
> > to be related to file system buffering or the time it takes for the tail
> > process to notice that the file has updated. Running cat on the file
> > every second shows that the output is in the file immediately after it
> > is emitted by the script.
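> >
> > Concretely, the commands are (with <job #> being whatever number sbatch
> > reports):
> >
> > sbatch loop.sh
> > tail -f slurm-<job #>.out     # output shows up in small chunks
> > while true; do cat slurm-<job #>.out; sleep 1; done     # lines appear immediately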
> >
> > If you run sbatch withsteps.sh instead, tail-ing or repeatedly cat-ing
> > the output file will show that the job output is written in chunks of
> > 30-35 lines.
> >
> > I'm hoping this is something that can be worked around, perhaps via an
> > OS setting, the way Slurm was compiled, or a Slurm configuration setting.
>
>
> --
> Research Fellow
> School of Computer Science
> University of Nottingham

-- 
Thanks,
Maria