<div dir="ltr">The larger cluster is using NFS, and I can see how that could be related to the difference in behaviour between the clusters.<div><br></div><div>However, the buffering behaviour is the same if I tail the file from the node running the job. The only thing that seems to change the behaviour is whether or not I use <font face="monospace">srun</font> to create a job step.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 10, 2021 at 4:09 PM Aaron Jackson &lt;<a href="mailto:Aaron.Jackson@nottingham.ac.uk">Aaron.Jackson@nottingham.ac.uk</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Is it being written to NFS? You say on your local dev cluster it's a<br>

single node. Is it the login node as well as the compute node? In that case<br>

I guess there is no NFS. The larger cluster will be using some sort of<br>

shared storage, so whichever shared file system you are using likely has<br>

caching.<br>

<br>

If you are able to connect directly to the node which is running the<br>

job, you can try tailing from there. It'll likely update immediately if<br>

what I said above is the case.<br>

<br>

Cheers,<br>

Aaron<br>

<br>

<br>

On  9 February 2021 at 23:47 GMT, Maria Semple wrote:<br>

<br>

> Hello all,<br>

><br>

> I've noticed an odd behaviour with job steps in some Slurm environments.<br>

> When a script is launched directly as a job, the output is written to file<br>

> immediately. When the script is launched as a step in a job, output is<br>

> written in ~30 second chunks. This doesn't happen in all Slurm<br>

> environments, but if it happens in one, it seems to always happen. For<br>

> example, on my local development cluster, which is a single node on Ubuntu<br>

> 18, I don't experience this. On a large CentOS 7-based cluster, I do.<br>

><br>

> Below is a simple reproducible example:<br>

><br>

> loop.sh:<br>

> #!/bin/bash<br>

> for i in {1..100}<br>

> do<br>

>    echo $i<br>

>    sleep 1<br>

> done<br>

><br>

> withsteps.sh:<br>

> #!/bin/bash<br>

> srun ./loop.sh<br>

><br>

> Then, from the command line, running sbatch loop.sh followed by tail -f<br>

> slurm-&lt;job #&gt;.out prints the job output in smaller chunks, which appears to<br>

> be related to file system buffering or the time it takes for the tail<br>

> process to notice that the file has updated. Running cat on the file every<br>

> second shows that the output is in the file immediately after it is emitted<br>

> by the script.<br>

><br>

> If you run sbatch withsteps.sh instead, tail-ing or repeatedly cat-ing the<br>

> output file will show that the job output is written in chunks of 30-35<br>

> lines.<br>

><br>

> I'm hoping this is something that is possible to work around, potentially<br>

> related to an OS setting, the way Slurm was compiled, or a Slurm setting.<br>

<br>

<br>

-- <br>

Research Fellow<br>

School of Computer Science<br>

University of Nottingham<br>


</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Thanks,<br></div>Maria<br></div></div>
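<div dir="ltr">To put numbers on the chunking described above, a small helper can timestamp each line as it reaches the reader. This is only a rough sketch: <font face="monospace">stamp_lines</font> is a hypothetical name that is not part of the thread or of Slurm, and the commented usage assumes the default <font face="monospace">slurm-&lt;job #&gt;.out</font> output file from the example. The <font face="monospace">srun --unbuffered</font> variant is a documented srun option, but whether it removes the delay on a given cluster is an assumption that would need testing.</div>

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): prefix every line read from
# stdin with the epoch second at which it arrived, so that bursts of
# lines sharing one timestamp reveal chunked output.
stamp_lines() {
    local line
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%s)" "$line"
    done
}

# Assumed usage against a running job's output file:
#   tail -f slurm-<job #>.out | stamp_lines
#
# Variant of withsteps.sh that may be worth comparing (assumption: the
# delay comes from buffering between the step and srun's I/O forwarding):
#   srun --unbuffered ./loop.sh
```

<div dir="ltr">If the job step is being buffered, many consecutive lines should carry the same timestamp, whereas the direct <font face="monospace">sbatch loop.sh</font> case should show roughly one line per second.</div>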