[slurm-users] Job Step Output Delay
Tilman Schneider
tilman at csquare.ai
Wed Feb 10 15:18:34 UTC 2021
Hi Maria,
This seems related to srun's behavior around -u; from the official documentation:
*-u*, *--unbuffered* By default the connection between slurmstepd and the
user launched application is over a pipe. The stdio output written by the
application is buffered by the glibc until it is flushed or the output is
set as unbuffered. See setbuf(3) <https://slurm.schedmd.com/setbuf.html>.
If this option is specified the tasks are executed with a pseudo terminal
so that the application output is unbuffered. This option applies to step
allocations.
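As a minimal sketch (untested on your cluster), the step in withsteps.sh could be launched with -u so that srun runs the task with a pseudo terminal. The file name withsteps_unbuffered.sh is made up for illustration; loop.sh is the script from your example. I can't promise this removes the ~30 second chunks, since the delay may have another cause.

```shell
# Write a variant of withsteps.sh that asks srun for unbuffered step output.
# "withsteps_unbuffered.sh" is a hypothetical name; loop.sh is the script
# from the original example and must sit in the same directory.
cat > withsteps_unbuffered.sh <<'EOF'
#!/bin/bash
# -u / --unbuffered: run the step's tasks with a pseudo terminal
srun -u ./loop.sh
EOF
chmod +x withsteps_unbuffered.sh

# On a Slurm cluster, submit it the same way as before:
#   sbatch withsteps_unbuffered.sh
#   tail -f slurm-<job #>.out
```

If the output then shows up line by line, the delay was on the stdio path between the application and slurmstepd; if not, the cause is probably somewhere else.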
HTH
Tilman
> Date: Tue, 9 Feb 2021 15:47:12 -0800
> From: Maria Semple <maria at rstudio.com>
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: [slurm-users] Job Step Output Delay
>
> Hello all,
>
> I've noticed an odd behaviour with job steps in some Slurm environments.
> When a script is launched directly as a job, the output is written to file
> immediately. When the script is launched as a step in a job, output is
> written in ~30 second chunks. This doesn't happen in all Slurm
> environments, but if it happens in one, it seems to always happen. For
> example, on my local development cluster, which is a single node on Ubuntu
> 18, I don't experience this. On a large Centos 7 based cluster, I do.
>
> Below is a simple reproducible example:
>
> loop.sh:
> #!/bin/bash
> for i in {1..100}
> do
>     echo $i
>     sleep 1
> done
>
> withsteps.sh:
> #!/bin/bash
> srun ./loop.sh
>
> Then from the command line running sbatch loop.sh followed by tail -f
> slurm-<job #>.out prints the job output in smaller chunks, which appears to
> be related to file system buffering or the time it takes for the tail
> process to notice that the file has updated. Running cat on the file every
> second shows that the output is in the file immediately after it is emitted
> by the script.
>
> If you run sbatch withsteps.sh instead, tail-ing or repeatedly cat-ing the
> output file will show that the job output is written in a chunk of 30 - 35
> lines.
>
> I'm hoping this is something that is possible to work around, potentially
> related to an OS setting, the way Slurm was compiled, or a Slurm setting.
>
> --
> Thanks,
> Maria