[slurm-users] Splitting mpi rank output
Christopher Benjamin Coffey
Chris.Coffey at nau.edu
Mon May 14 08:49:30 MDT 2018
Thanks Chris! :)
—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
On 5/10/18, 12:42 AM, "slurm-users on behalf of Chris Samuel" <slurm-users-bounces at lists.schedmd.com on behalf of chris at csamuel.org> wrote:
On Thursday, 10 May 2018 2:25:49 AM AEST Christopher Benjamin Coffey wrote:
> I have a user trying to use %t to split the mpi rank outputs into different
> files and it's not working. I verified this too. Any idea why this might
> be? This is the first that I've heard of a user trying to do this.
I think they want to use that as an argument to srun, not sbatch.
I don't know why it doesn't work for sbatch; I'm guessing it doesn't get
passed on in the environment? From the look of the srun manual page, it
probably should set SLURM_STDOUTMODE. But then you'd get both the batch
output and rank 0 going to the first file. Seems like a bug to me.
However, I can confirm that it works if you pass it to srun instead.
[csamuel at farnarkle1 tmp]$ cat test-rank.sh
#!/bin/bash
#SBATCH --ntasks=10
#SBATCH --ntasks-per-node=1
srun -o foo-%t.out hostname
[csamuel at farnarkle1 tmp]$ ls -ltr
total 264
-rw-rw-r-- 1 csamuel hpcadmin 89 May 10 17:34 test-rank.sh
-rw-rw-r-- 1 csamuel hpcadmin 0 May 10 17:34 slurm-127420.out
-rw-rw-r-- 1 csamuel hpcadmin 7 May 10 17:34 foo-9.out
-rw-rw-r-- 1 csamuel hpcadmin 7 May 10 17:34 foo-8.out
-rw-rw-r-- 1 csamuel hpcadmin 7 May 10 17:34 foo-7.out
-rw-rw-r-- 1 csamuel hpcadmin 7 May 10 17:34 foo-6.out
-rw-rw-r-- 1 csamuel hpcadmin 7 May 10 17:34 foo-5.out
-rw-rw-r-- 1 csamuel hpcadmin 7 May 10 17:34 foo-4.out
-rw-rw-r-- 1 csamuel hpcadmin 7 May 10 17:34 foo-3.out
-rw-rw-r-- 1 csamuel hpcadmin 7 May 10 17:34 foo-2.out
-rw-rw-r-- 1 csamuel hpcadmin 7 May 10 17:34 foo-1.out
-rw-rw-r-- 1 csamuel hpcadmin 7 May 10 17:34 foo-0.out
[csamuel at farnarkle1 tmp]$ more foo-*
::::::::::::::
foo-0.out
::::::::::::::
john37
::::::::::::::
foo-1.out
::::::::::::::
john38
::::::::::::::
foo-2.out
::::::::::::::
john39
::::::::::::::
foo-3.out
::::::::::::::
john40
::::::::::::::
foo-4.out
::::::::::::::
john41
::::::::::::::
foo-5.out
::::::::::::::
john42
::::::::::::::
foo-6.out
::::::::::::::
john43
::::::::::::::
foo-7.out
::::::::::::::
john44
::::::::::::::
foo-8.out
::::::::::::::
john45
::::::::::::::
foo-9.out
::::::::::::::
john46
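If the ranks' output has already been collected into one stream (for example, an srun launched without -o), a rough post-processing alternative is srun's --label option, which prefixes each output line with its task number, followed by splitting on that prefix. This is a minimal sketch, not something tested in this thread. It assumes the label looks like "<rank>: " (possibly with leading padding spaces), and the function name split_by_rank and the PREFIX-<rank>.out naming are hypothetical, chosen to mirror the foo-%t.out files above.

```shell
#!/bin/bash
# split_by_rank PREFIX  (hypothetical helper, for illustration only)
# Assumes stdin lines look like srun --label output: "<rank>: <text>",
# possibly with leading spaces before the rank. Writes each line's text
# (label removed) to PREFIX-<rank>.out; lines without a label go to
# PREFIX-unlabeled.out. awk's ">" truncates a file on its first write
# in a run and appends afterwards, so each run starts with fresh files.
split_by_rank() {
    local prefix="$1"
    awk -v prefix="$prefix" '
        match($0, /^[ \t]*[0-9]+: /) {
            label = substr($0, 1, RLENGTH - 2)   # "<spaces><rank>" without ": "
            gsub(/[ \t]/, "", label)             # strip any padding spaces
            print substr($0, RLENGTH + 1) > (prefix "-" label ".out")
            next
        }
        { print > (prefix "-unlabeled.out") }
    '
}
```

Usage inside a batch script would then be something like `srun --label hostname | split_by_rank foo`, which should leave one foo-<rank>.out per task. Only stdout goes through the pipe, so stderr would need its own handling.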
Hope that helps,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC