[slurm-users] Splitting mpi rank output

Christopher Benjamin Coffey Chris.Coffey at nau.edu
Mon May 14 08:49:30 MDT 2018


Thanks Chris! :)

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 

On 5/10/18, 12:42 AM, "slurm-users on behalf of Chris Samuel" <slurm-users-bounces at lists.schedmd.com on behalf of chris at csamuel.org> wrote:

    On Thursday, 10 May 2018 2:25:49 AM AEST Christopher Benjamin Coffey wrote:
    
    > I have a user trying to use %t to split the MPI rank outputs into different
    > files, and it's not working. I verified this too. Any idea why this might
    > be? This is the first time I've heard of a user trying to do this.
    
    I think they want to use that as an argument to srun, not sbatch.
    
    I don't know why it doesn't work for sbatch; my guess is that it doesn't get
    passed on in the environment. From the look of the srun manual page, it
    probably should set SLURM_STDOUTMODE. But then you'd get both the batch
    output and rank 0 going to the first file. Seems like a bug to me.
    
    However, I can confirm that it works if you pass it to srun instead.
    
    [csamuel at farnarkle1 tmp]$ cat test-rank.sh
    #!/bin/bash
    #SBATCH --ntasks=10
    #SBATCH --ntasks-per-node=1
    
    srun -o foo-%t.out hostname
    
    [csamuel at farnarkle1 tmp]$ ls -ltr
    total 264
    -rw-rw-r-- 1 csamuel hpcadmin 89 May 10 17:34 test-rank.sh
    -rw-rw-r-- 1 csamuel hpcadmin  0 May 10 17:34 slurm-127420.out
    -rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-9.out
    -rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-8.out
    -rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-7.out
    -rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-6.out
    -rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-5.out
    -rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-4.out
    -rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-3.out
    -rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-2.out
    -rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-1.out
    -rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-0.out
    
    
    [csamuel at farnarkle1 tmp]$ more foo-*
    ::::::::::::::
    foo-0.out
    ::::::::::::::
    john37
    ::::::::::::::
    foo-1.out
    ::::::::::::::
    john38
    ::::::::::::::
    foo-2.out
    ::::::::::::::
    john39
    ::::::::::::::
    foo-3.out
    ::::::::::::::
    john40
    ::::::::::::::
    foo-4.out
    ::::::::::::::
    john41
    ::::::::::::::
    foo-5.out
    ::::::::::::::
    john42
    ::::::::::::::
    foo-6.out
    ::::::::::::::
    john43
    ::::::::::::::
    foo-7.out
    ::::::::::::::
    john44
    ::::::::::::::
    foo-8.out
    ::::::::::::::
    john45
    ::::::::::::::
    foo-9.out
    ::::::::::::::
    john46
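
Where passing the %t pattern to srun is not an option, one possible alternative is a small wrapper that redirects each task's output itself, using the SLURM_PROCID variable that srun sets for every task. This is only a minimal sketch: the function name run_per_rank and the file prefix are hypothetical, and it falls back to rank 0 when run outside Slurm.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): run a command with its stdout
# redirected to <prefix>-<rank>.out.
# SLURM_PROCID is set by srun for each task; default to 0 outside Slurm.
run_per_rank() {
    local prefix="$1"
    shift
    local rank="${SLURM_PROCID:-0}"
    "$@" > "${prefix}-${rank}.out"
}

# When invoked with arguments, act as a wrapper: <prefix> <command> [args...]
if [ "$#" -gt 0 ]; then
    run_per_rank "$@"
fi
```

Saved under a name such as per-rank.sh (an assumed filename), it could be launched from the batch script as "srun ./per-rank.sh foo hostname", which should give one foo-N.out per task much like the -o foo-%t.out form above.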
    
    Hope that helps,
    Chris
    -- 
     Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
    
    
    


