[slurm-users] slurm SBATCH - Multiple Nodes, Same SLURMD_NODENAME

Sam doogly1 at gmail.com
Fri Jul 13 10:54:14 MDT 2018


StackOverflow Thread:
https://stackoverflow.com/questions/51328917/slurm-sbatch-multiple-nodes-same-slurmd-nodename

possibly related to:
https://groups.google.com/forum/#!topic/slurm-users/suclnO2V0aA

 - slurm-wlm 17.11.2
 - Installed from Ubuntu Apt repos, Ubuntu:18.04

We have a cluster of 20 identical nodes.
Running the simple script below gives me a confusing problem.
All the job steps think they are running on node3, while running the hostname
command gives the correct answer. This is also a problem for the output
filename: I expected several different output files, but I get just one,
with 'node3' in the filename. This looks like a Bash eval() / variable
substitution timing issue.
Wrapping

    $SLURMD_NODENAME

in a

    bash -c "echo Bash3: \$SLURMD_NODENAME"

works. But why did I have to do this? And this workaround won't work for the
#SBATCH --output filename.
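My guess (unverified) is that the unescaped $SLURMD_NODENAME is expanded once
by the shell running the batch script, before srun ever launches the tasks,
while the escaped \$SLURMD_NODENAME is only expanded by the shell each task
starts. A minimal sketch of that difference; single quotes should defer the
expansion the same way the backslash does:

    # expanded by the batch shell on the node running the script,
    # so every task echoes the same value:
    srun echo "$SLURMD_NODENAME"

    # expansion deferred to the per-task shell, so each task
    # echoes its own node:
    srun bash -c 'echo $SLURMD_NODENAME'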

cn.job:

    #!/bin/bash
    #SBATCH --output=/share/output.txt.%j.%J.%a.%A.%n.%N.%s.%t.%x

    #SBATCH --time=00:00:30
    #SBATCH --tasks-per-node=2
    #SBATCH --nodes=4

    srun hostname
    srun bash -c "echo Bash2: \$(hostname)"
    srun echo SLURMD_NODENAME:$SLURMD_NODENAME SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID SLURM_JOB_ID:$SLURM_JOB_ID SLURM_TASK_PID:$SLURM_TASK_PID
    srun bash -c "echo Bash3: \$SLURMD_NODENAME"

    srun sleep 20

Running it with

    sbatch cn.job

produces this output:

**/share/output.txt.2056.2056.4294967294.2056.0.node3.4294967294.0.cn.job**

    node3
    node3
    node6
    node4
    node5
    node6
    node4
    node5
    Bash2: node3
    Bash2: node6
    Bash2: node4
    Bash2: node5
    Bash2: node3
    Bash2: node4
    Bash2: node6
    Bash2: node5
    SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
    SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
    SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
    SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
    SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
    SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
    SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
    SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
    Bash3: node3
    Bash3: node5
    Bash3: node3
    Bash3: node4
    Bash3: node6
    Bash3: node4
    Bash3: node6
    Bash3: node5
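
If my guess above is right, the %N in #SBATCH --output is filled in once, on
the node where the batch script itself runs, so it can never fan out per node.
One workaround I may try (untested, going from the srun man page) is to let
srun write its own per-task output files instead:

    # hypothetical sketch: %N/%t should be expanded per task by srun,
    # giving one file per task rather than one file from sbatch
    srun --output=/share/step_output.%j.%N.%t hostname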