[slurm-users] slurm SBATCH - Multiple Nodes, Same SLURMD_NODENAME

Marcus Wagner wagner at itc.rwth-aachen.de
Mon Jul 16 01:12:29 MDT 2018


Hi Sam,

this is expected and how bash works.

Regarding the #SBATCH --output problem: this seems to be an error, 
because only one output file is created (I just tested it myself).


Regarding variable substitution:

srun echo SLURMD_NODENAME:$SLURMD_NODENAME 
SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID 
SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID SLURM_JOB_ID:$SLURM_JOB_ID 
SLURM_TASK_PID:$SLURM_TASK_PID

bash evaluates the variables before the actual program is started. 
Otherwise e.g. "cd $HOME" would not work: on most Unix-like systems 
there is no directory literally named $HOME, but the variable HOME 
holds the path of the user's home directory.
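
A small sketch of that ordering, runnable without Slurm. The names 
DEMO_NODENAME, DEMO_UNSET_ID and show_args are made up for this 
illustration; DEMO_NODENAME merely stands in for SLURMD_NODENAME. The 
command only ever sees the text bash has already substituted, and an 
unset variable simply becomes an empty string:

```shell
#!/bin/bash
# DEMO_NODENAME is a hypothetical stand-in for SLURMD_NODENAME.
DEMO_NODENAME=node3
# Make sure the second demo variable is really unset.
unset DEMO_UNSET_ID

# Print each argument exactly as the command received it.
show_args() { printf '<%s>\n' "$@"; }

# bash substitutes both variables before show_args is started,
# so show_args never sees a dollar sign.
received=$(show_args NODENAME:$DEMO_NODENAME ARRAY_ID:$DEMO_UNSET_ID)

echo "$received"
```

The second argument arrives as "ARRAY_ID:" with nothing after the 
colon, which matches the empty array fields in your output.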

So, in fact, here's what you are actually running:

srun echo SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: 
SLURM_JOB_ID:2056 SLURM_TASK_PID:6441

This is exactly the output you received.


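Here's what you could try (all on one line):

srun bash -c 'echo SLURMD_NODENAME:$SLURMD_NODENAME 
SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID 
SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID 
SLURM_JOB_ID:$SLURM_JOB_ID SLURM_TASK_PID:$SLURM_TASK_PID'

The single quotes (not backticks!) prevent the bash that runs your 
batch script from replacing the variables. The bash started by srun 
for each task expands them instead, in that task's own environment. 
Note that echo by itself does not expand variables, so without the 
inner bash -c you would only get the literal text back.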
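
The difference between the quoting styles can be tried without a 
cluster. This is only a sketch: fake_srun is a hypothetical stand-in 
that runs a command with a different environment, roughly the way a 
task on another node would see its own SLURMD_NODENAME, and 
DEMO_NODENAME is an invented variable name.

```shell
#!/bin/bash
# DEMO_NODENAME stands in for SLURMD_NODENAME; node3 plays the batch host.
export DEMO_NODENAME=node3

# Hypothetical stand-in for srun: run a command as if on "node5".
fake_srun() { env DEMO_NODENAME=node5 "$@"; }

# Double quotes: the outer shell substitutes node3 before fake_srun starts.
early=$(fake_srun bash -c "echo $DEMO_NODENAME")
# Single quotes: the inner bash expands the variable in its own environment.
late=$(fake_srun bash -c 'echo $DEMO_NODENAME')
# Single quotes without an inner shell: echo prints the text unchanged.
literal=$(fake_srun echo '$DEMO_NODENAME')

printf 'early=%s late=%s literal=%s\n' "$early" "$late" "$literal"
```

Only the single-quoted bash -c variant reports the value from the 
task's side.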


Best
Marcus


On 07/13/2018 06:54 PM, Sam wrote:
>
> StackOverflow Thread: 
> https://stackoverflow.com/questions/51328917/slurm-sbatch-multiple-nodes-same-slurmd-nodename
>
>
> possibly related to:
> https://groups.google.com/forum/#!topic/slurm-users/suclnO2V0aA 
>
>  - slurm-wlm 17.11.2
>  - Installed from Ubuntu Apt repos, Ubuntu:18.04
>
> We have a cluster of 20 identical nodes.
> Running the simple script below gives me a confusing problem.
> All the jobs think they are running on node3, while running the 
> hostname command gives the accurate answer. This is also a problem for 
> the output filename. I expected to have many different outputs, but I 
> get just one, with 'node3' in the filename. This seems to be a Bash 
> Eval() / Variable substitution error.
> Wrapping
>
> $SLURMD_NODENAME
>
> in a
>
>   bash -c "echo Bash3: \$SLURMD_NODENAME"
>
> works. But why did I have to do this? This workaround won't work for 
> the #SBATCH --output
>
> cn.job:
>
>   #!/bin/bash
>   #SBATCH --output=/share/output.txt.%j.%J.%a.%A.%n.%N.%s.%t.%x
>   #SBATCH --time=00:00:30
>   #SBATCH --tasks-per-node=2
>   #SBATCH --nodes=4
>   srun hostname
>   srun bash -c "echo Bash2: \$(hostname)"
>   srun echo SLURMD_NODENAME:$SLURMD_NODENAME 
> SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID 
> SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID SLURM_JOB_ID:$SLURM_JOB_ID 
> SLURM_TASK_PID:$SLURM_TASK_PID
>   srun bash -c "echo Bash3: \$SLURMD_NODENAME"
>   srun sleep 20
>
> Ran like:
>
>   sbatch cn.job
>
> produces this output:
>
> **/share/output.txt.2056.2056.4294967294.2056.0.node3.4294967294.0.cn.job**
>
>   node3
>   node3
>   node6
>   node4
>   node5
>   node6
>   node4
>   node5
>   Bash2: node3
>   Bash2: node6
>   Bash2: node4
>   Bash2: node5
>   Bash2: node3
>   Bash2: node4
>   Bash2: node6
>   Bash2: node5
>   SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: 
> SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
>   SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: 
> SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
>   SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: 
> SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
>   SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: 
> SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
>   SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: 
> SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
>   SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: 
> SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
>   SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: 
> SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
>   SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: 
> SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
>   Bash3: node3
>   Bash3: node5
>   Bash3: node3
>   Bash3: node4
>   Bash3: node6
>   Bash3: node4
>   Bash3: node6
>   Bash3: node5
>

-- 
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de
