[slurm-users] slurm SBATCH - Multiple Nodes, Same SLURMD_NODENAME
Sam
doogly1 at gmail.com
Fri Jul 13 10:54:14 MDT 2018
StackOverflow Thread:
https://stackoverflow.com/questions/51328917/slurm-sbatch-multiple-nodes-same-slurmd-nodename
possibly related to:
https://groups.google.com/forum/#!topic/slurm-users/suclnO2V0aA
- slurm-wlm 17.11.2
- Installed from Ubuntu Apt repos, Ubuntu:18.04
We have a cluster of 20 identical nodes.
Running the simple script below gives me a confusing result.
Every task reports SLURMD_NODENAME as node3, while running the hostname
command gives the correct node name. The output filename has the same
problem: I expected many different output files, but I get just one,
with 'node3' in the filename. This looks like a Bash eval / variable
substitution issue.
Wrapping $SLURMD_NODENAME in a bash -c, as in
bash -c "echo Bash3: \$SLURMD_NODENAME"
works. But why did I have to do this? This workaround won't work for the
#SBATCH --output line.
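A possible explanation, sketched here without Slurm: the batch script itself runs in one shell on the first allocated node, and that shell expands an unescaped $SLURMD_NODENAME before srun ever launches the tasks, whereas an escaped \$SLURMD_NODENAME inside bash -c is expanded later by each task's own shell. In the sketch below, NODE_NAME is a hypothetical stand-in for SLURMD_NODENAME, and env stands in for srun starting a task with a different environment; neither is part of the original script.

```shell
#!/bin/bash
# NODE_NAME is a hypothetical stand-in for SLURMD_NODENAME as seen by the
# batch script's shell (which runs on the first node of the allocation).
export NODE_NAME=batch-node

# Unescaped: the parent shell substitutes $NODE_NAME before the child
# process starts, so the child only ever sees the parent's value.
early=$(env NODE_NAME=task-node echo "$NODE_NAME")

# Escaped inside bash -c: the literal text $NODE_NAME reaches the child
# shell, which expands it from its own environment.
late=$(env NODE_NAME=task-node bash -c "echo \$NODE_NAME")

echo "early: $early"
echo "late:  $late"
```

If this reading is right, the first form corresponds to the plain srun echo line in cn.job and the second to the Bash3 line.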
cn.job:
#!/bin/bash
#SBATCH --output=/share/output.txt.%j.%J.%a.%A.%n.%N.%s.%t.%x
#SBATCH --time=00:00:30
#SBATCH --tasks-per-node=2
#SBATCH --nodes=4
srun hostname
srun bash -c "echo Bash2: \$(hostname)"
srun echo SLURMD_NODENAME:$SLURMD_NODENAME \
  SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID \
  SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID \
  SLURM_JOB_ID:$SLURM_JOB_ID \
  SLURM_TASK_PID:$SLURM_TASK_PID
srun bash -c "echo Bash3: \$SLURMD_NODENAME"
srun sleep 20
Submitted with:
sbatch cn.job
produces this output:
/share/output.txt.2056.2056.4294967294.2056.0.node3.4294967294.0.cn.job
node3
node3
node6
node4
node5
node6
node4
node5
Bash2: node3
Bash2: node6
Bash2: node4
Bash2: node5
Bash2: node3
Bash2: node4
Bash2: node6
Bash2: node5
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
Bash3: node3
Bash3: node5
Bash3: node3
Bash3: node4
Bash3: node6
Bash3: node4
Bash3: node6
Bash3: node5