As I recall, OpenMPI needs a list with one entry per line, rather than one separated by spaces. See:

[root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
holy7c[26401-26405]
[root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
holy7c26401
holy7c26402
holy7c26403
holy7c26404
holy7c26405

[root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
[root@holy7c26401 ~]# echo $list
holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405

The first form (one hostname per line) would be fine for OpenMPI, though usually you also need slots=numranks on each entry, where numranks is the number of ranks per host you are trying to set up. The second form (space-separated on one line) I don't think would be interpreted properly. So you will need to make sure the list is passed in a form OpenMPI can read. I usually just dump it to a file and read that file in, rather than holding it in an environment variable.
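
A minimal sketch of the dump-to-file approach, assuming scontrol is on the PATH inside the job and SLURM_JOB_NODELIST is set. The helper name make_hostfile, the file name hostfile.txt, the program ./my_app, and the value of 4 ranks per host are all placeholders for illustration, not anything taken from a real setup:

```shell
#!/bin/sh
# make_hostfile (hypothetical helper): reads one hostname per line on stdin
# and prints "hostname slots=N" lines, where N is the first argument.
make_hostfile() {
    slots=$1
    while IFS= read -r host; do
        # Skip blank lines so the hostfile has no empty entries.
        if [ -n "$host" ]; then
            printf '%s slots=%s\n' "$host" "$slots"
        fi
    done
}

# Inside a job script this would be used roughly as follows
# (left commented out because it needs a running Slurm allocation):
#   scontrol show hostnames "$SLURM_JOB_NODELIST" | make_hostfile 4 > hostfile.txt
#   mpirun --hostfile hostfile.txt ./my_app
```

Piping the scontrol output straight into the helper keeps the one-entry-per-line form intact, so nothing depends on how the shell splits a variable.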

-Paul Edmon-

On 8/9/2024 12:34 PM, Jeffrey Layton via slurm-users wrote:
Good afternoon,

I know this question has been asked a million times, but what is the canonical way to convert the list of nodes for a job that is contained in a Slurm variable (I use SLURM_JOB_NODELIST) to a host list appropriate for mpirun in OpenMPI (perhaps MPICH as well)?

Before anyone says "compile OpenMPI with Slurm": I can't change the Slurm installation.

I have a script that does the conversion on a single node, but when I try a cluster that does not include the single node, I get an error:

scontrol: error: host list is empty

The line in the script corresponding to this is,

list=$(scontrol show hostname $SLURM_NODELIST)

I've tried using the env variable SLURM_JOB_NODELIST and I get the same error message.

Thanks!

Jeff