As I recall, OpenMPI needs a list with one entry per line, rather than entries separated by spaces. See:
[root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
holy7c[26401-26405]
[root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
holy7c26401
holy7c26402
holy7c26403
holy7c26404
holy7c26405

[root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
[root@holy7c26401 ~]# echo $list
holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405
The first would be fine for OpenMPI (though usually you also need slots=numranks on each entry, where numranks is the number of ranks per host you are trying to set up). The second I don't think would be interpreted properly. So you will need to make sure the list is passed in a form that OpenMPI can read. I usually just dump it to a file and have OpenMPI read that file, rather than holding it in an environment variable.
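For what it's worth, here is a rough sketch of the dump-to-file approach. Treat it as a starting point rather than a tested recipe: write_hostfile, RANKS_PER_HOST, hostfile.txt and ./my_app are names I made up for illustration, and it assumes scontrol is on the PATH inside the allocation and that your mpirun accepts --hostfile.

```shell
#!/usr/bin/env bash
# Sketch: build an OpenMPI-style hostfile with one "host slots=N" entry per line.

# write_hostfile (hypothetical helper): reads one hostname per line on stdin
# and prints "<host> slots=<ranks_per_host>" for each non-empty line.
write_hostfile() {
    local ranks_per_host="$1"
    local host
    while IFS= read -r host; do
        if [ -n "$host" ]; then
            printf '%s slots=%s\n' "$host" "$ranks_per_host"
        fi
    done
}

# Only attempt the expansion when we are actually inside a Slurm allocation;
# an empty variable is what produces "scontrol: error: host list is empty".
if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
    # RANKS_PER_HOST is an assumed variable; it defaults to 1 rank per host.
    scontrol show hostnames "$SLURM_JOB_NODELIST" \
        | write_hostfile "${RANKS_PER_HOST:-1}" > hostfile.txt
    # Then something like (my_app is a placeholder):
    #   mpirun --hostfile hostfile.txt ./my_app
fi
```

Note that if you do keep the list in a variable, quoting it when you expand it (echo "$list") preserves the newlines; the unquoted echo $list is what collapses them into spaces.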
-Paul Edmon-
On 8/9/2024 12:34 PM, Jeffrey Layton via slurm-users wrote:
Good afternoon,
I know this question has been asked a million times, but what is the canonical way to convert the list of nodes for a job, which is contained in a Slurm variable (I use SLURM_JOB_NODELIST), into a host list appropriate for mpirun in OpenMPI (and perhaps MPICH as well)?
Before anyone says "compile OpenMPI with Slurm," I can't change the Slurm installation.
I have a script that does the conversion on a single node, but when I try a cluster that does not include the single node, I get an error:
scontrol: error: host list is empty
The line in the script corresponding to this is,
list=$(scontrol show hostname $SLURM_NODELIST)
I've tried using the env variable SLURM_JOB_NODELIST and I get the same error message.
Thanks!
Jeff