Good afternoon,
I know this question has been asked a million times, but what is the canonical way to convert the list of nodes for a job that is contained in a Slurm variable (I use SLURM_JOB_NODELIST) to a host list appropriate for mpirun in OpenMPI (and perhaps MPICH as well)?
Before anyone says "compile OpenMPI with Slurm support": I can't change the Slurm installation.
I have a script that does the conversion on a single node, but when I try it on a cluster that does not include that node, I get an error:
scontrol: error: host list is empty
The line in the script corresponding to this is:
list=$(scontrol show hostname $SLURM_NODELIST)
I've tried using the env variable SLURM_JOB_NODELIST and I get the same error message.
Thanks!
Jeff
As I recall, OpenMPI needs a list with one entry per line, rather than entries separated by spaces. See:
[root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
holy7c[26401-26405]
[root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
holy7c26401
holy7c26402
holy7c26403
holy7c26404
holy7c26405

[root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
[root@holy7c26401 ~]# echo $list
holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405
The first form would be fine for OpenMPI (though usually you also need slots=numranks on each entry, where numranks is the number of ranks per host you are trying to set up). The second I don't think would be interpreted properly, so you will need to make sure the list is passed in a form OpenMPI can read. I usually just dump it to a file and then read in that file rather than holding it in an environment variable.
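For example, a minimal sketch of the dump-to-file approach (the ranks-per-host value, hostfile path, and application name are placeholders, not anything from this thread):

#!/bin/bash
# Sketch: build an OpenMPI hostfile from the Slurm node list, one host per line
# with a slots= count. RANKS_PER_HOST and the file name are arbitrary choices.
RANKS_PER_HOST=4
HOSTFILE="hostfile.${SLURM_JOB_ID}"

scontrol show hostnames "$SLURM_JOB_NODELIST" \
    | awk -v slots="$RANKS_PER_HOST" '{ print $0 " slots=" slots }' > "$HOSTFILE"

mpirun --hostfile "$HOSTFILE" -np $((SLURM_JOB_NUM_NODES * RANKS_PER_HOST)) ./my_mpi_app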
-Paul Edmon-
Hi Paul,
Proper quoting does wonders here (please consult the bash man page). If you try

echo "$list"

you will see that you get

holy7c26401
holy7c26402
holy7c26403
holy7c26404
holy7c26405
So you *can* pass this around in a variable if you use "$variable" whenever you provide it to a utility.
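A small sketch of this behaviour (the variable and file names are just illustrative):

#!/bin/bash
# Capture the expanded node list; scontrol prints one hostname per line.
list=$(scontrol show hostnames "$SLURM_JOB_NODELIST")

echo $list      # unquoted: word splitting collapses the newlines into spaces
echo "$list"    # quoted: the one-host-per-line output is preserved

# Writing the quoted value to a file keeps the per-line format for mpirun --hostfile.
echo "$list" > hostfile.txt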
Regards, Hermann
Thanks! I admit I'm not that experienced in Bash. I will give this a whirl as a test.
In the meantime, let me ask: what is the "canonical" way to create the host list? It would be nice to have this in the Slurm FAQ somewhere.
Thanks!
Jeff
Normally MPI will just pick up the host list from Slurm itself: build MPI against Slurm and it will grab it automatically. This is typically transparent to the user, so normally you shouldn't need to pass a host list at all. See: https://slurm.schedmd.com/mpi_guide.html
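For illustration, a minimal batch script for that integrated case (the --mpi value depends on how your site built Slurm, and the application name is a placeholder):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

# With an MPI library built against Slurm (PMI2/PMIx), srun launches the ranks
# directly and no explicit host list is needed.
srun --mpi=pmix ./my_mpi_app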
If you do need to do it yourself, the canonical way would be the scontrol show hostnames command applied to $SLURM_JOB_NODELIST (https://slurm.schedmd.com/scontrol.html#OPT_hostnames). That will give you the list of hosts your job is set to run on.
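As a variation on the hostfile sketch earlier in the thread, a comma-separated list for mpirun's --host option could look like this (the variable and application names are made up for the example):

# Join the per-line hostnames with commas for mpirun --host.
# Depending on the OpenMPI version you may need host:slots entries to allow
# more than one rank per host.
hostcsv=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | paste -sd, -)
mpirun --host "$hostcsv" -np "$SLURM_NTASKS" ./my_mpi_app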
-Paul Edmon-
Paul,
I tend not to rely on the MPI being built with Slurm :) I find that the systems I use haven't done that. :( I'm not exactly sure why, but that is the way it is :)
Up to now, using scontrol has always worked for me. However, a new system is not cooperating (the job runs on the submission host rather than the compute nodes) and I'm trying to debug it. My first step was to check whether the job was getting the compute node names, and the list of nodes from Slurm is empty. This led to my question about the "canonical" way to get the host list (I've tried both building the host list explicitly and relying on Slurm being integrated into the MPI; neither works since the host list is empty).
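For concreteness, the check amounts to something along these lines (standard Slurm environment variables, nothing system-specific):

# Show what Slurm exported into the job environment and try to expand the node list.
env | grep '^SLURM_' | sort
echo "Node list: ${SLURM_JOB_NODELIST:-<empty>}"
scontrol show hostnames "$SLURM_JOB_NODELIST"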
It looks like there is a canonical way to do it as you mentioned. FAQ worthy? Definitely for my own Slurm FAQ. Others will decide if it is worthy for Slurm docs :)
Thanks everyone for your help!
Jeff
Certainly a strange setup. I would probably talk with whoever is providing MPI for you and ask them to build it against Slurm properly, since to get correct process binding you definitely want it integrated with Slurm via PMI2 or PMIx. If you just use the bare host list, your ranks may not end up bound to the specific cores they are supposed to be allocated. So proceed with caution and validate that your ranks are being laid out properly, as you will be relying on mpirun/mpiexec to bootstrap rather than the scheduler.
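One way to do that validation with OpenMPI is its --report-bindings option, which prints each rank's binding at launch (the hostfile and application names below are placeholders):

# Print the core binding chosen for every rank so the layout can be checked by eye.
mpirun --hostfile "$HOSTFILE" --report-bindings -np "$SLURM_NTASKS" ./my_mpi_app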
-Paul Edmon-
It's in a container, specifically horovod/horovod on Docker Hub. I'm going into the container to investigate now (I think I have a link to the Dockerfile as well).
Thanks!
Jeff
Ah, that's even more fun. I know with Singularity you can launch MPI applications by calling MPI outside of the container and having it link against the version inside: https://docs.sylabs.io/guides/3.3/user-guide/mpi.html Not sure about Docker though.
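For what it's worth, that hybrid model usually looks something like this (the container image, hostfile, and application names are placeholders; whether it carries over to the Horovod image is an open question):

# The host's mpirun bootstraps the ranks; each rank runs inside the container image.
mpirun -np "$SLURM_NTASKS" --hostfile "$HOSTFILE" \
    singularity exec my_container.sif ./my_mpi_app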
-Paul Edmon-