[slurm-users] Question about networks and connectivity

Jeffrey T Frey frey at udel.edu
Mon Dec 9 14:08:08 UTC 2019


Open MPI matches the hardware available on the node(s) against its compiled-in capabilities.  Those capabilities are expressed as modular shared libraries (see e.g. $PREFIX/lib64/openmpi).  You can use environment variables or command-line flags to influence which modules get used for specific purposes.  For example, the Byte Transfer Layer (BTL) framework has openib, tcp, self, shared-memory (sm), and vader implementations.  As long as your build of Open MPI knows about Infiniband and the runtime can see the hardware, Open MPI should rank that interface as the highest-performance option and use it.
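As a concrete illustration, here is a minimal sketch of both approaches, assuming an Open MPI release that still ships the openib BTL (newer versions route Infiniband through UCX instead), an IPoIB interface named ib0, and a placeholder binary ./my_mpi_app:

    # Command-line flags: allow only the Infiniband, shared-memory and loopback BTLs
    mpirun --mca btl openib,vader,self ./my_mpi_app

    # Equivalent environment variable, e.g. exported from an sbatch script
    export OMPI_MCA_btl=openib,vader,self
    mpirun ./my_mpi_app

    # If you have to stay on the TCP BTL, you can still pin it to the IPoIB interface
    mpirun --mca btl tcp,vader,self --mca btl_tcp_if_include ib0 ./my_mpi_app

In practice Open MPI will usually prefer openib over tcp on its own once the hardware and libraries are visible; pinning the BTL list like this is mainly useful to verify that, or to make a job fail loudly when Infiniband is not actually being used.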



> On Dec 9, 2019, at 08:54 , Sysadmin CAOS <sysadmin.caos at uab.cat> wrote:
> 
> Hi mercan,
> 
> OK, I forgot to compile OpenMPI with Infiniband support... But I still have a doubt: the SLURM scheduler assigns (offers) my sbatch job some nodes called "node0x", because the nodes were added to my SLURM cluster under the "node0x" name. My OpenMPI application has (now) been compiled with ibverbs support... but how do I tell my application, or my SLURM sbatch submit script, that my MPI program MUST use the Infiniband network? If SLURM has assigned me node01 and node02 (with IP addresses 192.168.11.1 and 192.168.11.2 on a gigabit network) and the Infiniband network is 192.168.13.x, what translates "clus01" (192.168.12.1) and "clus02" (192.168.12.2) into "infi01" (192.168.13.1) and "infi02" (192.168.13.2)?
> 
> This step still baffles me...
> 
> Sorry if my question is easy for you... but I now find myself in a sea of doubts.
> 
> Thanks.
> 
> El 05/12/2019 a las 14:27, mercan escribió:
>> Hi;
>> 
>> Your MPI and NAMD use your second network because your applications were not compiled for Infiniband. There are many pre-built NAMD versions; the verbs and ibverbs builds are the ones that use Infiniband. Also, when you compile the MPI source, you should check that the configure script detects the Infiniband network so that Infiniband is used. The same applies when compiling SLURM itself.
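For what it's worth, a rough sketch of that configure-time check, assuming a verbs-based Open MPI build (the install prefix is just a placeholder):

    # Build Open MPI against libibverbs (the verbs headers/libraries must be installed)
    ./configure --prefix=/opt/openmpi --with-verbs
    make -j8 && make install

    # Afterwards, confirm that the openib BTL was actually built
    /opt/openmpi/bin/ompi_info | grep -i openib

If the configure output (or ompi_info afterwards) never mentions verbs/openib, the resulting build will quietly fall back to TCP.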
>> 
>> Regards;
>> 
>>  Ahmet M.
>> 
>> 
>> On 5.12.2019 15:07, sysadmin.caos wrote:
>>> Hello,
>>> 
>>> Really, I don't know whether my question belongs on this mailing list... but I will explain my problem and then you can answer with whatever you think ;)
>>> 
>>> I manage a SLURM cluster composed of 3 networks:
>>> 
>>>   * a gigabit network used for NFS shares (192.168.11.X). On this
>>>     network, my nodes are "node01, node02..." in /etc/hosts.
>>>   * a gigabit network used by SLURM (192.168.12.X); all my nodes were
>>>     added to the SLURM cluster using this network and the hostnames
>>>     assigned to it in /etc/hosts. On this network, my nodes are
>>>     "clus01, clus02..." in /etc/hosts.
>>>   * an Infiniband network (192.168.13.X). On this network, my nodes are
>>>     "infi01, infi02..." in /etc/hosts.
>>> 
>>> When I submit an MPI job, the SLURM scheduler offers me "n" nodes called, for example, clus01 and clus02, and there my application runs perfectly, using the second network for SLURM connectivity and the first network for NFS (and NIS) shares. By default, since SLURM connectivity is on the second network, my nodelist contains nodes called "clus0x".
>>> 
>>> However, I now have a "new" problem. I want to use the third network (Infiniband), but since SLURM offers me "clus0x" (the second network), my MPI application runs fine but over the second network. The same problem also occurs, for example, with the NAMD (Charmrun) application.
>>> 
>>> So, my questions are:
>>> 
>>>  1. Is this SLURM configuration correct for using both networks?
>>>      1. If the answer is "no", how do I configure SLURM for my purpose?
>>>      2. If the answer is "yes", how can I ensure that the connections in
>>>         my SLURM job go over Infiniband?
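On question 2, a quick way to check is to turn up Open MPI's BTL verbosity and watch the Infiniband port counters; a rough sketch (the HCA name mlx4_0 and the binary name are illustrative):

    # Print which BTL each process pair ends up using
    mpirun --mca btl_base_verbose 100 ./my_mpi_app 2>&1 | grep -i btl

    # Watch the IB transmit counter grow while the job runs
    cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data

If the verbose output only ever mentions the tcp BTL, or the counter stays flat during a bandwidth-heavy run, the traffic is still going over the gigabit network.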
>>> 
>>> Thanks a lot!!
>>> 
> 
> 



