[slurm-users] Generating OPA topology.conf

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Fri Jun 15 00:51:02 MDT 2018


Hi Marcus,

I additionally had to install this RPM from the Intel OPA software suite 
(I have disabled the CentOS 7.4 RPMs):

IntelOPA-Tools.RHEL74-x86_64.10.6.1.0.2/RPMS/x86_64/opa-opa-libopamgt-devel-10.6.1.0-2.el7.x86_64.rpm
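
For anyone checking their own node, here is a minimal sketch of comparing 
a libopamgt version against 10.6.1.0. That threshold is only an assumption 
drawn from the two versions reported in this thread, version_ge is a 
hypothetical helper name, and the installed value is hard-coded so that the 
snippet runs anywhere (on a real node it could come from 
rpm -q --qf '%{VERSION}' opa-libopamgt-devel):

```shell
# version_ge A B: succeeds when version A >= version B (hypothetical helper).
# Relies on GNU sort's -V (version sort), as shipped with CentOS 7 coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="10.6.1.0"       # assumed minimum: the version that linked in this thread
installed="10.5.0.0.140"  # example value: the CentOS 7.5 version that failed to link

if version_ge "$installed" "$required"; then
    result="ok"
else
    result="too old"
fi
echo "libopamgt $installed: $result"
```

With the example value above the check reports the library as too old, 
which matches the missing omgt_status_totext symbol reported below.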

I also had to edit CMakeLists.txt, changing SLURM_PREFIX from /usr/local 
to /usr, because we install the default Slurm RPMs from SchedMD.
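
The edit can be scripted. A minimal sketch, assuming the SET line in 
CMakeLists.txt reads as quoted further down in this thread; it is applied 
to a scratch copy of that one line rather than to the real file:

```shell
# Work on a scratch file that holds the SET line as quoted in this thread.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
SET (SLURM_PREFIX "/usr/local" CACHE PATH "Directory in which SLURM is installed.")
EOF

# Rewrite only the default value of SLURM_PREFIX; the rest of the line is kept.
patched=$(sed 's|SLURM_PREFIX "/usr/local"|SLURM_PREFIX "/usr"|' "$tmp")
rm -f "$tmp"

echo "$patched"
```

Since SLURM_PREFIX is declared as a CACHE PATH variable, running 
cmake -DSLURM_PREFIX=/usr . in a fresh build directory should have the same 
effect without touching the file, though that is untested here.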

Now opa2slurm builds without errors and works very nicely:

ccmake .
make

$ ./opa2slurm  | grep '^Switch'
SwitchName=opa-L2 Nodes=niflfs[1-3],niflmgt,niflopt1,x[025-048]
SwitchName=opa-c1 Switches=opa-L[1-8]
SwitchName=opa-L1 Nodes=x[001-024]
SwitchName=opa-L3 Nodes=x[049-072]
SwitchName=opa-L4 Nodes=x[073-096]
SwitchName=opa-L6 Nodes=x[121-144]
SwitchName=opa-L5 Nodes=x[097-120]
SwitchName=opa-c2 Switches=opa-L[1-8]
SwitchName=opa-c3 Switches=opa-L[1-8]
SwitchName=opa-L7 Nodes=x[169-192]
SwitchName=opa-L8 Nodes=x[145-168]
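
The switch lines come out in no particular order. Purely for readability, 
they can be sorted before being pasted into topology.conf. A small sketch 
using four of the lines above as sample input (in practice the input would 
be the output of ./opa2slurm | grep '^Switch'):

```shell
# Sample lines copied from the opa2slurm output above.
sample='SwitchName=opa-L2 Nodes=niflfs[1-3],niflmgt,niflopt1,x[025-048]
SwitchName=opa-c1 Switches=opa-L[1-8]
SwitchName=opa-L1 Nodes=x[001-024]
SwitchName=opa-L3 Nodes=x[049-072]'

# Byte-wise sort (C locale): upper-case "L" sorts before lower-case "c",
# so the leaf switches opa-L* are listed ahead of the core switches opa-c*.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort)

printf '%s\n' "$sorted"
```

Forcing the C locale keeps the ordering independent of the locale settings 
on the node where the command is run.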

Jeffrey: It would be very nice if you could document in detail how to 
configure opa2slurm and list all prerequisite RPMs in your README.md.

Thanks a lot,
Ole


On 06/14/2018 10:07 AM, Marcus Wagner wrote:
> Hi Ole, hi Jeffrey.
> 
> Thanks, Jeffrey, for that tool; it is working for me. I changed the 
> CMakeLists.txt a little so that Slurm can also be found in non-standard 
> install paths ;)
> 
> replaced
> SET (SLURM_PREFIX "/usr/local" CACHE PATH "Directory in which SLURM is installed.")
> with
> FIND_PATH(SLURM_PREFIX NAMES sbatch)
> 
> and changed
> FIND_PATH(SLURM_INCLUDE_DIR NAMES slurm/slurm.h)
> FIND_LIBRARY(SLURM_LIBRARY NAMES libslurm.so)
> to
> FIND_PATH(SLURM_INCLUDE_DIR NAMES slurm/slurm.h HINTS ${SLURM_PREFIX}/../include)
> FIND_LIBRARY(SLURM_LIBRARY NAMES libslurm.so HINTS ${SLURM_PREFIX}/../lib64)
> 
> @Ole,
> 
> I also got errors at first when building:
> 
> CMakeFiles/opa2slurm.dir/opa2slurm.c.o: In function `main':
> opa2slurm.c:(.text+0x15ee): undefined reference to `omgt_status_totext'
> opa2slurm.c:(.text+0x16df): undefined reference to `omgt_status_totext'
> 
> I was using the libopamgt that comes with CentOS 7.5 
> (10.5.0.0.140-2.el7); after switching to version 10.6.1.0-2.el7, 
> everything went through:
> 
> $> cmake . && make
> -- The C compiler identification is GNU 4.8.5
> -- Check for working C compiler: /usr/bin/gcc
> -- Check for working C compiler: /usr/bin/gcc -- works
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Found SLURM: /opt/slurm/lib64/libslurm.so
> -- Found OPA: /usr/lib/libopamgt.so
> -- Configuring done
> -- Generating done
> -- Build files have been written to: /home/mw445520/git/opa2slurm
> Scanning dependencies of target opa2slurm
> [ 25%] Building C object CMakeFiles/opa2slurm.dir/ptr_list.c.o
> [ 50%] Building C object CMakeFiles/opa2slurm.dir/opa_host.c.o
> [ 75%] Building C object CMakeFiles/opa2slurm.dir/opa_switch.c.o
> [100%] Building C object CMakeFiles/opa2slurm.dir/opa2slurm.c.o
> Linking C executable opa2slurm
> [100%] Built target opa2slurm
> 
> The only additional prerequisites for me were the libopamgt and 
> libopamgt-devel packages.
> 
> So, since you seem to use the same version as me, I'm not sure why you 
> have these linking problems :/
> 
> 
> Best
> Marcus
> 
> On 06/14/2018 09:17 AM, Ole Holm Nielsen wrote:
>> Hi Jeffrey,
>>
>> On 06/13/2018 10:35 PM, Jeffrey Frey wrote:
>>> Intel's OPA doesn't include the old IB net discovery library/API; 
>>> instead, they have their own library to enumerate nodes, links, etc. 
>>> I've started a rewrite of ye olde "ib2slurm" utility to make use of 
>>> Intel's new enumeration library.
>>>
>>>
>>> https://gitlab.com/jtfrey/opa2slurm
>>
>> Thanks for sharing this tool!  We have an OPA fabric, and I'd like to 
>> try out your tool to generate topology.conf.  Unfortunately I'm unable 
>> to build opa2slurm because of missing headers and libraries:
>>
>> 1. The RPM package opa-libopamgt-devel is required for header files. 
>> This RPM may or may not be installed from the 
>> IntelOPA-Basic.RHEL74-x86_64.10.6.1.0.2.tgz tar-ball.  I had a node 
>> where the OPA FM software was installed, including the required header 
>> files.
>>
>> 2. After the CMake configuration the make fails:
>>
>> $ make
>> Scanning dependencies of target opa2slurm
>> [ 20%] Building C object CMakeFiles/opa2slurm.dir/ptr_list.c.o
>> [ 40%] Building C object CMakeFiles/opa2slurm.dir/opa_host.c.o
>> [ 60%] Building C object CMakeFiles/opa2slurm.dir/opa_switch.c.o
>> [ 80%] Building C object CMakeFiles/opa2slurm.dir/opa2slurm.c.o
>> [100%] Linking C executable opa2slurm
>> CMakeFiles/opa2slurm.dir/opa2slurm.c.o: In function `opa_link_speed':
>> opa2slurm.c:(.text+0xe7): undefined reference to `omgt_sa_get_portinfo_records'
>> opa2slurm.c:(.text+0x1ae): undefined reference to `omgt_sa_free_records'
>> CMakeFiles/opa2slurm.dir/opa2slurm.c.o: In function `main':
>> opa2slurm.c:(.text+0x734): undefined reference to `omgt_open_port'
>> opa2slurm.c:(.text+0x791): undefined reference to `omgt_open_port_by_num'
>> opa2slurm.c:(.text+0x7e1): undefined reference to `omgt_open_port_by_guid'
>> opa2slurm.c:(.text+0x88a): undefined reference to `omgt_sa_get_node_records'
>> opa2slurm.c:(.text+0xa12): undefined reference to `omgt_sa_free_records'
>> opa2slurm.c:(.text+0xa5d): undefined reference to `omgt_sa_get_node_records'
>> opa2slurm.c:(.text+0xbd8): undefined reference to `omgt_sa_free_records'
>> opa2slurm.c:(.text+0xc19): undefined reference to `omgt_sa_get_link_records'
>> opa2slurm.c:(.text+0xee5): undefined reference to `omgt_sa_free_records'
>> opa2slurm.c:(.text+0x15e9): undefined reference to `omgt_status_totext'
>> opa2slurm.c:(.text+0x1623): undefined reference to `omgt_sa_free_records'
>> opa2slurm.c:(.text+0x163e): undefined reference to `omgt_sa_free_records'
>> opa2slurm.c:(.text+0x1659): undefined reference to `omgt_sa_free_records'
>> opa2slurm.c:(.text+0x169a): undefined reference to `omgt_close_port'
>> opa2slurm.c:(.text+0x16d6): undefined reference to `omgt_status_totext'
>> opa2slurm.c:(.text+0x1717): undefined reference to `omgt_sa_free_records'
>> opa2slurm.c:(.text+0x1732): undefined reference to `omgt_sa_free_records'
>> opa2slurm.c:(.text+0x174d): undefined reference to `omgt_sa_free_records'
>> opa2slurm.c:(.text+0x178e): undefined reference to `omgt_close_port'
>> collect2: error: ld returned 1 exit status
>> make[2]: *** [opa2slurm] Error 1
>> make[1]: *** [CMakeFiles/opa2slurm.dir/all] Error 2
>> make: *** [all] Error 2
>>
>> Could you kindly add detailed building instructions including software 
>> prerequisites to your Gitlab page?
>>
>> Thanks a lot,
>> Ole
>>
> 

-- 
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kongens Lyngby, Denmark
E-mail: Ole.H.Nielsen at fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Tel: (+45) 4525 3187 / Mobile (+45) 5180 1620


