[slurm-users] Generating OPA topology.conf

Jeffrey Frey frey at udel.edu
Wed Jun 13 14:35:09 MDT 2018

Intel's OPA doesn't include the old IB net discovery library/API; instead, they have their own library to enumerate nodes, links, etc.  I've started a rewrite of ye olde "ib2slurm" utility to make use of Intel's new enumeration library.



$ opa2slurm --help

  opa2slurm {options}

  [HFI selection]

    -N, --hfi-name <hfi_name>    use the named HFI (e.g. hfi1_0)
    -n, --hfi-num <#>            use the HFI by integer index (0 = first active)
    -P, --hfi-port <#>           use the given port number on the HFI
    -G, --port-guid <guid>       use the port with the given GUID
                                 (e.g. 0x00117500d9000140)

  -o, --output <path>            write output topology configuration
                                 to the file at the given path
  -C, --no-comments              do not emit comments in the generated
                                 topology configuration
  -R, --no-ranged-lists          do not produce ranged name lists a'la SLURM
  -L, --linkspeed                include LinkSpeed values for switches
  -r, --no-redundancy-removal    do not remove references to non-leaf switches from
                                 leaf switches
  -v, --verbose                  display additional information to stderr

  [version 0.1]

$ opa2slurm --no-comments --linkspeed
SwitchName=r02-opa-s1 Switches=r00-opa-l[0-1],r01-opa-l[0-1],r02-opa-l0 LinkSpeed=16
SwitchName=r00-opa-l1 Nodes=r00n[25-56],r00oss0 LinkSpeed=16
SwitchName=r02-opa-s4 Switches=r00-opa-l[0-1],r01-opa-l[0-1],r02-opa-l0 LinkSpeed=16
SwitchName=r02-opa-s3 Switches=r00-opa-l[0-1],r01-opa-l[0-1],r02-opa-l0 LinkSpeed=16
SwitchName=r00-opa-l0 Nodes=r00n[00-24],r00oss1 LinkSpeed=16
SwitchName=r02-opa-s2 Switches=r00-opa-l[0-1],r01-opa-l[0-1],r02-opa-l0 LinkSpeed=16
SwitchName=r02-opa-s5 Switches=r00-opa-l[0-1],r01-opa-l[0-1],r02-opa-l0 LinkSpeed=16
SwitchName=r02-opa-l0 Nodes=r02login[00-01],r02mds[0-1],r02mgmt[00-02],r02s[00-01] LinkSpeed=16
SwitchName=r01-opa-l0 Nodes=r01n[00-24],r01oss1 LinkSpeed=16
SwitchName=r02-opa-s6 Switches=r00-opa-l[0-1],r01-opa-l[0-1],r02-opa-l0 LinkSpeed=16
SwitchName=r01-opa-l1 Nodes=r01n[25-56],r01oss0 LinkSpeed=16
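
For anyone curious how ranged name lists like r00n[25-56] in the output above can be built: one approach is to group names that share a prefix and a same-width numeric suffix, then collapse consecutive numbers into ranges. The Python below is only a rough sketch of that idea, not opa2slurm's actual code. The function name compress_names is hypothetical, and it assumes each name carries at most one trailing number.

```python
import re
from collections import defaultdict

def compress_names(names):
    """Collapse e.g. r00n25, r00n26 into r00n[25-26] (hypothetical helper)."""
    groups = defaultdict(set)  # (prefix, digit width) -> numeric suffixes
    plain = []                 # names with no trailing number
    for name in names:
        m = re.fullmatch(r"(.*?)(\d+)", name)
        if m:
            groups[(m.group(1), len(m.group(2)))].add(int(m.group(2)))
        else:
            plain.append(name)

    out = []
    for (prefix, width), nums in groups.items():
        ordered = sorted(nums)
        runs = []              # list of (first, last) consecutive runs
        start = prev = ordered[0]
        for n in ordered[1:]:
            if n != prev + 1:
                runs.append((start, prev))
                start = n
            prev = n
        runs.append((start, prev))

        if len(runs) == 1 and runs[0][0] == runs[0][1]:
            # A lone name is emitted without brackets.
            out.append(f"{prefix}{runs[0][0]:0{width}d}")
        else:
            parts = [
                f"{a:0{width}d}" if a == b else f"{a:0{width}d}-{b:0{width}d}"
                for a, b in runs
            ]
            out.append(f"{prefix}[{','.join(parts)}]")
    return ",".join(sorted(out + plain))

print(compress_names(["r00n25", "r00n26", "r00n27", "r00oss0"]))
```

Keying the groups on the digit width as well as the prefix keeps zero-padded names (r02login00) separate from unpadded ones (r02mds0), so the padding survives in the emitted range.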

When querying for links between nodes using Intel's API, both directions are returned (e.g. LID 1 -> 2 and 2 -> 1), so each physical link shows up twice.  The program currently looks for any non-leaf switches and removes references to them from leaf switches -- very simple.  Each switch's LinkSpeed value is the (semi-arbitrary) product of the API's link width and base link speed enumeration values, taking the maximum across all ports on that switch.
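
To make the above concrete, here is a rough Python sketch of the three steps: collapsing the two directions of each link, dropping non-leaf switch references from leaf switches, and deriving a LinkSpeed number. It is not the opa2slurm source and does not use Intel's API. The function names, the input shapes, and the working definition of a leaf switch (any switch with at least one non-switch neighbour) are all assumptions for illustration, as is reading "maximum across all ports" as the maximum of the per-port products.

```python
from collections import defaultdict

def build_topology(links, switches):
    """links: (a, b) name pairs, possibly listed in both directions.
    switches: set of names that are switches.
    Returns {switch: sorted neighbour names}. Hypothetical helper."""
    # Treat a->b and b->a as the same undirected edge.
    edges = {frozenset(pair) for pair in links if pair[0] != pair[1]}

    neighbours = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Assumption: a leaf switch has at least one non-switch (host) neighbour.
    leaves = {s for s in switches
              if any(n not in switches for n in neighbours[s])}
    non_leaves = switches - leaves

    topo = {}
    for s in switches:
        if s in leaves:
            # Leaf switches keep everything except non-leaf switches.
            topo[s] = sorted(n for n in neighbours[s] if n not in non_leaves)
        else:
            topo[s] = sorted(neighbours[s])
    return topo

def switch_linkspeed(ports):
    """ports: (width_enum, speed_enum) per port; the values are illustrative.
    Returns the largest width * speed product across the ports."""
    return max(width * speed for width, speed in ports)

links = [("s1", "l0"), ("l0", "s1"), ("l0", "n0"), ("n0", "l0"),
         ("l0", "n1"), ("s1", "l1"), ("l1", "n2")]
topo = build_topology(links, {"s1", "l0", "l1"})
for name in sorted(topo):
    print(name, topo[name])
print(switch_linkspeed([(8, 2), (8, 1)]))
```

With the sample links, the spine s1 ends up listing only the two leaf switches and each leaf lists only its hosts, which matches the shape of the Switches= and Nodes= lines in the output above.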

Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976
