
InfiniBand Simplified: Core Technology FAQs

Quinn
InfiniBand Network Architect · Oct 24, 2024 · InfiniBand Networking

With the rise of big data and AI technologies, the demand for high-performance computing (HPC) continues to grow. To meet it, NVIDIA’s Quantum-2 InfiniBand platform delivers exceptional distributed-computing performance with high-speed, low-latency data transfer and processing.

 

Below are common questions related to InfiniBand (IB) technology:

 

Q: Is the CX7 NDR200 QSFP112 port compatible with HDR/EDR cables?

A: Yes, the CX7 NDR200 QSFP112 port is fully compatible with HDR and EDR cables.

 

Q: How do I connect a CX7 NDR adapter to the Quantum-2 QM97XX series switch?

A: The CX7 NDR adapter uses NVIDIA 400GBASE-SR4 or 400GBASE-DR4 optical transceivers, while the QM97XX series switch uses 800GBASE-SR8 (equivalent to 2x400GBASE-SR4) or 800GBASE-DR8 (equivalent to 2x400GBASE-DR4) transceivers. In both cases the link uses 12-fiber, universal-polarity, APC-terminated MPO cables (multimode fiber for SR4, single-mode fiber for DR4).

 

Q: Can dual-port 400G on a CX7 adapter achieve 800G throughput via port bonding? Why can 200G achieve 400G via bonding?

A: No. Achieving 800G over the dual 400G ports is limited by factors such as PCIe bandwidth, adapter processing capability, and physical port bandwidth. The CX7 HCA uses a PCIe 5.0 x16 interface with a theoretical maximum of 512 Gbps, so 800G cannot be reached. Bonding two 200G ports, however, yields an aggregate of 400G, which stays within the PCIe limit.
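As a rough illustration, here is a minimal Python sketch of the limiting-factor arithmetic behind this answer; the 512 Gbps figure is the nominal PCIe 5.0 x16 rate quoted above, and the port speeds are the two CX7 cases in question:

```python
def max_bonded_throughput_gbps(port_speeds_gbps, host_pcie_gbps):
    """Bonded throughput is capped by whichever is smaller: the sum of
    the bonded port speeds or the host PCIe interface bandwidth."""
    return min(sum(port_speeds_gbps), host_pcie_gbps)

PCIE5_X16_GBPS = 512  # nominal PCIe 5.0 x16 bandwidth quoted above

# Dual-port 400G: capped at ~512G by PCIe, so 800G is out of reach.
print(max_bonded_throughput_gbps([400, 400], PCIE5_X16_GBPS))  # 512
# Dual-port 200G: the 400G aggregate fits within the PCIe limit.
print(max_bonded_throughput_gbps([200, 200], PCIE5_X16_GBPS))  # 400
```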

 

Q: How are breakout cables connected?

A: For optimal performance, breakout cables (800G to 2x400G) should connect to two separate servers.

 

Q: Can I connect breakout cables (800G to 2x400G) in InfiniBand NDR environments?

A: Yes, in InfiniBand NDR environments, you can use two types of breakout cables:

  • Optical transceivers with breakout capability (splitting 400G into 2x200G), such as MMS4X00-NS + MFP7E20-NXXX + MMS4X00-NS400 (degraded to 200G).
  • Breakout high-speed cables (splitting 800G into 2x400G), such as MCP7Y00-NXXX or MCP7Y10-NXXX.

 

Q: In a SuperPOD network, can the four NDR200 cards on each server be directly connected to the same switch using a 1x4 cable, or should two 1x2 cables be connected to different switches?

A: In a SuperPOD network, it is not recommended to use a 1x4 breakout cable to directly connect all four NDR200 ports on a server to the same switch. This setup does not follow the SuperPOD network rules. To ensure optimal performance for NCCL/SHARP, a 1x4 cable should connect NDR200 ports from different servers to leaf switches in a specific pattern.

 

Q: According to the SuperPOD network whitepaper, two InfiniBand switches with UFM software need to be configured separately in the compute network. However, this reduces the GPU node count in the cluster. If I choose not to configure separate UFM switches and deploy UFM software only on the management node, can I manage the cluster through another storage network without affecting the compute network?

A: It is recommended to configure dedicated UFM devices with the software. Deploying UFM software on the management node in the compute network is an alternative, but this node should not handle GPU compute workloads. The storage network operates as an independent layer and should not be used to manage the compute cluster.

 

Q: What is the difference between Enterprise UFM, SDN, Telemetry, and Cyber-AI? Is it necessary to purchase UFM?

A: UFM is not strictly required: basic subnet management and monitoring can be handled with OFED's opensm and the standard command-line tools, while UFM adds a user-friendly graphical interface and additional management features. Broadly, UFM Telemetry focuses on real-time fabric telemetry, UFM Enterprise adds full fabric management and provisioning, and UFM Cyber-AI layers AI-driven analytics and anomaly detection on top.
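For reference, below is a minimal sketch of the kind of command-line monitoring possible without UFM, assuming the standard OFED/infiniband-diags tools (here ibstat) are installed on a host with an InfiniBand HCA; the exact output format varies by OFED version:

```python
import subprocess

def hca_port_summary() -> list[str]:
    """Collect the State/Rate lines that ibstat prints for each HCA port.

    ibstat (from infiniband-diags) emits per-port blocks with lines such
    as "State: Active" and "Rate: 400"; parsing here is deliberately loose.
    """
    out = subprocess.run(["ibstat"], capture_output=True, text=True,
                         check=True).stdout
    return [line.strip() for line in out.splitlines()
            if line.strip().startswith(("State:", "Rate:"))]

if __name__ == "__main__":
    for line in hca_port_summary():
        print(line)
```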

 

Q: Is there a difference in the network scale supported by the subnet manager running on switches, in OFED (opensm), and in UFM? Which is better suited for customer deployment?

A: The subnet manager embedded in managed switches supports fabrics of up to 2,000 nodes. UFM and opensm (part of OFED) have no hard node limit; their practical scale depends on the CPU and hardware capabilities of the management nodes.

 

Q: Why does a switch with 64 x 400Gb ports only have 32 OSFP ports?

A: This comes down to the size and power constraints of the 1U chassis, which can accommodate only 32 OSFP cages. Each OSFP cage carries two 400G ports, so the switch exposes 64 x 400Gb ports through 32 physical cages; in NDR switches there is therefore a distinction between cages (slots) and ports.

 

Q: Can I use a cable to connect transceivers with different interfaces, such as connecting an OSFP port on a server to a QSFP112 port on a switch?

A: Yes, the physical form factor (OSFP vs. QSFP112) does not affect interoperability as long as the optical transmission rate, wavelength, and connector type are consistent on both ends.

 

Q: Can UFM monitor RoCE networks?

A: No, UFM only supports InfiniBand networks.

 

Q: Are UFM’s functions the same for managed and unmanaged switches?

A: Yes, the functionality is the same.

 

Q: What is the maximum transmission distance supported by InfiniBand cables? Does it affect bandwidth and latency?

A: Optical transceivers with fiber patch cables can reach distances of approximately 2 km. Passive direct attach copper cables (DAC) have a range of about 3 meters, while active copper cables (ACC) can extend up to 5 meters. Within these supported distances bandwidth is unaffected; latency increases only by the propagation delay of the fiber or cable (roughly 5 ns per meter).

 

Q: Can NVIDIA’s CX7 NIC connect to other 400G Ethernet switches that support RDMA in Ethernet mode?

A: It is possible to establish 400G Ethernet connections, and RoCE can operate in such environments, but performance is not guaranteed. For 400G Ethernet, the recommended approach is the Spectrum-X platform built around BlueField-3 and Spectrum-4.

 

Q: Is NDR compatible with HDR? Do these cables and transceivers come in a single specification?

A: Yes, OSFP-to-2xQSFP56 DAC/AOC cables are commonly used to ensure compatibility with HDR.

 

Q: Should the transceivers in OSFP ports on the HCA be flat transceivers?

A: Yes. The adapter's OSFP cage has its own heatsink, so flat-top transceivers are used there. Finned (integrated-heatsink) transceivers are used primarily in air-cooled switches, while liquid-cooled switches also take flat-top transceivers.

 

Q: Why doesn’t NDR have AOC cables?

A: NDR does not have AOC cables due to the larger size and weight of OSFP transceivers, which makes the fiber more susceptible to damage. Cables with multiple branches increase the risk of fiber breakage, especially over longer distances (e.g., 30m AOCs).

 

Q: Apart from using different optical transceivers, are the cables for 400G InfiniBand and 400G Ethernet the same?

A: The cables are the same, but it's important to note that both use APC connectors angled at 8 degrees.

 

Q: Do the CX7 adapters have specific latency performance requirements? What are the acceptable latency values in an optimized environment, for example with full memory usage and core binding?

A: Latency depends on the CPU frequency and configuration of the test machines, as well as the benchmark tools used (such as perftest or MPI), so there is no single fixed figure.
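As an illustration, this is roughly how a point-to-point latency test is typically launched with perftest's ib_write_lat, pinning the benchmark to one CPU core; the device name, core number, and server hostname below are placeholders, and the server side must already be running the matching command:

```python
import subprocess

DEVICE = "mlx5_0"   # placeholder HCA device name (check with ibstat)
CORE = "0"          # pin to a core on the HCA's local NUMA node
SERVER = "node01"   # placeholder hostname where ib_write_lat already runs

def run_latency_client():
    """Run the perftest latency client across all message sizes (-a),
    bound to a single CPU core via taskset."""
    cmd = ["taskset", "-c", CORE, "ib_write_lat", "-d", DEVICE, "-a", SERVER]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_latency_client()
```

The server side would be started first with the same binding, for example: taskset -c 0 ib_write_lat -d mlx5_0 -a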

 

Q: What role does UFM play in this cluster solution?

A: UFM operates independently on a server and is treated as a node. It supports high availability using two servers, but it is not recommended to run UFM on nodes that handle compute workloads.

 

Q: What size network cluster is recommended for UFM configuration?

A: It is recommended to configure UFM for all InfiniBand networks, as UFM not only provides openSM but also offers other powerful management and interface functionalities.

 

Q: Does PCIe 5.0 only support up to 512G? What about PCIe 4.0?

A: PCIe Gen5 supports up to 32GT/s per lane with 16 lanes, delivering a maximum bandwidth of 512G. On the other hand, PCIe Gen4 supports up to 16GT/s per lane with 16 lanes, providing a maximum bandwidth of 256G.
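A short sketch of where those figures come from; the per-lane rates are the standard PCIe values, and the raw x16 numbers match the 512G/256G quoted above (usable bandwidth is slightly lower after 128b/130b encoding overhead):

```python
# Per-lane transfer rates in GT/s for the PCIe generations discussed above.
LANE_RATES_GTS = {"PCIe 4.0": 16.0, "PCIe 5.0": 32.0}
LANES = 16

for gen, rate in LANE_RATES_GTS.items():
    raw_gbps = rate * LANES             # raw line rate of an x16 link
    usable_gbps = raw_gbps * 128 / 130  # after 128b/130b encoding overhead
    print(f"{gen} x16: raw {raw_gbps:.0f} Gbps, usable ~{usable_gbps:.0f} Gbps")
```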

 

Q: Do InfiniBand adapters support simplex or duplex modes?

A: All InfiniBand adapters operate in full-duplex mode. For current devices the simplex/duplex distinction is purely conceptual, because the transmit and receive paths use separate physical channels.

 

NADDOD InfiniBand NDR 800G OSFP Transceivers

 

NADDOD provides advanced technical support and industry-leading products tailored for low-latency, lossless InfiniBand clusters and scalable, high-performance RoCE network clusters. With its innovative portfolio, including 51.2T Ethernet data center switches, 800G/400G high-speed optical interconnects, and immersion liquid cooling interconnects, NADDOD equips organizations with the essential tools to build robust, scalable infrastructures that meet the rigorous demands of generative AI, AI workloads, and HPC environments.

 

These solutions guarantee stable, high-reliability networks—particularly crucial for AI clusters and high-frequency trading, where zero downtime and low latency are essential for operational success and maintaining peak performance. Contact our experts at NADDOD to receive tailored recommendations that best meet your AI network needs.
