
4 Common Spectrum-X Product Solutions in Ethernet Networking

Jason
Data Center Architect · Nov 29, 2024

Since the unveiling of xAI Colossus, the world's largest AI supercomputing cluster, Ethernet networking has drawn wide attention. Unlike most supercomputers, which use technologies such as InfiniBand, Colossus was built on Ethernet, specifically NVIDIA's Spectrum-X Ethernet fabric.

 

This choice is highly strategic. Ethernet, the protocol that underpins the Internet, scales exceptionally well, and many enterprises already know it deeply and trust it as a networking standard. For a long time, however, there was no Ethernet solution that adequately met the demands of AI workloads.

 

NVIDIA Spectrum-X was created to solve this problem. As the world's first high-performance Ethernet fabric for AI, it represents a significant leap forward and helps keep Ethernet a robust, future-proof technology in an era of exponential data growth.

 

What is Spectrum-X?


Spectrum-X, the AI Ethernet solution pioneered by NVIDIA, integrates the industry's most advanced hardware and software technologies. It overcomes the limitations of traditional Ethernet in AI training, greatly improves the responsiveness and processing power of AI applications, and provides an efficient, reliable compute foundation for generative AI users.

 

The Spectrum-X networking platform consists of the following components.


1. Spectrum-4 Ethernet Switches


The NVIDIA Spectrum Ethernet switch family includes a comprehensive portfolio of switches and software spanning 1GbE to 800GbE. Spectrum switches are well suited both to building AI fabrics that connect NVIDIA GPUs and to end-to-end cloud data center networks.

 


 

2. BlueField-3 DPU/SuperNICs

The NVIDIA BlueField DPU ignites unprecedented innovation for modern data centers, delivering a broad range of advanced networking, storage, and security services for complex compute and AI workloads, virtual machines, containers, and bare-metal servers in enterprise and cloud.

The BlueField-3 SuperNIC is one of the key components of the Spectrum-X solution. It delivers up to 400Gb/s of RoCE connectivity between GPU servers and enables NVIDIA GPUDirect® RoCE to maximize AI workload efficiency.

3. LinkX® cables and transceivers


LinkX® cables and transceivers play an integral connectivity role in the Spectrum-X solution. They specialize in 800G and 400G end-to-end high-speed connectivity built on 100G PAM4 technology. With full support for the OSFP, QSFP112, and QSFP-DD MSA standards, the portfolio covers a wide range of interconnects, from DAC and ACC copper cables to multimode and single-mode optical modules, to meet different cabling requirements.
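To make "100G PAM4" concrete, the sketch below works through the per-lane arithmetic. The 53.125 GBd symbol rate and the 256b/257b transcoding plus RS(544,514) FEC chain come from the IEEE 802.3 100 Gb/s-per-lane Ethernet PHYs, not from this article, and the function names are illustrative only.

```python
from fractions import Fraction

PAM4_BITS_PER_SYMBOL = 2                # PAM4 uses 4 amplitude levels, so each symbol carries 2 bits
LANE_BAUD_GBD = Fraction(53125, 1000)   # 53.125 GBd signaling rate per 100G lane

def lane_line_rate_gbps() -> Fraction:
    """Raw bit rate on one PAM4 lane, in Gb/s."""
    return LANE_BAUD_GBD * PAM4_BITS_PER_SYMBOL

def lane_rate_from_payload(payload_gbps: int = 100) -> Fraction:
    """Payload rate scaled by 256b/257b transcoding and RS(544,514) FEC overhead."""
    return payload_gbps * Fraction(257, 256) * Fraction(544, 514)

def lanes_needed(port_gbps: int, lane_payload_gbps: int = 100) -> int:
    """How many 100G lanes a port of the given speed aggregates."""
    return port_gbps // lane_payload_gbps

print(float(lane_line_rate_gbps()))           # 106.25
print(float(lane_rate_from_payload()))        # 106.25
print(lanes_needed(800), lanes_needed(400))   # 8 4
```

Both routes arrive at 106.25 Gb/s per lane, which is why an 800G OSFP port uses eight lanes and a 400G QSFP112 port uses four.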

 

Why Spectrum-X?

 

Efficient Load Balancing


Using dynamic (adaptive) routing, Spectrum-X monitors the physical bandwidth of each link and the egress congestion of each port in real time, and distributes load dynamically down to the level of individual packets. This improves link balance and raises effective bandwidth utilization from the traditional 50% to 60% to more than 97%. As a result, it eliminates the long-tail latency caused by "elephant flows" in AI applications and delivers significant performance gains for the collective communication patterns of AI training.
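The difference between per-flow hashing and per-packet load balancing can be illustrated with a toy model. This is not NVIDIA's algorithm: choosing the least-loaded link stands in for Spectrum-X's telemetry-driven decision, and the flow names, sizes, and link count are hypothetical. In this model a collective operation finishes only when the most loaded link drains, so effective utilization is mean load divided by maximum load.

```python
import zlib

def flow_hash_ecmp(flows, num_links):
    """Per-flow ECMP: every packet of a flow follows the link chosen by hashing its ID."""
    load = [0] * num_links
    for flow_id, num_packets in flows:
        link = zlib.crc32(flow_id.encode()) % num_links
        load[link] += num_packets
    return load

def per_packet_adaptive(flows, num_links):
    """Per-packet spraying: each packet goes to the currently least-loaded link."""
    load = [0] * num_links
    for _flow_id, num_packets in flows:
        for _ in range(num_packets):
            link = min(range(num_links), key=load.__getitem__)
            load[link] += 1
    return load

def effective_utilization(load):
    """Mean link load over max link load (the busiest link gates completion)."""
    return sum(load) / (len(load) * max(load))

# Eight hypothetical elephant flows of 1000 packets each over eight uplinks.
flows = [(f"flow-{i}", 1000) for i in range(8)]
print(effective_utilization(flow_hash_ecmp(flows, 8)))       # < 1.0 whenever hashes collide
print(effective_utilization(per_packet_adaptive(flows, 8)))  # 1.0
```

With per-flow hashing, two elephant flows that land on the same link double that link's load while another link sits idle. Spreading individual packets avoids this, at the cost of out-of-order arrival, which is the problem addressed in the next section.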

 

Low latency and stable data transmission


Direct Data Placement (DDP) solves the problem of out-of-order packets at the receiver that per-packet load balancing introduces. Packets are tagged, intelligently assigned to uplinks, and, when they arrive out of order, placed so that the complete data stream is reassembled in the correct order. This eliminates the long-tail latency caused by out-of-order delivery and provides more stable, efficient data transmission for high-performance workloads such as AI training.
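The idea of placing data correctly regardless of arrival order can be sketched conceptually. This model assumes each packet carries its byte offset within the message; the `Packet` and `DirectPlacementReceiver` names are hypothetical and do not describe how BlueField-3 implements DDP in NIC hardware.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    msg_offset: int   # byte offset of this payload within the message (hypothetical header field)
    payload: bytes

class DirectPlacementReceiver:
    """Writes each packet straight to its final position in a pre-allocated buffer.

    The offset carried in the packet says where the bytes belong, so arrival
    order does not matter and no reorder queue is needed.
    """

    def __init__(self, message_size: int):
        self.buffer = bytearray(message_size)
        self.bytes_received = 0
        self.seen_offsets = set()   # avoids double-counting duplicated packets

    def on_packet(self, pkt: Packet) -> None:
        if pkt.msg_offset in self.seen_offsets:
            return  # duplicate, already placed
        end = pkt.msg_offset + len(pkt.payload)
        if end > len(self.buffer):
            raise ValueError("packet extends past the end of the message buffer")
        self.buffer[pkt.msg_offset:end] = pkt.payload
        self.seen_offsets.add(pkt.msg_offset)
        self.bytes_received += len(pkt.payload)

    def complete(self) -> bool:
        return self.bytes_received == len(self.buffer)

message = b"collective-allreduce-chunk"
packets = [Packet(i, message[i:i + 4]) for i in range(0, len(message), 4)]
rx = DirectPlacementReceiver(len(message))
for pkt in reversed(packets):   # simulate fully reversed arrival order
    rx.on_packet(pkt)
print(rx.complete(), bytes(rx.buffer) == message)   # True True
```

Because placement depends only on the offset, the receiver never waits for a missing earlier packet before accepting later ones; it only needs to know when every byte has arrived.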

 

Excellent multi-tenant performance isolation


In multi-tenant AI clouds, Spectrum-X uses the programmable congestion control of the BlueField-3 hardware platform. It assesses congestion along each traffic path using RTT probe packets and in-band telemetry from intermediate switches, then applies fine-grained congestion control per workload. This isolates performance so that each tenant gets the expected performance in the cloud and is not affected by congestion caused by other tenants.
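The isolation property can be illustrated with a simple delay-based control loop. This is a generic additive-increase/multiplicative-decrease sketch, not NVIDIA's programmable congestion control algorithm, and every constant below is a hypothetical value. The key point is that each flow reacts only to the RTT measured on its own path.

```python
from dataclasses import dataclass

LINE_RATE_GBPS = 400.0      # assumed SuperNIC port speed
TARGET_RTT_US = 10.0        # hypothetical target delay
ADDITIVE_STEP_GBPS = 10.0   # hypothetical additive increase per update
MAX_DECREASE = 0.5          # never cut the rate by more than half in one update

@dataclass
class FlowState:
    tenant: str
    rate_gbps: float

def update_rate(flow: FlowState, rtt_us: float) -> None:
    """One delay-based AIMD step driven by an RTT probe measurement."""
    if rtt_us > TARGET_RTT_US:
        # Multiplicative decrease, proportional to how far RTT exceeds the target.
        excess = (rtt_us - TARGET_RTT_US) / rtt_us
        flow.rate_gbps *= 1.0 - min(MAX_DECREASE, excess)
    else:
        # Path is not congested: probe for more bandwidth, capped at line rate.
        flow.rate_gbps = min(LINE_RATE_GBPS, flow.rate_gbps + ADDITIVE_STEP_GBPS)

a = FlowState("tenant-A", 200.0)
b = FlowState("tenant-B", 200.0)
for _ in range(5):
    update_rate(a, rtt_us=30.0)   # tenant A's path is congested
    update_rate(b, rtt_us=8.0)    # tenant B's path is clear
print(a.rate_gbps, b.rate_gbps)   # 6.25 250.0
```

Tenant A backs off sharply while tenant B keeps ramping up, because B's control decisions never see A's congestion signal.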

 

Spectrum-X Solution


Spectrum-X is the only Ethernet networking platform that provides a seamless connection from switch to SuperNIC to GPU. This end-to-end integration ensures excellent signal integrity and the lowest bit error rate (BER), greatly reducing the power consumption of AI clouds. By building on the advanced technology of NVIDIA Hopper GPUs, Spectrum-X switches, and SuperNICs, Spectrum-X delivers optimal power efficiency without compromising performance.

Based on the typical Spectrum-X network topology, NADDOD provides flexible connectivity solutions for Spectrum switch-to-switch and Spectrum switch-to-DPU/SuperNIC links, including optical modules, DACs, AOCs, and fiber optic cables.

Spectrum-4 Ethernet switches are crucial components of NVIDIA Spectrum-X deployments and are available in two models, the SN5600 and the SN5400. This article uses the Spectrum SN5000 series as a starting point to describe typical product pairings for the SN5600 and SN5400 switches in cluster deployments.

 

SN5600


The SN5600 is an 800G 51.2Tb/s switch. It offers 64 OSFP ports of 800GbE in a dense 2U form factor. The SN5600 is ideal for NVIDIA Spectrum-X deployments and enables both standard leaf/spine designs with top-of-rack (ToR) switches as well as end-of-row (EoR) topologies.

SN5400


The SN5400 is a 400G 25.6Tb/s switch. It offers 64 ports of 400GbE in a 2U form factor, with versatile QSFP-DD (quad small form factor double density) connectivity supporting speeds from 1GbE to 400GbE.
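The quoted capacities follow directly from port count times port speed, summed in one direction, as this small check shows (the function name is illustrative):

```python
def switch_capacity_tbps(num_ports: int, port_speed_gbps: int) -> float:
    """Aggregate switching capacity = ports x per-port speed, counted in one direction."""
    return num_ports * port_speed_gbps / 1000

print(switch_capacity_tbps(64, 800))  # SN5600: 51.2
print(switch_capacity_tbps(64, 400))  # SN5400: 25.6
```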

 

 

SN5600 Switch Application

 

1. SN5600 Switch to SN5600 Switch

 

The SN5600 offers 64 ports of 800GbE in a dense 2U form factor. Choose the OSFP transceiver and cable according to the link distance between devices, cost, and cabling and maintenance considerations.

 

Depending on the transmission distance between devices, 800G OSFP 2xSR4 (50m), 800G OSFP 2xDR4 (500m), or 800G OSFP 2xFR4 (2km) optical modules can be used with the corresponding patch cables to interconnect switches. The 800G OSFP 2xFR4 module not only covers longer distances but also uses duplex LC patch cords, which greatly reduces cabling costs compared with MPO-12/APC patch cables.

 

800G OSFP DACs and AECs (direct-attach and active electrical copper cables) can also connect one NVIDIA SN5600 switch to another. A passive DAC reaches up to 3m and an active AEC up to 7m, which covers short links between nearby devices well. Besides lowering cost, copper cables offer low power consumption and stable, reliable signal transmission, which is why they are widely used in AI, HPC, and data center environments.
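The distance rules in the two paragraphs above can be captured in a small lookup. This sketch assumes the shortest-reach option that covers a link is preferred (copper, then multimode, then single-mode), a common cost heuristic rather than a rule stated in this article; the table and function names are hypothetical.

```python
# Maximum reach (meters) for SN5600-to-SN5600 link options listed in this article.
OPTIONS_BY_REACH_M = [
    (3, "800G OSFP passive DAC"),
    (7, "800G OSFP active AEC"),
    (50, "800G OSFP 2xSR4 optical module"),
    (500, "800G OSFP 2xDR4 optical module"),
    (2000, "800G OSFP 2xFR4 optical module"),
]

def pick_sn5600_link(distance_m: float) -> str:
    """Return the shortest-reach option that covers the distance (assumed cheapest)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    for max_reach_m, option in OPTIONS_BY_REACH_M:
        if distance_m <= max_reach_m:
            return option
    raise ValueError(f"{distance_m} m exceeds the 2 km reach of the listed options")

for d in (2, 5, 30, 300, 1500):
    print(d, "m ->", pick_sn5600_link(d))
```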

2. SN5600 Switch to 400G BlueField-3 DPU/SuperNIC


The SN5600 switch uses 800G OSFP ports, while the 400G BlueField-3 SuperNIC uses QSFP112 ports. NADDOD offers a wide range of optical modules in various form factors to meet diverse connection needs.

To interconnect the SN5600 and a 400G BlueField-3 DPU/SuperNIC, choose an 800G OSFP 2xSR4 or 800G OSFP 2xDR4 module according to the transmission distance. Two MPO-12/APC fiber patch cables (MMF for SR4, SMF for DR4) then connect it to two 400G QSFP112 SR4 or 400G QSFP112 DR4 modules, giving a 1:2 breakout from 800G to 2x400G.

Alternatively, an 800G OSFP breakout DAC or ACC can be used. The 800G OSFP end plugs into the NVIDIA SN5600 switch, and the two QSFP112 branches plug into NVIDIA 400G QSFP112 DPUs/SuperNICs, providing an 800G to 2x400G connection. The DAC reaches up to 3m and the ACC up to 5m. This combination offers an efficient, stable, and cost-effective data transmission solution for switch-to-server links.
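A quick way to see what a 1:2 breakout means for cluster sizing is to count NIC ports per leaf switch. The sketch below assumes a leaf that splits its ports evenly between server-facing downlinks and spine-facing uplinks, a common non-blocking design but not one this article prescribes; the function and parameter names are hypothetical.

```python
def leaf_plan(switch_ports: int, port_gbps: int, nic_gbps: int, downlink_ports: int) -> dict:
    """Size a leaf switch whose downlink ports are broken out 1:N toward NIC ports."""
    if port_gbps % nic_gbps:
        raise ValueError("port speed must be a multiple of the NIC port speed")
    breakout = port_gbps // nic_gbps              # e.g. 800G -> 2 x 400G
    nics = downlink_ports * breakout              # NIC ports served by the downlinks
    uplink_ports = switch_ports - downlink_ports  # remaining ports face the spine
    downlink_bw = nics * nic_gbps
    uplink_bw = uplink_ports * port_gbps
    return {
        "breakout": breakout,
        "nics": nics,
        "oversubscription": downlink_bw / uplink_bw,  # 1.0 means non-blocking
    }

# SN5600 (64 x 800G) to 400G SuperNICs, half the ports used as downlinks
print(leaf_plan(64, 800, 400, 32))  # {'breakout': 2, 'nics': 64, 'oversubscription': 1.0}
# SN5400 (64 x 400G) to 200G DPU ports, same split
print(leaf_plan(64, 400, 200, 32))  # {'breakout': 2, 'nics': 64, 'oversubscription': 1.0}
```

Using more ports as downlinks serves more NIC ports per leaf but pushes the oversubscription ratio above 1.0.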

SN5400 Switch Application


1. SN5400 Switch to SN5400 Switch


For SN5400 switch-to-switch links, NADDOD offers 400G QSFP-DD optical modules covering transmission distances from 100m to 120km, as well as QSFP-DD DACs and AOCs. The following combinations are based on commonly used products.

2. SN5400 Switch to 2x200G BlueField-3 DPU

 

For SN5400 switch-to-server links, the following combinations cover only the switch and common DPUs/NICs; other connection methods are not covered here. If you would like to learn more about the detailed solutions, please contact us at sales@naddod.com. Our technicians will be happy to help with any questions you may have.

NADDOD


NADDOD transceiver modules, AOCs, and DACs are professionally tested and fully compatible with Spectrum-4 SN5000 series switches and BlueField-3 SuperNICs, ensuring interoperability, high reliability, and seamless operation in complex RoCE applications. In RS-FEC (544,514) testing, NADDOD products achieve a pre-FEC BER between 1E-11 and 1E-9 and a near-zero post-FEC BER, meeting the strict requirements of high-performance, stable network connections.
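To see why a pre-FEC BER in this range leaves essentially no uncorrectable errors, the sketch below estimates how often an RS(544,514) codeword exceeds the code's correction capability. It assumes independent, uniformly random bit errors, which real links with burst errors do not strictly satisfy, so treat the result as an optimistic order-of-magnitude estimate. The 10-bit symbol size and 15-symbol correction limit are properties of the RS(544,514) code itself, not figures from this article.

```python
import math
from math import comb

N, K = 544, 514      # RS(544,514): codeword and message length in symbols
SYMBOL_BITS = 10     # each RS symbol is 10 bits wide
T = (N - K) // 2     # the decoder corrects up to 15 symbol errors per codeword

def symbol_error_prob(pre_fec_ber: float) -> float:
    """P(a 10-bit symbol contains at least one bit error), assuming independent bit errors."""
    # -expm1(n * log1p(-x)) equals 1 - (1 - x)**n without cancellation for tiny x.
    return -math.expm1(SYMBOL_BITS * math.log1p(-pre_fec_ber))

def uncorrectable_codeword_prob(pre_fec_ber: float) -> float:
    """P(more than T symbols in a codeword are wrong), so the codeword cannot be corrected."""
    p = symbol_error_prob(pre_fec_ber)
    return sum(comb(N, k) * p**k * (1.0 - p) ** (N - k) for k in range(T + 1, N + 1))

overhead = (N - K) / K   # parity symbols relative to data symbols, about 5.8%
for ber in (1e-9, 1e-11):
    print(f"pre-FEC BER {ber:.0e} -> uncorrectable codeword probability "
          f"{uncorrectable_codeword_prob(ber):.1e}")
```

Under these assumptions a codeword fails only if 16 or more of its 544 symbols are hit, which is astronomically unlikely at a pre-FEC BER of 1E-9 or better.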

 

With more than 1,000 deployment cases, our commitment to innovation and quality has been proven through numerous successful data center deployments, making us a trusted partner for AI cluster initiatives. Contact the NADDOD team for more information and networking solutions.
