Blogs

Introduction to the Three Key Processing Cores Inside NVIDIA GPUs

Starting from the SM organizational structure of NVIDIA GPUs, this article outlines the architectural positioning and capability focus of the internal computing cores (CUDA Cores, Tensor Cores, and RT Cores) and explains how these processing cores divide the work and which scenarios each is suited for.
Abel
Jan 9, 2026
Inference Chip Guide: The Foundation of Scalable AI Applications

AI inference is becoming a dominant driver of computing costs. This article systematically analyzes the key differences between AI training and inference, introduces the advantages of dedicated inference chips, and walks through the inference roadmaps of Amazon Inferentia2, Google TPU Ironwood, and NVIDIA, helping you understand why inference chips have become key infrastructure for deploying AI at scale.
Adam
Jan 9, 2026
In-Depth Understanding of AI Distributed Training Communication Primitives

An in-depth analysis of distributed training communication primitives, explaining their role in large-scale AI training and using NCCL to illustrate how communication primitives set the upper limit of distributed training performance.
Jason
Jan 7, 2026
Spectrum-6 Ethernet Switch Deep Dive: SN6810 102.4T and SN6800 409.6T Switch

An in-depth analysis of the NVIDIA Spectrum-6 Ethernet switch: 102.4T single-chip bandwidth, co-packaged optics (CPO), 224G SerDes, and a high-radix port design, suited to scale-out and scale-across network deployments for large GPU clusters.
Jason
Jan 7, 2026
NVIDIA Rubin Platform: AI Supercomputer with Six New Chips

The NVIDIA Rubin platform is a new computing platform for next-generation AI. Through the co-design of GPUs, CPUs, interconnects, and networks, it achieves high performance, low cost, and system-level security, supporting large-scale AI models and multi-agent applications.
Quinn
Jan 6, 2026
Introduction to Tensor Cores in NVIDIA GPUs

Focusing on the fundamental concepts, working principle, and evolutionary path of Tensor Cores within NVIDIA GPU microarchitectures, this article provides a systematic analysis of how Tensor Cores have become a critical foundation of modern AI computing infrastructure.
Gavin
Dec 31, 2025
Amazon 3nm AI Chip Trainium3 Deep Dive

Explore the architecture, memory bandwidth, and system scalability of Amazon's 3nm AI chip, Trainium3. Compare it with NVIDIA GB300 and Google TPU Ironwood to analyze enterprise selection strategies and the future direction of Trainium4.
Jason
Dec 30, 2025
Broadcom Sian3 and Sian2M: 200G/lane optical interconnect DSP technologies for AI data centers

An analysis of Broadcom's Sian3 and Sian2M 200G/lane DSP technologies. Sian3 (3nm, SMF) and Sian2M (5nm, MMF) support 800G and 1.6T optical modules, meeting the high-bandwidth, low-power interconnect requirements of AI data center clusters through optimized manufacturing processes and high integration.
Peter
Dec 26, 2025
800G OSFP vs QSFP-DD: How to Choose the Right Optical Transceiver Form Factor

This article analyzes the technical differences, application scenarios, and deployment considerations of the 800G OSFP and QSFP-DD form factors, helping data centers and high-performance networks make informed 800G transceiver selections.
Peter
Dec 26, 2025