In the data center market of 2024, NVIDIA GPUs remain in high demand, with persistent supply constraints. Since launching the A100 in 2020, NVIDIA has iterated its product line with the H100 in 2022, followed by the L40S in 2023 and the H200 in 2024.
The following table summarizes the key specifications of these products.
Specification | A100 | H100 | L40S | H200 |
---|---|---|---|---|
Architecture | Ampere | Hopper | Ada Lovelace | Hopper |
Release Year | 2020 | 2022 | 2023 | 2024 |
FP64 | 9.7 TFLOPS | 34 TFLOPS | None | 34 TFLOPS |
FP64 Tensor Core | 19.5 TFLOPS | 67 TFLOPS | None | 67 TFLOPS |
FP32 | 19.5 TFLOPS | 67 TFLOPS | 91.6 TFLOPS | 67 TFLOPS |
TF32 Tensor Core | 312 TFLOPS | 989 TFLOPS | 183 / 366* TFLOPS | 989 TFLOPS* |
BFLOAT16 Tensor Core | 624 TFLOPS | 1,979 TFLOPS | 362.05 / 733* TFLOPS | 1,979 TFLOPS* |
FP16 Tensor Core | 624 TFLOPS | 1,979 TFLOPS | 362.05 / 733* TFLOPS | 1,979 TFLOPS* |
FP8 Tensor Core | Not supported | 3,958 TFLOPS | 733 / 1,466* TFLOPS | 3,958 TFLOPS |
INT8 Tensor Core | 1,248 TOPS | 3,958 TOPS | 733 / 1,466* TOPS | 3,958 TOPS |
INT4 Tensor Core | None | None | 733 / 1,466* TOPS | Data not available |
GPU Memory | 80 GB HBM2e | 80 GB HBM3 | 48 GB GDDR6 with ECC | 141 GB HBM3e |
GPU Memory Bandwidth | 2,039 GB/s | 3.35 TB/s | 864 GB/s | 4.8 TB/s |
Decoders | Not applicable | 7 NVDEC, 7 JPEG | Not applicable | 7 NVDEC, 7 JPEG |
Max TDP | 400 W | 700 W | 350 W | 700 W |
Multi-Instance GPU (MIG) | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB | None | Up to 7 MIGs @ 16.5 GB |
Form Factor | SXM | SXM | 4.4" (H) x 10.5" (L), dual slot | SXM |
Interconnect | NVLink: 600 GB/s | NVLink: 900 GB/s | PCIe Gen4 x16: 64 GB/s bidirectional | NVLink: 900 GB/s; PCIe Gen5: 128 GB/s |
Server Platform Options | NVIDIA HGX™ A100 and NVIDIA-Certified Systems™ with 4, 8, or 16 GPUs; NVIDIA DGX™ A100 with 8 GPUs | NVIDIA HGX™ H100 and NVIDIA-Certified Systems™ with 4 or 8 GPUs; NVIDIA DGX™ H100 with 8 GPUs | None | NVIDIA HGX™ H200 and NVIDIA-Certified Systems™ with 4 or 8 GPUs |
NVIDIA AI Enterprise | Included | Add-on | None | Add-on |
CUDA Cores | 6,912 | 16,896 | 18,176 | Data not available |

\* With sparsity.
A100
The A100, introduced in 2020, marked the first utilization of the Ampere architecture for GPUs, bringing significant performance improvements.
Until the release of the H100, the A100 outperformed every other data center GPU. Its performance gains came from enhanced Tensor Cores, a higher CUDA core count, improved memory, and roughly 2 TB/s of memory bandwidth, the fastest of its generation.
The A100 supports Multi-Instance GPU (MIG) functionality, which allows a single A100 GPU to be partitioned into multiple independent smaller GPUs, greatly improving resource allocation efficiency in cloud and data center environments.
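As a rough illustration of how MIG is typically configured, the sketch below drives the standard `nvidia-smi` MIG commands from Python. It assumes a Linux host with the NVIDIA driver installed, root privileges, and an MIG-capable GPU at index 0; the profile name `1g.10gb` is only an example, since supported profiles vary by GPU model and driver version.

```python
# Minimal MIG configuration sketch (assumptions: MIG-capable GPU at index 0,
# root privileges; profile names vary, so list them before creating instances).
import subprocess

def run(cmd):
    """Run an nvidia-smi command and return its stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()

# 1. Enable MIG mode on GPU 0 (takes effect after a GPU reset or reboot).
print(run(["nvidia-smi", "-i", "0", "-mig", "1"]))

# 2. List the GPU instance profiles this particular GPU supports.
print(run(["nvidia-smi", "mig", "-i", "0", "-lgip"]))

# 3. Create two GPU instances plus their default compute instances (-C).
#    "1g.10gb" is an example profile for an 80 GB A100; adjust per step 2.
print(run(["nvidia-smi", "mig", "-i", "0", "-cgi", "1g.10gb,1g.10gb", "-C"]))

# 4. The resulting MIG devices now appear as separate entries.
print(run(["nvidia-smi", "-L"]))
```

Each MIG instance then shows up to schedulers and frameworks as an independent GPU with its own memory and compute slice.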
Although surpassed by newer models, the A100 remains an excellent choice for training complex neural networks, deep learning, and AI workloads, thanks to its Tensor cores and high throughput, which deliver outstanding performance in these domains.
The A100 excels in AI inference tasks and demonstrates advantages in various applications such as speech recognition, image classification, recommendation systems, data analysis, big data processing, scientific computing, as well as high-performance computing scenarios like genomic sequencing and drug discovery.
H100
The H100 is capable of handling the most challenging AI workloads and large-scale data processing tasks.
The H100 features upgraded Tensor Cores that significantly speed up AI training and inference. It supports computations in double precision (FP64), single precision (FP32), TF32, half precision (FP16/BF16), 8-bit floating point (FP8), and integer (INT8) formats.
Compared to the A100, the H100 delivers roughly six times the throughput when running in FP8, reaching nearly 4 petaflops of peak Tensor Core compute. It also moves to HBM3 high-bandwidth memory with up to 3.35 TB/s of bandwidth, and its aggregate external connectivity approaches 5 TB/s. Additionally, the new Transformer Engine speeds up transformer model training by up to six times.
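As a quick sanity check of that six-fold figure, the peak Tensor Core numbers from the table above can be compared directly; actual speedups depend on the workload and on how much of it the Transformer Engine can keep in FP8.

```python
# Back-of-the-envelope check using the peak figures from the comparison table.
a100_fp16_tflops = 624    # A100 FP16 Tensor Core peak
h100_fp8_tflops = 3958    # H100 FP8 Tensor Core peak (~4 PFLOPS)

speedup = h100_fp8_tflops / a100_fp16_tflops
print(f"H100 FP8 vs A100 FP16: {speedup:.1f}x")  # ~6.3x
```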
While the H100 and A100 share similarities in usage scenarios and performance characteristics, the H100 outperforms the A100 in handling large-scale AI models and more complex scientific simulations. The H100 is a superior choice for real-time responsive AI applications such as advanced conversational AI and real-time translation.
In summary, the H100 offers significant performance improvements compared to the A100 in terms of AI training and inference speed, memory capacity and bandwidth, as well as processing large and complex AI models. It is suitable for AI and scientific simulation tasks that demand higher performance.
L40S
The L40S is designed to handle next-generation data center workloads, including generative AI, large-scale language model (LLM) inference and training, 3D graphics rendering, scientific simulations, and more.
Compared to the previous-generation A100, the L40S offers up to a 5x improvement in inference performance and a 2x improvement in real-time ray tracing (RT) performance.
In terms of memory, it is equipped with 48 GB of GDDR6 memory with ECC support, which is crucial for maintaining data integrity in high-performance computing environments.
The L40S features over 18,000 CUDA cores, which are parallel processors essential for handling complex computational tasks.
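For intuition, the quoted FP32 peak and the CUDA core count are related by the usual convention of two FP32 operations (one fused multiply-add) per core per clock. The sketch below back-calculates the implied boost clock from the table values; it is an estimate, not an official specification.

```python
# Relating CUDA core count to the FP32 peak quoted in the table:
# peak FP32 FLOPS = cores x 2 FLOPs per clock (one FMA) x boost clock.
cuda_cores = 18_176        # L40S
fp32_peak_tflops = 91.6    # from the table

implied_boost_ghz = fp32_peak_tflops * 1e12 / (cuda_cores * 2) / 1e9
print(f"Implied boost clock: {implied_boost_ghz:.2f} GHz")  # ~2.52 GHz
```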
While the H100 emphasizes compute and video decoding, the L40S places greater weight on visualization and encoding capabilities. Although slower than the H100, the L40S is generally easier to obtain and less expensive.
In conclusion, the L40S offers significant advantages in handling complex and high-performance computing tasks, particularly in the fields of generative AI and large-scale language model training. Its efficient inference performance and real-time ray tracing capabilities make it a compelling option for data centers.
H200
The H200 is the latest addition to the NVIDIA GPU series and started shipping in the second quarter of 2024.
The H200 is the first GPU to offer 141 GB of HBM3e memory and a bandwidth of 4.8 TB/s, which is nearly twice the memory capacity and 1.4 times the bandwidth of the H100.
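The ratios behind that claim follow directly from the table above:

```python
# Quick arithmetic behind "nearly twice the capacity, 1.4x the bandwidth",
# using the H100 and H200 figures from the comparison table.
h100_mem_gb, h200_mem_gb = 80, 141
h100_bw_tbs, h200_bw_tbs = 3.35, 4.8

print(f"Capacity: {h200_mem_gb / h100_mem_gb:.2f}x")   # ~1.76x, i.e. nearly 2x
print(f"Bandwidth: {h200_bw_tbs / h100_bw_tbs:.2f}x")  # ~1.43x
```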
In terms of high-performance computing, the H200 delivers up to 110x faster time to results compared with CPU-only systems.
When handling Llama2 70B inference tasks, the H200 demonstrates twice the inference speed of the H100 GPU.
The H200 plays a key role in edge computing and Internet of Things (IoT) applications, specifically in the domain of Artificial Intelligence of Things (AIoT).
Expect the H200 to deliver the highest GPU performance in applications such as training and inference of the largest models (exceeding 175 billion parameters), generative AI, and high-performance computing.
In summary, the H200 provides unprecedented performance in the fields of AI and high-performance computing, particularly in handling large-scale models and complex tasks. Its high memory capacity and bandwidth, along with exceptional inference speed, make it an ideal choice for processing cutting-edge AI workloads.
NADDOD: Leading Provider of High-Quality NVIDIA GPU Interconnect Solutions
NADDOD is a leading provider dedicated to delivering high-quality interconnect solutions for NVIDIA GPUs. We specialize in offering high-performance, high-speed data transfer, and reliable interconnect solutions to meet the growing computational demands.
Our products support various optical modules, including InfiniBand, Ethernet 800G/400G/200G/100G, as well as AOC and DAC technologies. These advanced interconnect products enable NVIDIA GPUs to achieve faster and more reliable data transfer, providing users with exceptional performance and flexibility.
Whether in data centers, high-performance computing, or other fields, NADDOD's interconnect solutions cater to the needs of our customers. We are committed to continuous innovation and technological advancement, constantly improving the performance and quality of our products to ensure the best user experience.
By choosing NADDOD, you gain access to high-quality interconnect solutions for NVIDIA GPUs and professional technical support. We work closely with you to provide customized solutions that meet your specific requirements.
In addition to offering third-party high-quality optical modules, we also stock a wide range of original NVIDIA products, providing you with more options. Contact us now to learn more details!

