NVIDIA GH200 Grace Hopper Superchip Delivers Outstanding Performance in MLPerf Inference v4.1


In the latest round of MLPerf Inference – a suite of standardized, peer-reviewed inference benchmarks – the NVIDIA platform delivered outstanding performance across the board. Among the many submissions made using the NVIDIA platform were results using the NVIDIA GH200 Grace Hopper Superchip. GH200 tightly couples an NVIDIA Grace CPU with an NVIDIA Hopper GPU using NVIDIA NVLink-C2C, a high-bandwidth, low-latency interconnect for superchips. 

In this post, we take a closer look at the great performance demonstrated by servers powered by the NVIDIA GH200 in the latest round of MLPerf Inference benchmarks.

The NVIDIA GH200 Grace Hopper Superchip is a new type of converged CPU and GPU architecture that combines the high-performance, power-efficient NVIDIA Grace CPU with the powerful Hopper GPU using NVLink-C2C, delivering 900 GB/s of bandwidth to the GPU, 7x more than the PCIe Gen5 found in today's servers. With GH200, the CPU and GPU share a single per-process page table, enabling all CPU and GPU threads to access all system-allocated memory, which can reside in physical CPU or GPU memory. This architecture removes the need to copy memory back and forth between the CPU and GPU.
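To make the programming model concrete, the sketch below shows a GPU kernel operating directly on memory obtained from plain malloc, with no cudaMalloc or cudaMemcpy. This is a minimal illustration, assuming a GH200-class system (or another platform with heterogeneous memory management enabled in the driver); the kernel, buffer size, and scale factor are arbitrary choices for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// The kernel scales a buffer that was allocated with plain malloc on the
// CPU. With a shared per-process page table, GPU threads can dereference
// system-allocated pointers directly; the pages may live in CPU or GPU
// physical memory and are accessed in place rather than copied.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    // Ordinary system allocation -- no cudaMalloc or cudaMemcpy involved.
    float* data = static_cast<float*>(malloc(n * sizeof(float)));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);  // prints 2.0 if the GPU saw the pages
    free(data);
    return 0;
}
```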

Figure 1. NVIDIA GH200 NVL2 Server

NVIDIA GH200 NVL2 builds on the success of the NVIDIA GH200 by connecting two GH200 Superchips with NVLink in a single node, making it easier to deploy, manage, and scale to meet the demands of single-node LLM inference, Retrieval Augmented Generation (RAG), recommenders, graph neural networks (GNNs), high-performance computing (HPC), and data processing.

The GH200 NVL2 fuses two Grace CPUs and two Hopper GPUs in an innovative architecture delivering 8 petaflops of AI performance in a single node. Together, the Grace CPUs provide 144 Arm Neoverse cores and up to 960GB of LPDDR5X memory, while the Hopper GPUs offer 288GB of the latest HBM3e memory and up to 10TB/s of memory bandwidth, 3.5x the capacity and 3x the bandwidth of the H100 GPU. The coherent memory simplifies development, delivers leading performance in a single server, and allows customers to scale out to meet demand.

GH200 delivers world-class generative AI performance

On a per-accelerator basis, the NVIDIA GH200 delivered outstanding inference performance across every generative AI benchmark in MLPerf Inference v4.1. Across the two most demanding LLM benchmarks – Mixtral 8x7B and Llama 2 70B – as well as DLRMv2, representing recommender systems, GH200 delivered up to 1.4x more performance per accelerator compared to the H100 Tensor Core GPU. 

Figure 2. Per-accelerator performance of GH200 compared to H100 in MLPerf Inference v4.1: 1.2x on Mixtral 8x7B, 1.3x on DLRMv2 99%, and 1.4x on Llama 2 70B.

Compared to the best two-socket, CPU-only submissions using currently available x86 CPUs, a single GH200 Grace Hopper Superchip delivered up to 22x higher throughput on the GPT-J benchmark. There were no CPU-only submissions on the more challenging Llama 2 70B or Mixtral 8x7B benchmarks.

Figure 3. Single GH200 performance compared to the best two-socket Xeon 8592+ CPU-only submission (up to 22x higher throughput on GPT-J).

Additionally, many organizations looking to deploy generative AI workloads in production want to run real-time, user-facing services. The MLPerf Inference server scenario measures inference throughput under defined latency constraints, representing popular real-time use cases better than the offline scenario does.
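As a rough sketch of what the server scenario measures, the hypothetical harness below issues queries open-loop with exponential inter-arrival times and checks the p99 latency against a bound. MLPerf actually drives submissions with its LoadGen tool; the run_inference_query stub, the 40 QPS offered load, and the 100 ms bound here are all invented for illustration.

```cuda
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for one inference call; a real harness would
// dispatch the query to the system under test.
static void run_inference_query() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
}

int main() {
    // Server-scenario sketch: queries arrive open-loop at a target rate,
    // and the measured throughput only counts if tail latency stays
    // within the benchmark's bound. All constants here are assumptions.
    const double target_qps = 40.0;
    const double latency_bound_ms = 100.0;
    const int num_queries = 500;

    std::mt19937 rng(42);
    std::exponential_distribution<double> interarrival(target_qps);
    std::vector<double> latencies_ms;
    std::mutex mu;
    std::vector<std::thread> workers;

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < num_queries; ++i) {
        // Open-loop arrivals: issue times do not wait for completions.
        std::this_thread::sleep_for(
            std::chrono::duration<double>(interarrival(rng)));
        workers.emplace_back([&] {
            auto t0 = std::chrono::steady_clock::now();
            run_inference_query();
            auto t1 = std::chrono::steady_clock::now();
            std::lock_guard<std::mutex> lock(mu);
            latencies_ms.push_back(
                std::chrono::duration<double, std::milli>(t1 - t0).count());
        });
    }
    for (auto& w : workers) w.join();
    auto end = std::chrono::steady_clock::now();

    // The p99 latency must meet the bound for the throughput to count.
    std::sort(latencies_ms.begin(), latencies_ms.end());
    double p99 = latencies_ms[static_cast<size_t>(
        0.99 * (latencies_ms.size() - 1))];
    double qps = num_queries /
                 std::chrono::duration<double>(end - start).count();
    printf("throughput %.1f queries/s, p99 latency %.1f ms (%s bound)\n",
           qps, p99, p99 <= latency_bound_ms ? "meets" : "violates");
    return 0;
}
```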

Note that GH200 delivers server-scenario performance within 5% of its offline performance on the challenging Llama 2 70B benchmark, whereas the best x86 CPU-only submission sees performance degrade by 55% in the server scenario compared to the offline scenario.

While many small- and mid-size generative AI models and use cases – including all MLPerf Inference v4.1 benchmarks – can run optimally on a single GH200 Superchip, there are scenarios where multiple accelerators need to work in tandem to meet latency constraints. For example, while Llama 3.1 70B fits comfortably in the memory of a single Hopper GPU, deployments with more stringent latency requirements benefit from the combined AI compute of the two Hopper GPUs connected via NVLink in the GH200 NVL2, as sketched below.
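As a toy illustration of why two NVLink-connected GPUs can cut latency, the sketch below computes y = W * x with the rows of W split between two GPUs, so each device performs half the arithmetic in parallel. This is not how production stacks such as TensorRT-LLM implement tensor parallelism; it simply assumes a two-GPU system and uses managed memory so both devices can reach the operands.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU computes a contiguous slice of the output rows of y = W * x.
// Splitting a layer's weights this way halves the per-GPU work, trading
// interconnect traffic for lower latency per query.
__global__ void matvec(const float* W, const float* x, float* y,
                       int rows, int cols) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    float acc = 0.0f;
    for (int c = 0; c < cols; ++c) acc += W[r * cols + c] * x[c];
    y[r] = acc;
}

int main() {
    const int rows = 4096, cols = 4096, half = rows / 2;
    float *W, *x, *y;
    // Managed allocations are visible to both GPUs and the CPU.
    cudaMallocManaged(&W, rows * cols * sizeof(float));
    cudaMallocManaged(&x, cols * sizeof(float));
    cudaMallocManaged(&y, rows * sizeof(float));
    for (int i = 0; i < rows * cols; ++i) W[i] = 0.001f;
    for (int i = 0; i < cols; ++i) x[i] = 1.0f;

    // Launch each half of the output rows on its own GPU.
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        matvec<<<(half + 255) / 256, 256>>>(W + dev * half * cols, x,
                                            y + dev * half, half, cols);
    }
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
    }

    printf("y[0] = %f, y[%d] = %f\n", y[0], rows - 1, y[rows - 1]);
    cudaFree(W); cudaFree(x); cudaFree(y);
    return 0;
}
```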

The ecosystem is embracing NVIDIA GH200

In this round of MLPerf Inference, many NVIDIA partners made submissions using their NVIDIA GH200 and GH200 NVL2-based server designs, including server makers Hewlett Packard Enterprise (HPE), which submitted a GH200 NVL2-based design, QCT, and Supermicro, as well as cloud service provider Oracle Cloud Infrastructure (OCI).

HPE:

“We are already seeing outstanding performance with the HPE ProLiant Compute DL384 Gen12 with a submission based on the NVIDIA GH200 NVL2 design in no small part due to the 144GB HBM3e memory available per Superchip,” said Kenneth Leach, Principal AI Performance Engineer and MLCommons representative at HPE. “As part of the NVIDIA AI Computing by HPE portfolio, we’ve proven time and again that this platform can deliver high performance for generative AI inference, and appreciate NVIDIA’s continued collaboration. As the first company to submit performance results with this version of the NVIDIA GH200 NVL2, HPE is incredibly proud of the work we continue to pioneer through our long-standing NVIDIA partnership.”

Oracle:

“We validated the combination of Grace CPU and H200 GPU, connected with NVLink interconnect, for AI inference. This demonstrated the outstanding performance of this NVIDIA architecture, and the even greater potential of the upcoming Grace Blackwell. We look forward to supporting customers with the OCI Supercluster based on NVIDIA Grace Blackwell Superchips,” said Sanjay Basu, Senior Director, Cloud Engineering, Oracle Cloud Infrastructure.

QCT:

“With the GH200’s groundbreaking architecture, we’re equipped to enhance developer productivity and drive the next wave of AI applications. Our MLPerf results highlight GH200’s potential to bring AI into enterprise systems to meet the computational demands of modern data centers,” said Mike Yang, Executive Vice President of Quanta Computer Inc. and President of QCT.

Supermicro:

“To help data centers manage rising power demands from AI workloads, the GH200 offers exceptional efficiency. In addition, our MLPerf test submission showcases the significant performance increase that GH200 unlocks for customers.”

Conclusion

The GH200 demonstrated great performance and versatility in the latest MLPerf Inference benchmarks. GH200 NVL2, available today, provides up to 3.5x more GPU memory capacity and 3x more bandwidth than the H100 for compute- and memory-intensive workloads. With a balanced CPU-GPU architecture to address a wide variety of enterprise needs, a flexible 2U air-cooled MGX platform design that’s easy to deploy and scale out, and a 1.2 TB pool of fast memory, it is an ideal solution for enterprises looking to run mainstream LLMs and the large and expanding universe of CUDA-accelerated applications.

Oracle Future Product Disclaimer
The preceding is intended to outline Oracle’s general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. 
