Learn about NVIDIA's first server CPUs

Published: 2022-08-25

New chip designs reflect the breadth and depth of NVIDIA's platform innovations in AI, high-performance computing, and edge computing.


In four presentations over two days, NVIDIA senior engineers present innovations in accelerated computing for modern data center and network edge systems.


Hot Chips, held virtually this year, is an annual gathering of processor and system architects. At the event, NVIDIA senior engineers reveal performance data and other technical details about NVIDIA's first server CPUs, the Hopper GPU, the new NVSwitch interconnect chip, and the NVIDIA Jetson Orin system-on-module (SoM).


These presentations provide insight into how NVIDIA platforms achieve new levels of performance, efficiency, scale, and security.


Specifically, the presentations demonstrate a design philosophy of innovation across the entire stack of silicon, systems, and software, where GPUs, CPUs, and DPUs act as peer processors. Together they form a platform that is already running AI, data analytics, and high-performance computing workloads at cloud service providers, supercomputing centers, enterprise data centers, and in autonomous systems.


Learn about NVIDIA's first server CPUs


Data centers require flexible clusters of CPUs, GPUs, and other accelerators sharing massive memory pools to deliver the efficient performance today's workloads demand.


To address this need, Jonathon Evans, a distinguished engineer and 15-year veteran at NVIDIA, introduces NVIDIA NVLink-C2C. It connects CPUs and GPUs at 900 gigabytes per second, and its data transfers consume just 1.3 picojoules per bit, making it five times more energy efficient than the existing PCIe 5.0 standard.
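As a quick sanity check on that efficiency figure (back-of-the-envelope arithmetic, not an NVIDIA-published number), the power implied for the link itself is on the order of ten watts:

\[ 900\,\tfrac{\text{GB}}{\text{s}} \times 8\,\tfrac{\text{bit}}{\text{byte}} \times 1.3\,\tfrac{\text{pJ}}{\text{bit}} \approx 9.4\ \text{W} \]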


NVLink-C2C connects two CPU chips to create the NVIDIA Grace CPU with 144 Arm Neoverse cores and 512 GB of memory: a processor built to tackle the world's largest computing problems.


For even greater efficiency, the Grace CPU uses LPDDR5X memory, which delivers 1 TB per second of memory bandwidth while keeping power consumption for the entire complex to 500 watts.


One Link, Many Uses


NVLink-C2C also links Grace CPU and Hopper GPU chips as memory-sharing peers in the NVIDIA Grace Hopper Superchip, accelerating performance-hungry jobs such as AI training even further.
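In practice, "memory-sharing peers" means both processors can touch the same allocation. Here is a minimal CUDA sketch of that pattern; it uses cudaMallocManaged, which runs on any CUDA-capable system, as a stand-in for the hardware-coherent pool NVLink-C2C provides on Grace Hopper. The kernel name and sizes are illustrative.

    #include <cstdio>
    #include <cuda_runtime.h>

    // GPU updates a buffer that the CPU also reads and writes.
    __global__ void incrementKernel(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1.0f;
    }

    int main() {
        const int n = 1 << 20;
        float* data = nullptr;

        // One allocation visible to both processors; on NVLink-C2C
        // systems the hardware keeps the two views coherent.
        cudaMallocManaged(&data, n * sizeof(float));

        for (int i = 0; i < n; ++i) data[i] = float(i);     // CPU writes
        incrementKernel<<<(n + 255) / 256, 256>>>(data, n); // GPU updates
        cudaDeviceSynchronize();

        printf("data[42] = %f\n", data[42]);                // CPU reads
        cudaFree(data);
        return 0;
    }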


Anyone can use NVLink-C2C to build custom small chips to orchestrate the connection of NVIDIA GPUs, CPUs, DPUs, and SoCs, extending this new integrated product class. The interconnect will support the AMBA CHI and CXL protocols used by Arm and x86 processors.


First memory benchmarks for Grace and Grace Hopper


The new NVIDIA NVSwitch connects multiple servers into a single AI supercomputer, scaling at the system level. It uses NVLink, which interconnects at 900 GB/sec, more than seven times the bandwidth of PCIe 5.0.
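The "more than seven times" comparison follows from PCIe arithmetic (a rough check, assuming a x16 PCIe 5.0 link): 32 GT/s per lane across 16 lanes gives roughly 63 GB/s per direction, about 126 GB/s bidirectional, so

\[ \frac{900\ \text{GB/s}}{\approx 126\ \text{GB/s}} \approx 7.1 \]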


NVSwitch enables users to connect 32 NVIDIA DGX H100 systems into a single AI supercomputer that delivers an exaflop of peak AI performance.
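That claim is consistent with the published FP8 figures (a back-of-the-envelope check, counting sparsity): each DGX H100 carries eight H100 GPUs at roughly 4 PFLOPS of FP8 compute apiece, so

\[ 32\ \text{systems} \times 8\ \tfrac{\text{GPUs}}{\text{system}} \times {\sim}4\ \tfrac{\text{PFLOPS}}{\text{GPU}} \approx 1{,}000\ \text{PFLOPS} \approx 1\ \text{exaflop} \]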


Alexander Ishii and Ryan Wells, two senior engineers at NVIDIA, describe how the switch enables users to build systems that can support up to 256 GPUs to handle demanding workloads, such as training AI models with more than 1 trillion parameters.


The switch includes an engine that speeds data transfers using NVIDIA SHARP (Scalable Hierarchical Aggregation and Reduction Protocol), an in-network computing feature that debuted on NVIDIA Quantum InfiniBand networks. It can double data throughput for communication-intensive AI applications.


NVSwitch systems enable exaflop-class AI supercomputers
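From the application side, that in-network reduction is transparent: a framework issues an ordinary collective such as all-reduce, and the communication library offloads the math to the switch where the fabric supports it. Below is a minimal single-process NCCL sketch across the locally visible GPUs (buffer sizes and contents are illustrative); NCCL uses NVLink SHARP automatically where available.

    #include <cstdio>
    #include <cuda_runtime.h>
    #include <nccl.h>

    int main() {
        int nDev = 0;
        cudaGetDeviceCount(&nDev);
        if (nDev > 8) nDev = 8;            // cap to this sketch's arrays

        const size_t count = 1 << 20;      // floats per GPU
        ncclComm_t comms[8];
        cudaStream_t streams[8];
        float* bufs[8];
        int devs[8];

        for (int i = 0; i < nDev; ++i) devs[i] = i;
        ncclCommInitAll(comms, nDev, devs);   // one communicator per GPU

        for (int i = 0; i < nDev; ++i) {
            cudaSetDevice(i);
            cudaStreamCreate(&streams[i]);
            cudaMalloc(&bufs[i], count * sizeof(float));
        }

        // Sum the buffers across all GPUs; with NVLink SHARP the
        // reduction can run inside the switch instead of on the GPUs.
        ncclGroupStart();
        for (int i = 0; i < nDev; ++i)
            ncclAllReduce(bufs[i], bufs[i], count, ncclFloat, ncclSum,
                          comms[i], streams[i]);
        ncclGroupEnd();

        for (int i = 0; i < nDev; ++i) {
            cudaSetDevice(i);
            cudaStreamSynchronize(streams[i]);
            cudaFree(bufs[i]);
            ncclCommDestroy(comms[i]);
        }
        printf("all-reduce complete on %d GPUs\n", nDev);
        return 0;
    }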


Jack Choquette, a distinguished senior engineer with NVIDIA for 14 years, details the NVIDIA H100 Tensor Core GPU, also known as Hopper.


In addition to scaling to new heights with the new interconnect technology, it includes many advanced features that improve the accelerator's performance, efficiency, and security.


Compared to its predecessor, Hopper's new Transformer Engine and upgraded Tensor Cores deliver a 30x speedup for AI inference on the world's largest neural network models. It also features the world's first HBM3 memory system, delivering up to 3 TB per second of memory bandwidth, the largest generational increase NVIDIA has ever made.


Other new features include:


Hopper adds virtualization support for multi-tenant, multi-user configurations.


A new DPX instruction set accelerates dynamic programming loops for applications such as route optimization and DNA and protein analysis (see the sketch after this list).


Hopper also supports enhanced security through confidential computing.
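To make the dynamic-programming point concrete, here is a plain CUDA sketch of the Smith-Waterman-style recurrence used in DNA and protein alignment. It is written with ordinary add and max operations, the fused pattern DPX instructions accelerate in hardware; the scoring constants are illustrative, and the host loop that walks the anti-diagonals is omitted.

    #include <cuda_runtime.h>

    // Illustrative scoring constants.
    constexpr int MATCH = 3, MISMATCH = -3, GAP = 2;

    // One Smith-Waterman cell: the max-of-sums recurrence that DPX
    // collapses into a few hardware instructions on Hopper.
    __device__ int swCell(int diag, int up, int left, char a, char b) {
        int s = diag + (a == b ? MATCH : MISMATCH);
        s = max(s, up - GAP);    // gap in one sequence
        s = max(s, left - GAP);  // gap in the other
        return max(s, 0);        // local alignment floors at zero
    }

    // Each thread fills one cell of the current anti-diagonal d = i + j.
    // Cells on a diagonal are independent, so the wavefront is parallel.
    // The host zeroes curr, launches for d = 2 .. lenA + lenB, and
    // rotates the three zero-initialized buffers between launches.
    __global__ void swDiagonal(const int* prev2, const int* prev1,
                               int* curr, const char* a, const char* b,
                               int d, int lenA, int lenB) {
        int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // row in a
        int j = d - i;                                      // column in b
        if (i <= lenA && j >= 1 && j <= lenB)
            curr[i] = swCell(prev2[i - 1],  // cell (i-1, j-1)
                             prev1[i - 1],  // cell (i-1, j)
                             prev1[i],      // cell (i,   j-1)
                             a[i - 1], b[j - 1]);
    }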


Choquette, who was the lead chip designer for the Nintendo 64 game console earlier in his career, also describes some of the parallel computing techniques behind Hopper's advances.


Michael Ditty, the chief architect of Orin and a 17-year veteran of NVIDIA, explains the new performance specifications for NVIDIA Jetson AGX Orin, an engine for edge AI, robotics development, and advanced autonomous machines.


NVIDIA Jetson AGX Orin integrates 12 Arm Cortex-A78 cores and an NVIDIA Ampere architecture GPU to deliver up to 275 trillion operations per second for AI inference jobs. This is a 2.3x improvement in power efficiency and up to 8x improvement in performance over the previous generation.


The new production module includes up to 32 GB of memory and is part of a compatible family that scales down to pocket-sized, 5-watt Jetson Nano developer kits.


Performance Benchmarking with NVIDIA Orin


All new chips support the NVIDIA software stack, accelerating more than 700 applications used by 2.5 million developers.


The stack is built on the CUDA programming model and includes dozens of NVIDIA SDKs for vertical markets such as automotive (DRIVE) and healthcare (Clara), as well as technologies such as recommender systems (Merlin) and conversational AI (Riva).
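Every layer of that stack reduces to the same CUDA pattern: a data-parallel kernel launched over a grid of threads. The canonical vector-add example, self-contained:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread computes one element of the output vector.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);   // launch the grid
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);                    // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }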


NVIDIA AI platforms are available from all major cloud service and system manufacturers.

