Nvidia Unveils Vera Rubin Architecture at CES to Transform Computing Landscape


By Tina Reynolds


Nvidia has made a significant announcement at the Consumer Electronics Show (CES) in Las Vegas, unveiling its latest architectural advancement, the Vera Rubin architecture. The design combines the Vera CPU, the Rubin GPU, and a package of four different networking chips. Nvidia says the architecture significantly increases performance while improving cost effectiveness, and is built to deliver top performance across all workloads, including today's inference-heavy large language model and generative AI workloads.

The Rubin GPU is the beating heart of this architecture. For 4-bit computation, it can reach an astounding 50 quadrillion floating-point operations per second (50 petaFLOPS), a fivefold jump over Nvidia's previous Blackwell GPU, which, built to the unique requirements of machine-learning inference workloads (transformers in particular), achieves 10 petaFLOPS. The introduction of the Vera Rubin architecture is a major step forward for Nvidia, breaking new ground in performing computational workloads more efficiently across a wide range of numerical representations.
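As a back-of-envelope check of the figures quoted above (50 petaFLOPS for Rubin versus 10 petaFLOPS for Blackwell, both at 4-bit precision), the comparison works out as follows:

```python
# Throughput figures quoted in the article, both at 4-bit (FP4) precision.
rubin_petaflops = 50      # Rubin GPU
blackwell_petaflops = 10  # previous-generation Blackwell GPU

# Generation-over-generation speedup implied by the quoted numbers.
speedup = rubin_petaflops / blackwell_petaflops
print(f"Rubin vs. Blackwell at 4-bit: {speedup:.0f}x")  # prints "Rubin vs. Blackwell at 4-bit: 5x"
```

This is only the ratio of the two headline numbers; real-world gains depend on workload, precision, and how the GPUs are networked together, which is the article's larger point.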

Key Components of Vera Rubin Architecture

The Vera Rubin architecture is indeed constructed from several key components, but their seamless integration is what makes it unique. The architecture pairs the Rubin GPU with the world's fastest networking interface card, the ConnectX-9. It also includes the BlueField-4 data processing unit (DPU), used in conjunction with a pair of Vera CPUs and another ConnectX-9 card. This potent combination maximizes data-handling capability and processing speed.

The architecture includes the Spectrum-6 Ethernet switch that uses co-packaged optics to transmit data at high speed between racks. According to Gilad Shainer, senior vice president of networking at Nvidia, these components must collaborate effectively to yield performance advantages.

“The same unit connected in a different way will deliver a completely different level of performance,” – Gilad Shainer

The Vera Rubin architecture contains major hardware breakthroughs. It also brings the scale-out network to life through the industry's first network powered by multiple networking chips. Though this scale-out network is a key contributor, it is not the only factor behind the 6X overall performance increase.

Transforming Inferencing Workloads

Nvidia's new architecture represents a turning point in how inferencing workloads are handled. In the past, inferencing workloads typically ran on a single GPU in a single server. With the Vera Rubin architecture, there is a pronounced shift to distributed inferencing that spans multiple racks.

“Right now, inferencing is becoming distributed, and it’s not just in a rack. It’s going to go across racks,” – Gilad Shainer

This evolution marks a major shift in the operational paradigm for data centers. The design simplifies operations by performing certain functions only once, rather than having each GPU repeat the same task on its own. This new approach both speeds up processing and cuts costs significantly: Nvidia says inference costs could drop by up to ten times compared to the Blackwell architecture, and training the same models could require four times fewer GPUs.
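To make the quoted savings concrete, here is an illustrative sketch. The "up to ten times" inference-cost reduction and "four times fewer" training GPUs are the article's claims; the baseline figures below are hypothetical placeholders, not Nvidia data:

```python
# Hypothetical baseline figures (placeholders, NOT from Nvidia) used to
# illustrate the article's claimed savings versus Blackwell.
blackwell_cost_per_million_tokens = 1.00  # hypothetical inference cost ($)
blackwell_training_gpus = 10_000          # hypothetical GPU count for a training run

# Article's claims: up to 10x lower inference cost, 4x fewer training GPUs.
rubin_cost_per_million_tokens = blackwell_cost_per_million_tokens / 10
rubin_training_gpus = blackwell_training_gpus // 4

print(f"Inference cost: ${rubin_cost_per_million_tokens:.2f} per million tokens")
print(f"Training GPUs needed: {rubin_training_gpus}")
```

The point of the sketch is simply that the two claimed factors compound at data-center scale: cheaper inference per token and a smaller GPU fleet for the same training job.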

Shainer further elaborated on this aspect, stating, “Two years back, inferencing was mainly run on a single GPU, a single box, a single server.” This historical context highlights the revolutionary progress being achieved with the introduction of the Vera Rubin architecture.

Future Implications and Availability

The implications of the Vera Rubin architecture reach far beyond raw performance metrics. The design has been painstakingly formulated to stay as nimble and adaptable as the growing demands for computational power in the data center require. As workloads become more multifaceted and data-rich, the pressure to scale processing speed only accelerates.

“It doesn’t stop here because we are seeing the demands to increase the number of GPUs in a data center,” – Gilad Shainer

Nvidia intends to begin shipping Vera Rubin-based supercomputers to customers later this year. The company's commitment to extreme co-design required each element to be carefully tuned against the others to achieve the greatest level of performance.

“That’s why we call it extreme co-design,” – Gilad Shainer