Nvidia has officially unveiled its latest innovation, the Vera Rubin architecture. The announcement came at the Consumer Electronics Show (CES) in Las Vegas. The architecture is designed to change the game in more than one way, with new computing components, the Vera CPU and Rubin GPU, leading the charge. These chips enable major leaps in performance and efficiency for machine-learning workloads, especially transformer-based inference.
The Rubin GPU delivers an astounding 50 quadrillion floating-point operations per second (50 petaFLOPS) at 4-bit precision, five times the performance of Nvidia’s previous Blackwell architecture. The leap is aimed at the growing demands of data centers and the capabilities of new AI applications. Nvidia expects to put the Vera Rubin architecture in the hands of customers by late this year.
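Taking the article’s figures at face value, the implied throughput of the previous generation can be back-calculated. A minimal sketch, assuming only the two numbers stated above (50 petaFLOPS at 4-bit precision, and a 5x improvement over Blackwell); everything else is illustrative:

```python
# Back-of-envelope check of the stated figures (illustrative only).
rubin_fp4_petaflops = 50   # 50 petaFLOPS at 4-bit precision, per the announcement
speedup_vs_blackwell = 5   # "five times higher" than Blackwell, per the announcement

# The announcement's two figures imply this 4-bit throughput for Blackwell:
implied_blackwell_petaflops = rubin_fp4_petaflops / speedup_vs_blackwell
print(implied_blackwell_petaflops)  # 10.0
```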
Key Components of the Vera Rubin Architecture
The Vera Rubin design relies on six new purpose-built chips optimized for peak performance: the Vera CPU, the Rubin GPU, and four specialized networking components. The high-bandwidth ConnectX-9 network interface card from Mellanox Technologies ensures quick data transfer, while the efficient BlueField-4 data processing unit improves computational power efficiency.
Another key addition is the Spectrum-6 Ethernet switch, which uses co-packaged optics to increase throughput between racks. This technology reduces overall inference costs by a factor of 10 and, compared with the Blackwell architecture, cuts the number of GPUs required to train certain models by a factor of four.
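The two reduction factors quoted above can be made concrete with a hypothetical deployment. A rough sketch, assuming the 10x cost and 4x GPU-count reductions from the announcement; the baseline cost and cluster size below are invented for illustration:

```python
# Hypothetical baseline figures (invented for illustration; only the
# 10x and 4x factors come from the announcement).
blackwell_cost_per_million_tokens = 2.00  # dollars, hypothetical
blackwell_gpus_for_training_run = 1024    # cluster size, hypothetical

rubin_cost = blackwell_cost_per_million_tokens / 10  # 10x lower inference cost
rubin_gpus = blackwell_gpus_for_training_run // 4    # 4x fewer GPUs, same model

print(rubin_cost, rubin_gpus)
```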
Gilad Shainer, senior vice president of networking at Nvidia, emphasized the importance of orchestration across these various elements. That collaboration is needed to unlock the intended performance benefits.
“The same unit connected in a different way will deliver a completely different level of performance,” – Gilad Shainer
Shainer also explained how the architecture reflects a broader trend toward running inference in an increasingly small footprint across today’s computing environments.
Transforming Data Center Operations
The Vera Rubin architecture addresses a new landscape in data center operations, in which inferencing is moving from a single-GPU model to a distributed, cloud-scale approach. Shainer highlighted this shift during his panel discussion at CES.
“Two years back, inferencing was mainly run on a single GPU, a single box, a single server,” – Gilad Shainer
He further described how today’s workloads require a more agile infrastructure that can flexibly support distributed inferencing across many racks.
“Right now, inferencing is becoming distributed, and it’s not just in a rack. It’s going to go across racks,” – Gilad Shainer
This evolution marks an important chapter in data processing capability, especially as networks grow more intricate and interconnected.
A Vision for Future Computing
Nvidia’s newly announced Vera Rubin architecture dramatically improves performance while anticipating the needs of increasingly complex data centers. As companies scale their operations and deepen their reliance on GPUs, Nvidia is introducing new architecture to stay ahead of the competition.
“It doesn’t stop here, because we are seeing the demands to increase the number of GPUs in a data center,” – Gilad Shainer
He described the design philosophy behind the Vera Rubin architecture as “extreme co-design,” emphasizing that all components were developed with synergy in mind to maximize efficiency and performance.
“That’s why we call it extreme co-design.” – Gilad Shainer

