Nvidia announced its new Vera Rubin architecture at this year’s Consumer Electronics Show (CES) in Las Vegas. The architecture is poised to change how data is processed across computing environments. Scheduled to begin shipping to customers later this year, Vera Rubin promises large gains in performance and energy efficiency. With up to a ten-fold reduction in inference costs, it could reshape how enterprises approach AI and machine learning workloads.
The new architecture offers flexibility for any workload and any numeric format, providing a future-proof foundation for today’s computing needs. Nvidia says it can cut the number of GPUs required to train certain models by a factor of four compared with the previous-generation Blackwell architecture. The breakthrough underscores Nvidia’s focus on improving computational efficiency and reducing operating costs.
Key Features of the Vera Rubin Architecture
At the core of Nvidia’s Vera Rubin architecture are six new chips: the Vera CPU, the Rubin GPU, and four specialized networking chips. The Rubin GPU delivers 50 petaFLOPS of 4-bit compute, up from 10 petaFLOPS on Blackwell. That gain is most pronounced in transformer-based inference workloads, such as those behind large language models.
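Taken at face value, these headline figures imply simple ratios. The back-of-envelope sketch below uses only the numbers quoted in this article; they are peak ratings and vendor claims, not measured results, and the 1,024-GPU job is a hypothetical example.

```python
# Back-of-envelope comparison built only from the figures quoted in this
# article; these are peak ratings and vendor claims, not measured results.

BLACKWELL_FP4_PFLOPS = 10   # peak 4-bit petaFLOPS per GPU (previous generation)
RUBIN_FP4_PFLOPS = 50       # peak 4-bit petaFLOPS per GPU (Vera Rubin)

per_gpu_speedup = RUBIN_FP4_PFLOPS / BLACKWELL_FP4_PFLOPS
print(f"Peak FP4 throughput per GPU: {per_gpu_speedup:.0f}x")   # -> 5x

# The claimed four-fold reduction in GPUs needed to train certain models,
# applied to a hypothetical 1,024-GPU Blackwell training job:
blackwell_gpus = 1024
rubin_gpus = blackwell_gpus // 4
print(f"GPUs for the same hypothetical job on Rubin: {rubin_gpus}")  # -> 256
```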
The architecture uses a two-network design: a scale-up network within the rack and a scale-out network between racks. It is built around several new parts, including the ConnectX-9 network interface card, the BlueField-4 data processing unit (DPU), and the Spectrum-6 Ethernet switch. The ConnectX-9 card handles networking, while the BlueField-4 works in tandem with two Vera CPUs and a ConnectX-9 card to process data efficiently. The Spectrum-6 Ethernet switch uses co-packaged optics to move large amounts of data quickly between racks.
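As a purely illustrative sketch of how those pieces relate, the division of labor might be modeled as below. The tray composition, counts, and class names are assumptions made for explanation, not an Nvidia bill of materials.

```python
from dataclasses import dataclass, field

# Illustrative only: the groupings and counts below are assumptions made for
# explanation, not an official Nvidia specification.

@dataclass
class ComputeTray:
    vera_cpus: int = 2        # two Vera CPUs paired with a BlueField-4 DPU
    bluefield4_dpus: int = 1  # data processing offload
    connectx9_nics: int = 1   # network interface card
    rubin_gpus: int = 4       # GPUs per tray is a placeholder assumption

@dataclass
class Rack:
    trays: list = field(default_factory=lambda: [ComputeTray() for _ in range(8)])
    scale_up_network: str = "high-bandwidth fabric linking GPUs within the rack"
    scale_out_network: str = "Spectrum-6 Ethernet with co-packaged optics between racks"

rack = Rack()
print(f"Trays per rack (assumed): {len(rack.trays)}")
print(f"Rubin GPUs per rack (assumed): {sum(t.rubin_gpus for t in rack.trays)}")
```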
“Two years back, inferencing was mainly run on a single GPU, a single box, a single server,” said Gilad Shainer, senior vice president of networking at Nvidia.
This reflects a significant shift in the computing landscape, as organizations increasingly require distributed inferencing capabilities.
Enhancing Computational Efficiency
Nvidia’s Vera Rubin architecture aims to go beyond raw performance gains and cut down on data-handling time as well. By performing computation while data is moving between GPUs, the architecture reduces latency and improves performance across the board. This patent-pending approach enables smooth real-time operation across large numbers of GPUs in distributed locations, ranging from the cloud to the edge.
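The underlying idea of hiding transfer latency behind useful work can be shown with a small, hypothetical sketch. This is a host-side analogy using ordinary Python threads, not Nvidia’s actual in-network compute mechanism.

```python
import threading
import queue
import time

# Toy illustration of the general idea: keep computing while data is in
# flight, so transfer latency is hidden behind useful work. This mimics the
# principle only; it is not Nvidia's implementation.

in_flight = queue.Queue()

def transfer(chunks):
    """Simulate moving data between devices, one chunk at a time."""
    for chunk in chunks:
        time.sleep(0.01)          # stand-in for link latency
        in_flight.put(chunk)
    in_flight.put(None)           # signal end of stream

def compute():
    """Consume chunks as they arrive instead of waiting for the full transfer."""
    total = 0
    while (chunk := in_flight.get()) is not None:
        total += sum(chunk)       # stand-in for partial reduction work
    return total

chunks = [[i, i + 1] for i in range(100)]
t = threading.Thread(target=transfer, args=(chunks,))
t.start()
result = compute()                # overlaps with the transfer thread
t.join()
print(f"Reduced result while overlapping transfer and compute: {result}")
```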
“Right now, inferencing is becoming distributed, and it’s not just in a rack. It’s going to go across racks,” Shainer said.
These new capabilities help organizations tame the growing complexity of their AI workloads while using their resources more efficiently.
The architecture’s design philosophy emphasizes what Shainer refers to as “extreme co-design,” where different components are engineered to work together optimally.
“The same unit connected in a different way will deliver a completely different level of performance,” Shainer said.
This co-design paradigm opens new avenues for efficiency and processing capability in data centers.
Future Implications for Data Centers
Businesses keep demanding as much computational power as they can get, and Nvidia’s new Vera Rubin architecture is rising to meet those demands and then some. The resulting decrease in GPU requirements should reduce costs while simplifying data center operations.
“It doesn’t stop here, because we are seeing the demands to increase the number of GPUs in a data center,” Shainer said.
As demand for AI-driven applications continues to grow exponentially, the need for efficient and scalable solutions takes on new urgency. The Vera Rubin architecture looks set to have a major influence on how the next steps in this space are taken.

