Nvidia announced its Vera Rubin architecture at the Consumer Electronics Show (CES) in Las Vegas, billing it as its biggest advance yet in networking and computational capability. The new architecture is designed to dramatically increase data-processing efficiency, and it is aimed at sophisticated workloads such as large language models. Alongside it, Nvidia announced six new chips, including the Vera CPU and Rubin GPU as well as four specialized networking chips, a bold bid to change how computation is done in data centers.
The Rubin GPU's headline number is striking: 50 quadrillion floating-point operations per second (petaFLOPS) using 4-bit arithmetic, up from the 10 petaFLOPS the Blackwell architecture delivered for transformer-based inference workloads. That is a significant improvement over previous generations and evidence of Nvidia's determination to push the limits of what is computationally possible.
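To make the 4-bit figure concrete, here is a minimal toy sketch of symmetric 4-bit quantization in NumPy. It is not Nvidia's actual 4-bit format, and the per-tensor scaling choice is an assumption for illustration only; the point is simply that values are mapped onto 16 integer levels, which is what lets hardware push far more operations through the same silicon.

```python
# Toy sketch only: symmetric 4-bit quantization, NOT Nvidia's real format.
import numpy as np

def quantize_4bit(x: np.ndarray):
    """Map float values onto 16 signed integer levels (-8..7)."""
    scale = np.max(np.abs(x)) / 7.0          # one scale per tensor (toy choice)
    q = np.clip(np.round(x / scale), -8, 7)  # 4-bit signed integer codes
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_4bit(weights)
print("max abs quantization error:", np.max(np.abs(weights - dequantize(q, s))))
```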
Enhancing Speed and Efficiency
Nvidia’s new Vera Rubin architecture is meant to speed up the entire order-fulfillment cycle within data centers; think of it as helping a pizza shop turn around more deliveries faster. By offloading some of the work from GPUs to the network, the architecture aims to cut latency and improve overall performance. The rationale behind this strategy is twofold: it hides the time required to transfer data between GPUs, and it lets computation happen while the data is en route.
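That "compute while the data is en route" idea is, at heart, pipelining: overlap the transfer of the next chunk with the compute on the current one. The sketch below is a toy illustration in plain Python; transfer and compute are stand-in functions with made-up timings, not Nvidia APIs.

```python
# Toy sketch of compute/communication overlap (double buffering).
# transfer() and compute() are placeholders, not Nvidia APIs.
import concurrent.futures as cf
import time

def transfer(chunk):
    time.sleep(0.05)   # stand-in for moving data between GPUs over the network
    return chunk

def compute(chunk):
    time.sleep(0.05)   # stand-in for GPU work on one chunk
    return chunk * 2

def pipelined(chunks):
    """Overlap the transfer of chunk i+1 with the compute on chunk i."""
    results = []
    with cf.ThreadPoolExecutor(max_workers=1) as io:
        next_fut = io.submit(transfer, chunks[0])
        for i in range(len(chunks)):
            data = next_fut.result()
            if i + 1 < len(chunks):
                next_fut = io.submit(transfer, chunks[i + 1])  # prefetch in background
            results.append(compute(data))  # compute while the next transfer runs
    return results

start = time.time()
pipelined(list(range(8)))
print(f"pipelined: {time.time() - start:.2f}s vs ~0.80s fully serial")
```

Run serially, eight chunks would cost eight transfers plus eight computes; with the overlap, all but the first transfer hides behind compute.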
As Nvidia's Gilad Shainer put it, "jitter is to lose dollars," emphasizing the need to reduce latency in low-latency, high-cost computing environments.
The architecture also marks a significant departure in how inferencing workloads are orchestrated and managed. “Two years back, inferencing was mainly run on a single GPU, a single box, a single server,” Shainer noted. Vera Rubin's introduction exemplifies the move away from centralized inferencing systems confined to a single rack.
A New Era of Distributed Computing
The actual design of the Vera Rubin architecture is a testament to Nvidia's long-term ambitions for distributed computing. Inferencing, Shainer noted, is increasingly extending across several racks instead of being limited to one: it is going distributed, and it is no longer confined to a single rack but will run across many, he continued.
This richly distributed approach lets organizations perform more flexible, sophisticated computation while making scarce computational resources go further. Shainer elaborated on the synergy required among the components: “The same unit connected in a different way will deliver a completely different level of performance.” In many ways, that interconnectedness is at the heart of maximizing what the Vera Rubin architecture can do.
According to Nvidia’s projections, the new platform will reduce inference costs by a factor of ten and cut the number of GPUs required to train some models by a factor of four compared with the prior Blackwell architecture. A decrease of that size could meaningfully improve cost-effectiveness for businesses that rely heavily on large AI models.
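As a back-of-envelope check of what those factors imply: the tenfold and fourfold figures come from Nvidia's projections above, but the baseline numbers in this sketch are placeholders, not real prices or fleet sizes.

```python
# Back-of-envelope sketch; only the 10x and 4x factors come from Nvidia's
# projections. The baselines below are made-up placeholders.
blackwell_cost_per_query = 1.00      # hypothetical baseline cost unit
blackwell_training_gpus = 1000       # hypothetical training fleet size

rubin_cost_per_query = blackwell_cost_per_query / 10   # 10x cheaper inference
rubin_training_gpus = blackwell_training_gpus / 4      # 4x fewer training GPUs

print(rubin_cost_per_query, rubin_training_gpus)       # 0.1 250.0
```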
Extreme Co-Design for Maximum Impact
Nvidia has taken an “extreme co-design” approach to creating the Vera Rubin architecture, one intended to ensure that every element functions in unison, like a well-tuned orchestra, for optimal performance. The philosophy recognizes that integration and collaboration between hardware components matter as much as raw component speed in the race for breakthroughs in computing capability.
Shainer remarked on the continuing demand for greater GPU capacity within data centers: “It doesn’t stop here, because we are seeing the demands to increase the number of GPUs in a data center.” The remark underscores how quickly Nvidia is moving with the market, even as the company anticipates future demand in High-Performance Computing.
As organizations increasingly rely on AI-driven solutions, Nvidia’s innovations aim to provide them with the tools necessary to thrive in an evolving digital landscape.

