Last week, at the Consumer Electronics Show in Las Vegas, Nvidia unveiled its newest architecture, Vera Rubin. The platform, which will undergo further customer beta testing before rolling out fully later this year, promises a claimed ten-fold reduction in inference costs. With a design that features six newly developed chips, including the Vera CPU and the Rubin GPU, Nvidia aims to redefine performance standards in the industry.
According to Nvidia, the Vera Rubin architecture delivers a major leap over its predecessor, Blackwell. Most notably, the company claims a four-fold reduction in the number of GPUs needed to train certain models. The development keeps pace with the growing sophistication of today's artificial intelligence and the rising demands of machine learning applications.
Key Features of Vera Rubin Architecture
The Vera Rubin platform combines several elements aimed at maximizing computing efficiency and speed. The Rubin GPU is capable of 50 quadrillion floating-point operations per second, or 50 petaFLOPS, a figure achieved with 4-bit calculations. That represents a five-fold increase in 4-bit compute over the previous Blackwell architecture. The improvement matters most for transformer-based inference tasks, such as those running large language models (LLMs).
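To see why 4-bit compute helps inference, consider what running a model at 4-bit precision involves. The sketch below is a minimal, generic illustration of symmetric 4-bit integer quantization, not Nvidia's actual 4-bit format or software stack: weights are mapped to integers in [-8, 7] with a single per-tensor scale, shrinking memory and bandwidth needs roughly four-fold versus 16-bit storage while keeping values approximately recoverable.

```python
def quantize_4bit(weights):
    """Symmetric per-tensor 4-bit quantization.

    Maps each float to an integer in [-8, 7] using one shared
    scale factor, so a weight occupies 4 bits instead of 16 or 32.
    """
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float values from the 4-bit integers."""
    return [v * scale for v in q]

# Toy example: four weights from a hypothetical layer.
weights = [0.12, -0.50, 0.33, 0.07]
q, scale = quantize_4bit(weights)
restored = dequantize_4bit(q, scale)
```

Each restored value differs from the original by at most half a quantization step, which is why 4-bit inference can preserve model quality while multiplying throughput, exactly the trade-off the Rubin GPU's 4-bit optimization targets.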
The Vera Rubin architecture also includes four networking chips, which are critical for keeping data flowing into the Rubin GPU for processing. The NVIDIA ConnectX-9 smart NIC provides the connectivity and speed for the most demanding workloads, while the BlueField-4 data processing unit improves overall system performance. Additionally, the Spectrum-6 Ethernet switch uses co-packaged optics to enable low-latency, high-bandwidth data transfer between racks.
Gilad Shainer, Nvidia’s senior vice president of networking, provided insight into the architecture’s design philosophy. He likened the company’s system to a pizza parlor trying to improve delivery speeds. He stressed that merely changing how the same units are connected can yield a massively different level of performance.
A Shift Towards Distributed Inference
Beyond the hardware itself, Shainer stressed a more fundamental change. “Two years back, inferencing was mainly run on a single GPU, a single box, a single server,” he stated. As workloads have become distributed across multiple environments, the demand for interconnectivity between multi-tenant data ecosystems has grown exponentially. “Currently, inferencing is going distributed and it’s not just in a rack somewhere. It’s just going to pass over racks,” Shainer said.
This advancement in processing power means that all the new architecture components must operate harmoniously with one another to realize the intended performance benefits. Shainer emphasized the importance of this integration: “That’s why we call it extreme co-design.” The harmony between these components is crucial for supporting varied workloads and numerical representations.
Meeting Growing Demands
As workloads become more dynamic than ever, Nvidia knows the old approaches won’t suffice. Shainer pointed out that “100,000 GPUs are not enough anymore for some workloads,” underlining the need for scalable solutions that can adapt to increasing demands. The Vera Rubin architecture is built with this scalability from the start, making it possible to meet future requirements cost-effectively.
Nvidia’s Vera Rubin architecture features bold advances and striking design elements. Above all, it seeks to set new standards of performance and efficiency, as the company works to keep up with the growing demand for computing power quickly, efficiently, and at a competitive price.
“Jitter means losing money.” – Gilad Shainer