Nvidia dropped jaws with a big announcement at this month’s Consumer Electronics Show (CES) in Las Vegas: its new Vera Rubin architecture. The powerful, innovative design underpins AI-ready digital twins poised to revolutionize the AI and ML industries, and it will ship to customers later this year. The Vera Rubin architecture reduces inference costs by a factor of ten and, beyond gains in throughput and overall performance, dramatically cuts the number of GPUs required to train certain models.
Alongside full FP8 support, the new architecture delivers a fourfold reduction in GPU requirements compared with Nvidia’s previous Blackwell architecture. This is an exciting development because organizations increasingly depend on distributed systems to do the heavy lifting of their computational workloads. Gilad Shainer, Nvidia’s senior vice president of networking, spoke to this trend, emphasizing that the move from single-GPU systems to on-demand distributed infrastructure answers the emerging needs of today’s AI workloads.
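To put those precision formats in perspective, here is a quick back-of-envelope sketch of how weight memory shrinks as numeric precision drops. The 70-billion-parameter model size is an illustrative assumption, not an Nvidia figure:

```python
# Back-of-envelope weight-memory comparison across precisions.
# The 70B parameter count is an illustrative assumption, not an Nvidia figure.

PARAMS = 70e9  # hypothetical model size

for fmt, bits in {"FP16": 16, "FP8": 8, "FP4": 4}.items():
    gigabytes = PARAMS * bits / 8 / 1e9
    print(f"{fmt}: {gigabytes:5.0f} GB of weights")

# FP16:   140 GB of weights
# FP8:     70 GB of weights
# FP4:     35 GB of weights
```

Halving the bits per weight halves the memory a model occupies, which is a large part of why lower-precision support translates so directly into fewer GPUs per deployment.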
Key Features of Vera Rubin Architecture
The Vera Rubin architecture maximizes cost efficiency and adds a number of new features that greatly improve overall performance. The lineup includes the ConnectX-9 network interface card, the BlueField-4 data processing unit, and the Spectrum-6 Ethernet switch, which scales with co-packaged optics to provide high-speed in-rack data transmission.
This comprehensive platform includes a total of six new chips tailored for Vera-Rubin-based computers: the Vera CPU, the Rubin GPU, and four distinct networking chips. The centerpiece is the new Rubin GPU, which delivers a jaw-dropping 50 quadrillion floating-point operations per second (50 petaFLOPS) for 4-bit computations. Designed specifically for transformer-based inference workloads such as large language models, the Rubin GPU is the cornerstone of the platform’s generative AI push and a monumental jump in computing power.
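For a rough sense of scale, the sketch below turns that quoted 50 petaFLOPS into a theoretical decode-throughput ceiling. The 70-billion-parameter model size and the standard ~2 FLOPs-per-parameter-per-token rule of thumb are assumptions, and real decode throughput is usually limited by memory bandwidth rather than raw compute:

```python
# Crude compute-bound ceiling implied by 50 petaFLOPS of 4-bit throughput.
# Model size and the ~2 FLOPs/parameter/token rule of thumb are assumptions;
# real decode throughput is usually limited by memory bandwidth instead.

PEAK_FLOPS = 50e15            # 50 petaFLOPS (FP4), as quoted
PARAMS = 70e9                 # hypothetical model size
FLOPS_PER_TOKEN = 2 * PARAMS  # ~2 FLOPs per parameter per generated token

print(f"Ceiling: {PEAK_FLOPS / FLOPS_PER_TOKEN:,.0f} tokens/s")
# Ceiling: 357,143 tokens/s
```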
“The same unit connected in a different way will deliver a completely different level of performance,” – Gilad Shainer
The Vera Rubin architecture’s underlying design philosophy pushes efficiency to the limit through what Nvidia calls extreme co-design. By ensuring certain tasks are performed only once, the architecture eliminates unnecessary processing, while leaving flexibility in which GPU actually runs those tasks. This strategy is crucial for conserving energy and other data center resources while increasing computational throughput per dollar.
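Nvidia has not published the scheduler behind this, but the compute-once, reuse-everywhere idea can be sketched with a simple cache. Everything below (the `prefill` stand-in, the cache, the GPU IDs) is hypothetical illustration of the general pattern, not Nvidia’s implementation:

```python
# Hypothetical sketch of "do the expensive work once, reuse it everywhere."
# The cache and prefill stand-in are illustrative only; this shows the
# general caching pattern, not Nvidia's actual scheduler.

import hashlib

prefill_cache = {}

def prefill(prompt):
    """Stand-in for an expensive one-time computation (e.g., building a KV cache)."""
    return hashlib.sha256(prompt.encode()).hexdigest()

def serve(prompt, gpu_id):
    """Any GPU can serve the request; the heavy prefill runs only once."""
    if prompt not in prefill_cache:
        prefill_cache[prompt] = prefill(prompt)  # computed exactly once
    return f"gpu{gpu_id} -> {prefill_cache[prompt][:8]}"  # reused thereafter

# Two different "GPUs" serve the same prompt; only the first pays the prefill cost.
print(serve("Explain co-packaged optics", gpu_id=0))
print(serve("Explain co-packaged optics", gpu_id=1))
```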
Innovations in Networking
In keeping with this commitment to networking, Nvidia brought cutting-edge interconnect technology to the Vera Rubin architecture. The ConnectX-9 card and Spectrum-6 switch represent a major breakthrough for data orchestration, moving workloads and distributing data across multiple racks rapidly and efficiently.
“Right now, inferencing is becoming distributed, and it’s not just in a rack. It’s going to go across racks,” Shainer noted, highlighting the trend toward expansive data processing environments. The architecture’s flow-through design allows complex real-time computations to be performed on data as it streams across GPUs, an effective way to hide latency and improve overall system performance.
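The latency-hiding idea itself is easy to demonstrate: overlap data movement for upcoming chunks with computation on the current one. The toy sketch below uses sleeps as stand-ins for network transfer and GPU work; the timings and chunk count are invented for illustration:

```python
# Toy illustration of hiding transfer latency behind computation:
# chunk n is computed while later chunks are already in flight.
# The sleep durations and chunk count are invented for illustration.

import time
from concurrent.futures import ThreadPoolExecutor

def transfer(chunk):
    time.sleep(0.1)  # stand-in for moving data between racks
    return chunk

def compute(chunk):
    time.sleep(0.1)  # stand-in for GPU work on the received chunk
    return chunk * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    in_flight = [pool.submit(transfer, c) for c in range(8)]  # transfers overlap
    results = [compute(f.result()) for f in in_flight]        # compute as data lands
print(f"Overlapped: {time.perf_counter() - start:.2f}s")
# Strictly sequential transfer-then-compute would take ~1.6 s;
# overlapping the two brings the wall-clock time down to ~0.9 s.
```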
This strategy has the dual benefit of lowering inference costs and supporting greater data center scalability. Nvidia is well aware of the overwhelming demand for additional GPUs as organizations seek to supercharge their computing capabilities.
“That’s why we call it extreme co-design,” – Gilad Shainer
Implications for AI and Machine Learning
The debut of the Vera Rubin architecture carries profound implications for sectors that increasingly depend on AI and machine learning. For companies, it promises to drastically reduce inference costs and resource needs, and this jump in computational efficiency supports an ever-faster cycle of building, testing, and deploying AI solutions.
Shainer expressed enthusiasm about the future of AI workloads under the new architecture: “Two years back, inferencing was mainly run on a single GPU, a single box, a single server.” The evolution toward distributed systems answers the industry’s growing need for more processing power and more complex workloads.
Nvidia’s relentless focus on innovation has established the company as the clear leader in AI infrastructure. Organizations that adopt the Vera Rubin architecture will gain extraordinary new capabilities, allowing them to better meet the demands of cutting-edge applied machine learning at scale.

