Nvidia announced its Vera Rubin architecture at the Consumer Electronics Show in Las Vegas this week. Purpose-built for transformer-based inference workloads, from large language models (LLMs) to text-to-speech (TTS), the architecture promises significant gains in performance, power efficiency, and operational cost.
The Vera Rubin design delivers 50 petaFLOPS (50 quadrillion floating-point operations per second) of 4-bit compute, five times the 10 petaFLOPS of its predecessor, the Blackwell architecture. That jump matters as demand for more sophisticated AI applications keeps climbing. The design pairs this massively parallel compute with a strong default scale-out network built on networking chips that Nvidia has had in production since about 2016.
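To put those throughput claims in perspective, here is a quick back-of-the-envelope comparison in Python. It uses only the figures cited above; the variable names are ours, and the numbers are Nvidia's stated specs rather than measured results.

```python
# Back-of-the-envelope comparison using only the figures cited above.
# These are Nvidia's claimed FP4 (4-bit) throughput numbers, not benchmarks.
RUBIN_FP4_PETAFLOPS = 50       # Vera Rubin: 50 petaFLOPS of 4-bit compute
BLACKWELL_FP4_PETAFLOPS = 10   # Blackwell: 10 petaFLOPS of 4-bit compute

speedup = RUBIN_FP4_PETAFLOPS / BLACKWELL_FP4_PETAFLOPS
print(f"Generational FP4 speedup: {speedup:.0f}x")   # prints: 5x

# 1 petaFLOPS = 1e15 FLOPS, so 50 petaFLOPS is 5e16 operations per second.
print(f"Rubin FP4 throughput: {RUBIN_FP4_PETAFLOPS * 1e15:.0e} FLOPS")
```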
Enhanced Performance and Efficiency
The Vera Rubin architecture embodies more than a faster engine for AI workloads. What is most notable is the broader set of calculations the network itself can perform, which lets the platform support a wide range of workloads and numerical formats. That flexibility matters in a field that changes this quickly.
Chief among the Vera Rubin platform's features is its flexibility: it can offload certain operations, such as embedding lookups, from the GPUs directly to the network. By keeping GPUs focused on compute and reducing the bottlenecks common in traditional deployments, this approach can improve overall performance, as the sketch below illustrates. Nvidia says the new architecture shrinks inference costs by a factor of ten and cuts the number of GPUs needed to train a given model to a quarter of what the previous Blackwell architecture required.
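To see why an embedding lookup is a natural offload target, note that it is a memory-bound gather with no arithmetic. The sketch below is purely conceptual: the split between a "network-side" gather and "GPU-side" compute is our illustration of the idea, not Nvidia's API, and all names and sizes are hypothetical.

```python
import numpy as np

# Illustrative only: an embedding lookup is a memory-bound gather, not a
# matrix multiply. The network_side/gpu_side split below is a conceptual
# sketch of in-network offload, not Nvidia's actual interface.

VOCAB, DIM = 50_000, 1_024
embedding_table = np.random.rand(VOCAB, DIM).astype(np.float32)
weights = np.random.rand(DIM, DIM).astype(np.float32)

def network_side_lookup(token_ids: np.ndarray) -> np.ndarray:
    """Gather one table row per token id -- pure memory movement, no math.
    In-network offload would do this step in the fabric, sparing the GPU
    both the cycles and the memory bandwidth the gather consumes."""
    return embedding_table[token_ids]

def gpu_side_compute(vectors: np.ndarray) -> np.ndarray:
    """The compute-heavy part that actually benefits from GPU FLOPS
    (a stand-in here for the transformer layers)."""
    return vectors @ weights

tokens = np.array([17, 4_242, 31_337])
hidden = gpu_side_compute(network_side_lookup(tokens))
print(hidden.shape)  # (3, 1024)
```

Because the gather moves data but performs no math, executing it in the network fabric frees the GPUs for the matrix multiplies that actually need their throughput.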
Gilad Shainer, Senior Vice President of Networking at Nvidia, underscored the importance of synergy between components: how the same parts are connected can transform performance.
“The same unit connected in a different way will deliver a completely different level of performance,” – Gilad Shainer
Revolutionary Networking Technology
The scale-out network built into the Vera Rubin architecture comprises more than 20 chips that expand its data-handling capacity. Prominent among them are the ConnectX-9 network interface card, the BlueField-4 data processing unit, and the Spectrum-6 Ethernet switch. This networking infrastructure can tie multiple data centers together and is built to handle workloads that span more than 100,000 GPUs.
With AI inference becoming more distributed, often spanning multiple racks, Shainer pointed to a shift in how these systems are operated.
“Two years back, inferencing was mainly run on a single GPU, a single box, a single server,” – Gilad Shainer
That evolution underscores a growing need for smarter networking that can handle the complexity of distributed AI inference.
Future Outlook and Availability
Nvidia plans to make the Vera Rubin platform generally available to customers later this year, a rollout that could usher in a new era of AI processing. The company has been outspoken in its belief that it is best equipped to serve the rapidly growing needs of data centers and AI workloads.
By Shainer's reckoning, expectations for GPU deployments will only keep growing.
“It doesn’t stop here, because we are seeing the demands to increase the number of GPUs in a data center,” – Gilad Shainer
This ongoing development reflects Nvidia's effort to keep its technology at the forefront of AI processing.

