Nvidia officially introduced its latest major innovation, the Vera Rubin architecture, at the Consumer Electronics Show in Las Vegas. The new architecture is purpose-built for accelerated networking and AI workloads such as large language models. At its heart is an ambitious new graphics processing unit (GPU), Rubin, which the company is developing to deliver new levels of computational muscle and efficiency.
The Vera Rubin architecture delivers an astounding 50 quadrillion floating-point operations per second, or 50 petaFLOPS, at 4-bit precision. That is a substantial advance: according to ORNL, the Rubin GPU delivers up to 5x the performance of Nvidia’s last-generation Blackwell architecture, especially on transformer-based inference workloads. Despite the shared name, the architecture is entirely separate from the recently completed Vera C. Rubin telescope; both honor the astronomer Vera Rubin.
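To give a feel for what 4-bit precision means, here is a rough, illustrative Python sketch of quantization. It uses a simplified signed-integer scheme rather than Nvidia’s actual FP4 format: 32-bit floats are collapsed onto just 16 representable levels, trading accuracy for memory footprint and throughput.

```python
# Rough illustration of 4-bit quantization (a simplified integer scheme,
# NOT Nvidia's actual FP4 format): 32-bit floats are mapped onto just
# 16 representable levels, trading precision for memory and throughput.
import numpy as np

def quantize_4bit(x: np.ndarray) -> tuple[np.ndarray, float]:
    scale = np.abs(x).max() / 7.0                      # map values into [-7, 7]
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.default_rng(1).standard_normal(8).astype(np.float32)
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
print("original:", np.round(weights, 3))
print("4-bit   :", np.round(restored, 3))   # coarser, but 8x smaller than fp32
```

The values come back slightly coarser, which is the trade-off: lower-precision math is far cheaper per operation, which is how throughput figures like 50 petaFLOPS become reachable.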
Overview of Vera Rubin Architecture
The Vera Rubin architecture is a wonderfully complex system. It showcases the powerful new Rubin GPU and six custom chips. These include the Vera CPU and three specialized networking chips: the ConnectX-9 network interface card, the BlueField-4 data processing unit, and the Spectrum-6 Ethernet switch. We’ll take a closer look at each one and at how they optimize the collection and processing of data throughout vast networks.
ConnectX-9 is a powerful, state-of-the-art network interface card. BlueField-4 functions as a data processing unit, working alongside a pair of Vera CPUs and the new ConnectX-9 card. The Spectrum-6 Ethernet switch uses co-packaged optics to send data between racks through the fabric, increasing efficiency and performance across the entire network.
The architecture allows certain compute-intensive tasks to be performed just once instead of redundantly on every GPU. By cutting that duplication, it dramatically reduces resource usage within data centers and shortens processing times.
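As a rough illustration of that idea (a conceptual Python sketch, not Nvidia’s software stack), consider an all-reduce in which the network fabric performs a reduction once and broadcasts the result, instead of every GPU repeating the same sum:

```python
# Conceptual sketch (not Nvidia's API): contrasts every GPU redundantly
# reducing its peers' gradients with an in-network design where the
# fabric performs the reduction once and broadcasts the finished result.
import numpy as np

NUM_GPUS = 8
rng = np.random.default_rng(0)
gradients = [rng.standard_normal(4) for _ in range(NUM_GPUS)]

# Naive all-reduce: each of the N GPUs sums all N contributions itself,
# so the same O(N) reduction is repeated N times across the cluster.
naive_results = [sum(gradients) for _ in range(NUM_GPUS)]

# In-network reduction: the fabric computes the sum once while the data
# is in flight, then every GPU simply receives the result.
switch_result = sum(gradients)              # performed once, in the fabric
offloaded_results = [switch_result] * NUM_GPUS

assert all(np.allclose(a, b) for a, b in zip(naive_results, offloaded_results))
```

Both approaches yield identical answers; the in-network version simply does an eighth of the arithmetic, which is where the resource savings come from.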
Advantages of the Rubin GPU
As discussed above, the introduction of the Rubin GPU marks a significant new chapter in Nvidia’s strategy for executing inference workloads. Gilad Shainer, senior vice president of networking at Nvidia, highlighted the transformation in inferencing practices over the past two years.
“Just two years ago, inferencing was still primarily executed on one GPU, one box, one server,” noted Shainer.
As processing demands have changed, he noted a trend toward distributed inferencing across racks. Addressing that trend requires sharper solutions, such as the Vera Rubin architecture.
“Currently, inferencing is going beyond a single rack and is already going across a couple racks,” noted Shainer.
The architecture’s design makes it possible to perform computations on data as it moves between GPUs. This approach hides much of the data-transfer time and streamlines workflows dramatically.
“That same unit, if it’s connected in a different way, is going to provide a totally different degree of performance,” Shainer said.
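One way to picture computing on in-flight data is a simple pipeline: while one chunk is being transferred, the previously arrived chunk is being processed, so transfer time overlaps with compute instead of adding to it. The following Python sketch is purely illustrative, with hypothetical timings and no relation to Nvidia’s actual APIs:

```python
# Illustrative pipeline sketch only: overlapping "transfer" and "compute"
# on chunks, the way computing on in-flight data hides transfer latency.
# Timings and chunk counts are hypothetical, not Rubin specifics.
import time
from concurrent.futures import ThreadPoolExecutor

def transfer(chunk_id: int) -> int:
    time.sleep(0.05)            # stand-in for moving a chunk between GPUs
    return chunk_id

def compute(chunk_id: int) -> int:
    time.sleep(0.05)            # stand-in for the math on that chunk
    return chunk_id * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    in_flight = pool.submit(transfer, 0)
    results = []
    for next_id in range(1, 4):
        arrived = in_flight.result()
        in_flight = pool.submit(transfer, next_id)              # next chunk moves...
        results.append(pool.submit(compute, arrived).result())  # ...while we compute
    results.append(compute(in_flight.result()))
print(f"pipelined 4 chunks in {time.perf_counter() - start:.2f}s -> {results}")
```

Run sequentially, four transfers plus four computations would take about 0.4 seconds; pipelined, the total approaches the larger of the two per chunk, roughly 0.25 seconds here.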
Future Impact and Market Availability
The Vera Rubin platform is set to reach its first customers later this year and offers significant gains over the previous generation. Nvidia says the architecture can cut inference costs by an order of magnitude, and it can train certain models with a quarter of the GPUs the Blackwell architecture requires. Such reductions would have a huge effect on the otherwise daunting operational costs facing companies using AI tools.
Demand for additional GPU capacity within data centers has been increasing, according to Shainer. He continued, “This doesn’t end there. Now we’re experiencing the demands to scale up the number of GPUs in an on-premise data center.”
He credited those gains to Nvidia’s distinctive collaborative approach to designing chips, systems, and software together, which he dubbed “extreme co-design.”

