Nvidia Unveils Vera Rubin Architecture at CES with Revolutionary Networking Capabilities

By Tina Reynolds

At the Consumer Electronics Show (CES) in Las Vegas, Nvidia has officially unveiled its latest computing architecture, named Vera Rubin. Under this architecture, Nvidia has introduced six new chips: the Vera CPU and the Rubin GPU, along with four specialized networking chips. The Vera Rubin architecture is built for high-performance computing, with the goal of improving performance and reducing cost by orders of magnitude across a range of applications, particularly in AI.

The exact launch date aside, the system is impressive. Its most capable GPUs reach 50 quadrillion floating-point operations per second (50 petaFLOPS), a figure achieved using 4-bit computation. This is an enormous leap over the 10 petaFLOPS delivered by the previous Blackwell architecture, and it is aimed squarely at deep learning workloads, in particular transformer-based inference tasks such as large language models. With Vera Rubin, Nvidia aims to reduce inference costs ten-fold and to cut the number of GPUs required to train certain models by a factor of four compared to its predecessor.
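As a rough illustration of what those figures imply, the short Python sketch below works through the arithmetic. The petaFLOPS values and reduction factors are the article’s numbers; the 4,096-GPU cluster is an invented example.

```python
# Back-of-envelope sketch of the figures quoted above (Rubin vs. Blackwell).
# The numbers come from the article; the calculation is purely illustrative.

rubin_fp4_petaflops = 50      # peak 4-bit throughput quoted for the Rubin GPU
blackwell_petaflops = 10      # figure quoted for the previous Blackwell generation

raw_speedup = rubin_fp4_petaflops / blackwell_petaflops
print(f"Raw compute leap: {raw_speedup:.0f}x")           # 5x

# Nvidia's stated system-level goals for the architecture:
inference_cost_reduction = 10  # ten-fold cheaper inference
training_gpu_reduction = 4     # four times fewer GPUs for certain training runs

# If a hypothetical Blackwell cluster needed 4,096 GPUs for a training run,
# the same run on Vera Rubin would target roughly:
blackwell_gpus = 4096
rubin_gpus = blackwell_gpus / training_gpu_reduction
print(f"Projected GPUs needed: {rubin_gpus:.0f}")        # 1024
```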

A Comprehensive Chipset for Advanced Computing

The Vera Rubin architecture is divided into six distinct chips that work together to let Vera Rubin-based computers excel. Among these, the Vera CPU is designed specifically to complement the Rubin GPU and maximize its computational efficiency. This pairing boosts processing power while addressing the unique needs of today’s most demanding AI workloads.

To further bolster the architecture, Nvidia has added intelligent networking with its ConnectX-9 network interface card. The card serves as the backbone for data transfer and communication between components, helping ensure that every piece of the system works together smoothly.

Moreover, the architecture features the BlueField-4 data processing unit, which is paired with two Vera CPUs and a ConnectX-9 card. This configuration supports intricate and varied data processing with high throughput and very low latency. To further boost performance, Nvidia has added a new Spectrum-6 Ethernet switch, which uses co-packaged optics to dramatically improve data transfer between racks.
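To make the described layout concrete, here is a minimal, purely illustrative Python sketch of that node configuration. The class names and fields are invented for illustration and do not correspond to any Nvidia API.

```python
from dataclasses import dataclass, field

# Hypothetical model of the node configuration described above.
# All names and fields are illustrative; this is not an Nvidia API.

@dataclass
class NodeConfig:
    vera_cpus: int = 2        # two Vera CPUs per BlueField-4 pairing
    rubin_gpus: int = 0       # Rubin GPUs attached to the node
    connectx9_nics: int = 1   # ConnectX-9 carries inter-component traffic
    bluefield4_dpus: int = 1  # BlueField-4 offloads data processing

@dataclass
class Rack:
    nodes: list = field(default_factory=list)
    # Spectrum-6 switches with co-packaged optics link racks together
    uplink_switch: str = "Spectrum-6"

rack = Rack(nodes=[NodeConfig(rubin_gpus=8) for _ in range(4)])
total_gpus = sum(n.rubin_gpus for n in rack.nodes)
print(f"{total_gpus} Rubin GPUs behind one {rack.uplink_switch} uplink")
```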

Transforming Inference Workloads

Optimized specifically for transformer-based inference, the Rubin GPU excels at running state-of-the-art AI models, making it a key component for natural language processing and machine learning more broadly. By offloading operations that previously ran on the GPU to specialized networking components, Nvidia is streamlining processing and improving overall efficiency.
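The benefit of moving work off the GPU can be sketched with a toy cost model. In the hypothetical Python below, a collective reduction is either performed by the GPUs themselves or offloaded to in-network hardware; the cost constants are invented, and this is not Nvidia’s implementation.

```python
# Toy cost model of offloading a collective reduction from GPUs to the network.
# All constants are invented for illustration; this is not Nvidia's design.

def gpu_side_allreduce(num_gpus: int, msg_gb: float) -> float:
    """GPUs both compute the reduction and shuttle the data: they pay for both."""
    compute_cost = num_gpus * msg_gb * 0.5   # reduction math runs on each GPU
    transfer_cost = num_gpus * msg_gb * 1.0  # data moves between GPUs
    return compute_cost + transfer_cost

def in_network_allreduce(num_gpus: int, msg_gb: float) -> float:
    """Switches reduce data in flight, so GPUs only send and receive once."""
    transfer_cost = num_gpus * msg_gb * 1.0
    return transfer_cost                     # reduction cost moved off the GPU

for n in (8, 64, 512):
    before = gpu_side_allreduce(n, msg_gb=1.0)
    after = in_network_allreduce(n, msg_gb=1.0)
    print(f"{n:4d} GPUs: {before:7.0f} -> {after:7.0f} (toy cost units)")
```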

According to Gilad Shainer, Nvidia’s senior vice president of networking, realizing these performance gains required close collaboration between the various components. He notes that the rationale behind this architectural shift is two-fold: it reduces the workload on individual GPUs and cuts the time spent shuttling data between them.

“That’s why we call it extreme co-design.” – Gilad Shainer

Shainer elaborates on the evolution of inference technology, stating that “two years back, inferencing was mainly run on a single GPU, a single box, a single server.” Now, he emphasizes, inference is rapidly developing into distributed systems that span entire racks.
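Shainer’s point about inference spreading across devices can be illustrated with the standard tensor-parallel trick: split a layer’s weight matrix across workers, compute partial results locally, then gather them. The NumPy sketch below shows that generic technique, not Nvidia’s software.

```python
import numpy as np

# Generic tensor-parallelism sketch: one layer's weight matrix is split
# column-wise across "workers" (array slices standing in for GPUs spread
# across racks). Each worker computes a partial output; the results are
# concatenated, as if gathered over the interconnect.

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4096))          # one token's activations
W = rng.standard_normal((4096, 4096))       # full layer weights

num_workers = 4
shards = np.split(W, num_workers, axis=1)   # each worker holds a column shard

partials = [x @ shard for shard in shards]  # local matmul on each worker
y_distributed = np.concatenate(partials, axis=1)  # "all-gather" of outputs

y_single = x @ W                            # reference: one big device
assert np.allclose(y_distributed, y_single)
print("Sharded result matches the single-device result.")
```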

Expanding Connectivity for Future Demands

Nvidia’s vision reaches beyond any individual system to a scalable network that can interconnect dozens of data centers. With room for more than 100,000 GPUs, this geo-distributed scale-up network underscores Nvidia’s commitment to serving the burgeoning demand for computational power.

Shainer reflects on why the need for greater connectivity and capacity keeps growing. “It doesn’t stop here, because we are seeing the demands to increase the number of GPUs in a data center,” he states. Through these efforts, the company seeks to establish a robust infrastructure that will enable the development of emerging AI and machine learning technologies.

He further asserts that “the same unit connected in a different way will deliver a completely different level of performance.” This adaptability speaks to Nvidia’s intent to keep evolving its architecture to meet new technological challenges.
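One way to see how “the same unit connected in a different way” changes performance is to count the communication steps different topologies need for the same collective operation. The sketch below uses the textbook step counts for ring and tree all-reduce; it is generic and not specific to any Nvidia product.

```python
import math

# Textbook step counts for an all-reduce over n devices: a ring needs
# 2*(n-1) steps, while a tree needs about 2*log2(n). Same devices,
# different wiring, very different behavior as n grows.

for n in (8, 64, 1024, 100_000):
    ring_steps = 2 * (n - 1)
    tree_steps = 2 * math.ceil(math.log2(n))
    print(f"n={n:6d}: ring={ring_steps:7d} steps, tree={tree_steps:3d} steps")
```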