Nvidia Unveils Innovative Vera Rubin Architecture at CES

By Tina Reynolds

At the Consumer Electronics Show (CES) in Las Vegas today, Nvidia shared an early look at its new Vera Rubin architecture. The platform promises major gains in speed and efficiency, particularly for artificial intelligence and machine learning. According to Nvidia, Vera Rubin reduces inference cost by a factor of ten and cuts the number of GPUs needed to train models to a quarter of what the company’s previous Blackwell architecture requires.

The new Rubin GPU delivers more than 50 petaFLOPS (quadrillion floating-point operations per second) of 4-bit (FP4) compute, giving it the headroom to tackle more sophisticated workloads. By comparison, the Blackwell architecture provided 10 petaFLOPS on transformer-based inference workloads such as large language models. The jump shows how far Nvidia is willing to go in pursuit of data-processing performance.
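To put those figures side by side, here is a minimal arithmetic sketch in Python; the numbers are Nvidia’s claims as reported above, and the variable names are ours:

```python
# Headline figures from Nvidia's announcement (vendor claims, not benchmarks).
rubin_fp4_petaflops = 50   # Rubin GPU, 4-bit (FP4) compute
blackwell_petaflops = 10   # Blackwell, transformer inference workloads

# Raw throughput ratio implied by the two figures.
speedup = rubin_fp4_petaflops / blackwell_petaflops
print(f"Claimed FP4 throughput: {speedup:.0f}x Blackwell")  # 5x

# Other claimed ratios from the announcement.
print("Inference cost: 1/10 of Blackwell")
print("GPUs needed for training: 1/4 of Blackwell")
```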

A New Era of Networking

Nvidia’s Vera Rubin architecture includes six new chips designed to optimize performance: the Vera CPU, the Rubin GPU, and four networking chips, among them the ConnectX-9 smartNIC, the BlueField-4 DPU, and the Spectrum-6 Ethernet switch. The networking chips are the key components unlocking the platform’s new data-flow and processing capabilities.

ConnectX-9 serves as the platform’s smart network interface card (NIC). BlueField-4 takes the role of a data processing unit (DPU), pairing two Vera CPUs with a ConnectX-9 smartNIC. The new Spectrum-6 Ethernet switch, with its co-packaged optics, moves data across racks efficiently. By building a scale-out network of chips, Nvidia streamlines processing and improves how data is managed across the system.
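For quick reference, here is a small illustrative summary of the named chips and the roles the announcement assigns them (plain Python; not any Nvidia API):

```python
# Descriptive summary of the named Vera Rubin chips, per the announcement.
vera_rubin_chips = {
    "Vera CPU":    "general-purpose host processor",
    "Rubin GPU":   "accelerator; >50 petaFLOPS of FP4 compute",
    "ConnectX-9":  "smart network interface card (smartNIC)",
    "BlueField-4": "data processing unit (DPU); pairs two Vera CPUs with a ConnectX-9",
    "Spectrum-6":  "Ethernet switch with co-packaged optics for cross-rack traffic",
}

for chip, role in vera_rubin_chips.items():
    print(f"{chip}: {role}")
```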

Gilad Shainer, Nvidia’s senior vice president of networking, explained why it makes sense to move some operations from GPUs to the network. By taking this approach, certain operations run only once rather than being repeated on each GPU, and computation can happen en route, while data is in transit between GPUs, maximizing overall performance.

“The same unit connected in a different way will deliver a completely different level of performance,” – Gilad Shainer
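Here is a toy sketch of the idea Shainer describes, assuming a simple summation-style collective; the functions and the switch-side reduction are illustrative only, not Nvidia’s implementation:

```python
# Toy model of moving a shared reduction off the GPUs and into the network.
# Everything here is illustrative; no real Nvidia API is used.

gpu_partial_sums = [1.0, 2.0, 3.0, 4.0]  # one partial result per GPU

def reduce_on_each_gpu(partials):
    # Naive placement: every GPU receives all partials and repeats the
    # same summation locally, so the work runs once per GPU.
    return [sum(partials) for _ in partials]

def reduce_in_network(partials):
    # In-network placement: the switch combines values while they are in
    # flight, so the summation runs once and only the result fans out.
    total = sum(partials)  # computed "en route", a single time
    return [total for _ in partials]

# Both placements yield the same answer; only where the work runs differs.
assert reduce_on_each_gpu(gpu_partial_sums) == reduce_in_network(gpu_partial_sums)
```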

Transforming Inference Operations

The introduction of the Vera Rubin architecture represents an important change not just in hardware, but in how inference operations are conducted. As Shainer explained, the inferencing landscape has changed rapidly in just the last couple of years.

“Two years back, inferencing was mainly run on a single GPU, a single box, a single server,” – Gilad Shainer

More importantly, he says, inferencing is now becoming distributed, moving from a single rack to multiple racks. This transformation opens new opportunities for scalability and efficiency in data centers.
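What that shift looks like can be sketched with a simple, hypothetical layer-sharding scheme; the model size and worker counts below are made up for illustration:

```python
# Hypothetical sketch of inference scaling out: the same model served from
# one GPU, then from many GPUs in a rack, then from GPUs across racks.
model_layers = list(range(48))  # e.g., a 48-layer transformer

def shard(layers, workers):
    """Split layers evenly across workers (GPUs, racks, ...)."""
    per = len(layers) // workers
    return [layers[i * per:(i + 1) * per] for i in range(workers)]

single_gpu = shard(model_layers, 1)   # two years ago: one GPU, one box
one_rack   = shard(model_layers, 8)   # more recently: many GPUs, one rack
multi_rack = shard(model_layers, 16)  # now: GPUs spread across racks

# Each worker holds fewer layers as the deployment spreads out: 48, 6, 3.
print(len(single_gpu[0]), len(one_rack[0]), len(multi_rack[0]))
```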

Shainer is quick to point out that the architecture’s design is rooted in what he calls extreme co-design: hardware and software are designed together to optimize performance. By pairing ultra-fast networking with ultra-fast computing, Nvidia can address the accelerating demand for more GPUs in data centers.

“It doesn’t stop here, because we are seeing the demands to increase the number of GPUs in a data center,” – Gilad Shainer

Strategic Implications for Data Centers

Data centers are under immense pressure to improve operational efficiency. With AI applications spreading like wildfire, there is urgent demand for faster, simpler, and more affordable solutions, and Nvidia’s new platform is a strategic answer to that demand.

Combining several specialized chips in the Vera Rubin design increases efficiency and optimizes how resources are distributed. By offloading workloads to specialized components, Nvidia aims to simplify complicated tasks while continuing to deliver fast performance at scale.

Shainer’s analogy of a pizza parlor helps unpack the advantages of such an architecture. He likens offloading workloads from GPUs to having one dedicated team that makes the pizzas while a second team handles thousands of customer orders and deliveries. That division of labor streamlines overall service, lowering wait times while boosting efficiency.
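The analogy maps onto a classic producer/consumer split; here is a toy sketch in standard-library Python, with roles that are purely illustrative:

```python
# Toy version of the pizza-parlor analogy: one worker handles orders
# (the network/DPU side) while another only bakes (the GPUs' compute),
# so neither blocks the other.
import queue
import threading

orders = queue.Queue()

def order_desk():
    # Front of house: takes orders and hands them off; never cooks.
    for n in range(5):
        orders.put(f"order-{n}")
    orders.put(None)  # sentinel: no more orders

def kitchen():
    # Back of house: only cooks; never deals with customers.
    while (order := orders.get()) is not None:
        print(f"baking {order}")

desk = threading.Thread(target=order_desk)
cook = threading.Thread(target=kitchen)
desk.start()
cook.start()
desk.join()
cook.join()
```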

“Right now, inferencing is becoming distributed, and it’s not just in a rack. It’s going to go across racks,” – Gilad Shainer