Nvidia Unveils Vera Rubin Architecture at CES 2026

By Tina Reynolds

Nvidia made waves in the tech world this week with the announcement of its new Vera Rubin architecture at the Consumer Electronics Show (CES) in Las Vegas. The architecture pairs the Vera CPU and the Rubin GPU with four specialized chips for high-speed networking. Together, these components are designed to transform computing power as we know it, particularly for artificial intelligence. With substantial gains in performance and cost efficiency, the Vera Rubin architecture is slated to reach customers later this year.

The Rubin GPU is at the center of the new architecture. It delivers a mind-blowing 50 quadrillion floating-point operations per second (50 petaFLOPS) for 4-bit calculations, roughly five times the 10 petaFLOPS that Nvidia's previous Blackwell architecture achieves on equivalent calculations. Its true potential becomes apparent in transformer-based inference workloads, such as those behind large language models. This jump in processing power is intended to meet the rapidly increasing demand from AI and machine learning applications.
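
To put the 50-petaFLOPS figure in perspective, here is a back-of-the-envelope sketch of what it could mean for large language model inference. The utilization rate, the 70B-parameter model size, and the two-FLOPs-per-parameter-per-token rule of thumb are illustrative assumptions, not numbers from Nvidia's announcement.

```python
# Back-of-the-envelope estimate of LLM inference throughput from raw FLOPS.

PEAK_FLOPS = 50e15            # Rubin GPU at 4-bit precision (per the article)
UTILIZATION = 0.3             # assumed fraction of peak sustained in practice
PARAMS = 70e9                 # assumed model size: 70B parameters
FLOPS_PER_TOKEN = 2 * PARAMS  # rule of thumb: ~2 FLOPs per parameter per token

tokens_per_second = PEAK_FLOPS * UTILIZATION / FLOPS_PER_TOKEN
print(f"~{tokens_per_second:,.0f} tokens/s")  # ~107,143 tokens/s on one GPU
```

Even under a conservative utilization assumption, the raw arithmetic shows why Nvidia is pitching the chip at transformer inference.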

Enhanced Performance and Cost Efficiency

Nvidia has positioned the Vera Rubin architecture as the avenue for enterprises to get the most out of their AI workloads. Combined with other advancements, the architecture promises up to a ten-fold reduction in inference costs, democratizing access to high-performance computing. It also allows organizations to train high-performance generative AI and foundation models with only one-fourth of the GPUs previously needed. This improvement is a major step forward over the previous Blackwell architecture.
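
A short sketch makes the two quoted ratios concrete. Only the ten-fold and one-fourth factors come from the announcement; the baseline cluster size and per-token cost below are hypothetical placeholders chosen to make the arithmetic readable.

```python
# Toy arithmetic for the two ratios quoted above. Only the 10x and 4x factors
# come from the article; the baseline figures are hypothetical placeholders.

baseline_gpus = 1024           # hypothetical Blackwell-era training cluster
baseline_cost_per_mtok = 1.00  # hypothetical inference cost, $ per 1M tokens

rubin_gpus = baseline_gpus / 4            # "one-fourth of the GPUs"
rubin_cost = baseline_cost_per_mtok / 10  # "ten-fold reduction in inference costs"

print(f"Training cluster: {baseline_gpus} -> {rubin_gpus:.0f} GPUs")
print(f"Inference cost: ${baseline_cost_per_mtok:.2f} -> ${rubin_cost:.2f} per 1M tokens")
```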

Gilad Shainer, Nvidia’s senior vice president of networking, has spent years raising awareness of the ongoing transformation within the industry. “Two years back, inferencing was mainly run on a single GPU, a single box, a single server,” he stated. Today’s trends, he went on to say, favor distributed inferencing, an approach that looks rack-wide rather than being confined to a single piece of gear.

Shainer elaborated on the architectural design, stating, “The same unit connected in a different way will deliver a completely different level of performance.” Internally, Nvidia calls this principle “extreme co-design”: hardware and network capabilities are advanced in parallel, from the research stage onward, to ensure the highest efficiency and performance are realized.

Advanced Networking Capabilities

The networking components are integral to the Vera Rubin architecture, which features several networking chips that enhance data flow and connectivity. Among these is the ConnectX-9, a network interface card that reduces latency and increases bandwidth between components. Additionally, the BlueField-4 data processing unit works in tandem with two Vera CPUs and a ConnectX-9 card, enabling rapid processing and data transfer.

The Spectrum-6 Ethernet switch is another key component of the architecture. By taking advantage of co-packaged optics, it accelerates data transmission between racks and ultimately ensures that information can flow seamlessly across hundreds of data centers. Such SmartNICs and switches support Nvidia’s argument for moving more operations off of GPUs and into the networking infrastructure.
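
The offload idea is easiest to see in miniature. The toy sketch below shows the division of labor Nvidia is arguing for, with an aggregation step handled by the network layer instead of a GPU. It is purely conceptual and does not use any real SmartNIC API.

```python
# Conceptual sketch of in-network aggregation, the kind of work a SmartNIC or
# DPU can take off the GPUs. Real systems use RDMA and in-network reduction
# protocols; this toy only shows the division of labor, not a real API.

gpu_partial_results = [1.0, 2.0, 3.0, 4.0]  # e.g., partial sums from four GPUs

def network_aggregate(values):
    """Stand-in for aggregation performed by the NIC/switch layer,
    freeing the GPUs to move on to the next computation."""
    return sum(values)

total = network_aggregate(gpu_partial_results)
print(total)  # 10.0: each GPU gets the result without spending cycles on it
```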

Shainer explained the necessity of this approach: “It doesn’t stop here, because we are seeing the demands to increase the number of GPUs in a data center.” He made the case that even 100,000 GPUs no longer suffice for some workloads. Consequently, operators are being challenged to connect many more data centers to meet the exponential processing requirements of the future.

Future Implications for AI Workloads

The introduction of Nvidia’s Vera Rubin architecture signals a significant shift in how AI workloads will be managed in the future.

Collaborative Ecosystems

Organizations are more dependent than ever on collaborative distributed systems. Their ability to connect multiple GPUs and take advantage of improved networking capabilities will be key to their success.

Shainer highlighted this shift by stating, “Right now, inferencing is becoming distributed, and it’s not just in a rack. It’s going to go across racks.” This development means that companies will have to rethink their physical infrastructure to support these needs.
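
As a conceptual picture of what rack-spanning inference means, the sketch below splits a toy model’s layers across several workers and hands activations from one to the next. The worker split and layer functions are invented for illustration; real deployments do this with dedicated frameworks over high-speed fabrics.

```python
# Conceptual sketch of pipeline-parallel inference: a model's layers are split
# across several workers (think nodes or racks), and activations are handed
# from one worker to the next. Real systems move these tensors over high-speed
# fabrics; here each hop is just a Python function call.

def make_layer(scale):
    # Stand-in for a real transformer layer; it just rescales activations.
    return lambda xs: [x * scale for x in xs]

workers = [
    [make_layer(1.1), make_layer(0.9)],  # worker 0 holds layers 1-2
    [make_layer(1.2), make_layer(1.0)],  # worker 1 holds layers 3-4
    [make_layer(0.8), make_layer(1.3)],  # worker 2 holds layers 5-6
]

def run_inference(activations):
    for worker in workers:       # in practice, each step crosses the network
        for layer in worker:
            activations = layer(activations)
    return activations

print(run_inference([1.0, 2.0, 3.0]))
```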

The Vera Rubin architecture pairs a state-of-the-art design with a focus on efficiency, a strategy that has helped make Nvidia the dominant player in AI hardware. As the architecture heads toward release later this year, companies will undoubtedly be eager to explore how the technology can enhance their operations and reduce costs.