The Future of AI Performance: A Race Among Chipmakers

As AI technology evolves at a rapid pace, the demand for massive processing power to drive sophisticated models is growing with it. Qualcomm, AMD, Intel and Nvidia are all battling on this new frontier, racing to build chips powerful enough to run advanced AI workloads. Qualcomm’s Snapdragon X chip stands out: its neural processing unit (NPU) is rated at 45 trillion operations per second (TOPS), enough to power Microsoft’s Copilot+ features. AMD’s and Intel’s NPUs land in the same 40-to-50-TOPS range, while Nvidia’s GeForce RTX 5090 claims AI performance of up to 3,352 TOPS.

This story was originally published by StateScoop. It further explores the implications of NPUs and AI workloads for the future of personal computing.

Current Landscape of AI Processing

Qualcomm’s Snapdragon X chip led this market early on. With an NPU rated at 45 TOPS, it clears Microsoft’s bar for Copilot+ PCs and can keep pace with increasingly complex AI workloads. It’s precisely this capability that enables smooth, on-device execution of advanced features like Microsoft’s Copilot+, which embeds AI into daily workflows.

AMD and Intel have answered with NPUs of their own rated at 40 to 50 TOPS, putting them in the same class as Qualcomm’s flagship. AMD’s Ryzen AI Max merges Ryzen CPU cores with Radeon GPU cores on one chip and pairs them with an NPU rated at 50 TOPS, another big step toward integrated AI processing.

“With the NPU, the entire structure is really designed around the data type of tensors [a multidimensional array of numbers],” – Steven Bathiche.
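To make the tensor idea concrete, here is a minimal NumPy sketch (NumPy standing in for the NPU’s math engine; the shapes and names are illustrative, not any vendor’s API). It shows the kind of dense multiply-accumulate over multidimensional arrays that NPUs are built to run:

```python
# A tensor is just a multidimensional array of numbers, and NPU
# workloads are dominated by dense operations over such arrays.
import numpy as np

# A toy "activation" tensor: a batch of 2 images, 3 channels, 4x4 pixels.
activations = np.random.rand(2, 3, 4, 4).astype(np.float32)

# A toy "weight" tensor for a 1x1 convolution: 3 input -> 8 output channels.
weights = np.random.rand(8, 3).astype(np.float32)

# The core NPU primitive is a multiply-accumulate contraction over tensors.
# einsum expresses it directly: sum over the shared channel dimension.
outputs = np.einsum("oc,bchw->bohw", weights, activations)

print(outputs.shape)  # (2, 8, 4, 4)
```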

Intel plans to strengthen its position through a strategic alliance with Nvidia to develop chips that pair Intel CPU cores with Nvidia GPU cores. The partnership appears designed to marry Nvidia’s strength in graphics and visual computing to Intel’s well-known x86 architecture.

The Need for High TOPS in AI Models

Running these advanced AI models, which hold hundreds of millions of parameters, requires a minimum number of TOPS. The average laptop today doesn’t cut it: hardware more than a year old can barely run a useful AI model locally. That shortcoming is a significant hurdle for users who want cutting-edge AI features on their own machines.

Qualcomm’s AI 100 NPU, rated at more than 100 TOPS, is one answer to that challenge. That headroom allows deeper, more efficient processing of large-scale AI workloads than traditional CPUs, or even GPU accelerators, can manage. The faster the NPU, the more tokens per second it can produce, which directly improves responsiveness when end users interact with AI models.
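A rough back-of-envelope calculation shows how TOPS translates into tokens per second. It assumes about two operations per model parameter per generated token and a sustained fraction of peak throughput; both are simplifications, and in practice memory bandwidth often caps decode speed before compute does. The numbers below are illustrative:

```python
# Back-of-envelope: why more TOPS means more tokens per second.
def tokens_per_second(peak_tops: float, params: float,
                      utilization: float = 0.3) -> float:
    ops_per_token = 2 * params                     # multiply + accumulate per weight
    sustained_ops = peak_tops * 1e12 * utilization # chips rarely sustain peak TOPS
    return sustained_ops / ops_per_token

# A 3-billion-parameter model on a 45-TOPS NPU (illustrative numbers):
print(f"{tokens_per_second(45, 3e9):.0f} tokens/sec")  # ~2250
```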

“NPUs are much more specialized for that workload. And so we go from a CPU that can handle three [trillion] operations per second (TOPS), to an NPU,” – Steven Bathiche.

The race to build the fastest NPU is on. Units capable of thousands of TOPS are expected to come online within just a few years, and as they roll out, they’ll fundamentally change the way users experience AI capabilities on their PCs.

Integrated Architectures and Future Prospects

Integrating CPUs, GPUs, and NPUs under a single thermal head offers real benefits. By managing the total power envelope holistically, chipmakers can build devices that are both powerful and efficient.

“By bringing it all under a single thermal head, the entire power envelope becomes something that we can manage.”

This all-in-one approach minimizes latency and improves consistency and performance. It also allows seamless data sharing between the CPU and GPU, removing the need to manually shuttle data between separate memory systems. Joe Macri elaborates on this challenge:

“When I want to share data between our [CPU] and GPU, I’ve got to take the data out of my memory, slide it across the PCI Express bus, put it in the GPU memory, do my processing, then move it all back,” – Joe Macri.
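Macri’s round trip is easy to see in code. This illustrative PyTorch sketch assumes a discrete CUDA-capable GPU; on a unified-memory design, the explicit copies would disappear because CPU and GPU address the same RAM:

```python
# The copy-process-copy-back pattern Macri describes, in PyTorch terms.
import torch

data = torch.randn(4096, 4096)    # starts in CPU (host) memory

gpu_data = data.to("cuda")        # slide it across the PCI Express bus
result_gpu = gpu_data @ gpu_data  # do the processing on the GPU
result = result_gpu.to("cpu")     # then move it all back to host memory

print(result.shape)
```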

Chipmakers are inventing at an unprecedented pace. That momentum is now carrying them toward unified architectures that simplify operations and improve the efficiency and performance of AI workloads on PCs.