Qualcomm was the first to bring these neural processing units (NPUs) to Windows laptops, a milestone in the evolution of personal computing. It is no small feat: the company faces mounting pressure from AMD and Intel, both of which are working aggressively to improve their own NPUs. NPUs capable of thousands of TOPS (trillions of operations per second) are on the way, and they will dramatically expand what is possible for AI on PCs.
Recently, Qualcomm’s AI 100 NPU has come to dominate this niche, delivering a remarkable 40 to 50 TOPS from a single unit. Competitors are poised to reshape the field: AMD’s Ryzen AI Max is rated at 50 TOPS, and Intel’s offerings now match Snapdragon with comparable performance. These companies are investing tens of billions of dollars in NPU technology, intensifying the race for higher TOPS and yielding faster, more efficient AI processing for end users.
The Competitive Landscape of NPUs
Qualcomm’s first-mover advantage in the NPU market has served it well: the AI 100 NPU’s performance has become the benchmark everyone else wants to match. Yet AMD is not far behind. The Ryzen AI Max combines Ryzen CPU cores and Radeon GPU cores directly on the same silicon, alongside a powerful NPU. This design not only boosts performance but also enables a memory-coherent architecture that is more efficient on data-heavy workloads.
Intel’s NPUs have also made strides and now match Qualcomm’s performance metrics. As recently as 2023, AMD’s chips with NPUs were rare and supported only about 10 TOPS. Given how quickly the technology is advancing, AMD and Intel have arguably never been better positioned to compete at the high end.
“NPUs are much more specialized for that workload. And so we go from a CPU that can handle three trillion operations per second (TOPS), to an NPU.” – Steven Bathiche
This specialization allows NPUs to handle AI tasks far more efficiently than traditional processors, which are not optimized for these workloads. The stakes for developing faster NPUs have never been higher: users expect ever-higher tokens-per-second from local AI models, and that responsiveness is what keeps them engaged.
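As a rough back-of-the-envelope illustration of how a TOPS rating relates to tokens-per-second, consider the sketch below. The model size, the two-operations-per-parameter rule of thumb, and the utilization figure are all assumptions chosen for the example, not figures from the article.

```python
# Back-of-the-envelope: tokens/sec a local model might reach on an NPU.
# All inputs below are illustrative assumptions, not measured figures.

def estimate_tokens_per_sec(npu_tops: float,
                            params_billions: float,
                            utilization: float = 0.3) -> float:
    """Estimate decoding throughput for a local language model.

    Rule of thumb: generating one token costs roughly 2 operations per
    model parameter (one multiply plus one add). Sustained utilization
    is well below 100% because decoding is memory-bandwidth-bound.
    """
    ops_per_token = 2 * params_billions * 1e9      # ~2 ops per parameter
    effective_ops = npu_tops * 1e12 * utilization  # sustained ops/sec
    return effective_ops / ops_per_token

# A 45-TOPS NPU (Snapdragon-class) running a 7B-parameter model:
print(f"{estimate_tokens_per_sec(45, 7):.0f} tokens/sec")    # ~964
# A hypothetical 1000-TOPS NPU with the same model:
print(f"{estimate_tokens_per_sec(1000, 7):.0f} tokens/sec")  # ~21429
```

The point of the arithmetic is the scaling, not the absolute numbers: an order-of-magnitude jump in TOPS translates directly into an order-of-magnitude jump in potential throughput, assuming memory bandwidth keeps pace.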
The Future of NPU Performance
Over the next few years, we expect a sea change in the NPU landscape. New parts will come online that are hundreds of times more powerful, able to deliver thousands of TOPS. This evolution is more than a race for bigger numbers; it signals a substantial shift in how personal computers will handle AI workloads. With the Windows ML runtime now available, AI workloads can run affordably on local hardware, a genuine turning point in how users experience the technology.
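As a minimal sketch of what local execution can look like, here is how ONNX Runtime, the engine Windows ML builds on, can be asked to prefer an NPU and fall back to the CPU. The model file name and input shape are placeholders, and QNNExecutionProvider (Qualcomm’s NPU backend) is only present in builds that support it.

```python
# Minimal sketch: run an ONNX model locally, preferring the NPU.
# "model.onnx" is a placeholder path; provider availability depends
# on the hardware and on the installed onnxruntime build.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    # Providers are tried in order: Qualcomm's NPU backend first,
    # then the plain CPU implementation as a fallback.
    providers=["QNNExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)  # example input shape
outputs = session.run(None, {input_name: dummy})
print("ran on:", session.get_providers()[0])
```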
Even as companies create these new NPUs, they understand that a powerful NPU alone is not enough. They also need to balance traditional computing needs against emerging AI capabilities. Mike Clark emphasizes this dual focus, stating,
“We must be good at low latency, at handling smaller data types, at branching code—traditional workloads. We can’t give that up, but we still want to be good at AI.”
Incorporating NPUs into established architectures improves overall efficiency and performance, and it lets devices run many more tasks concurrently.
Innovations Driving NPU Integration
The advances packed into today’s NPUs grew out of an acute awareness of what users need and what the technology can become. Qualcomm’s Vinesh Sukumar expresses a clear vision for the future:
“I want a complete artificial general intelligence running on Qualcomm devices.”
That level of ambition reflects a larger trend across the industry: building scalable, smart infrastructures that fully utilize every processor at their disposal.
“[AI Foundry] is about being smart. It’s about using all the processors at hand, being efficient, and prioritizing workloads across the CPU, the NPU, and so on.”
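To make the idea of prioritizing workloads concrete, here is a toy dispatcher, my illustration rather than AI Foundry’s actual logic, that routes tensor-heavy tasks to the NPU when one is available and keeps latency-sensitive, branch-heavy work on the CPU, echoing Mike Clark’s point above.

```python
# Toy illustration of workload prioritization across processors.
# This is not AI Foundry's real scheduler, just the idea in miniature.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    kind: str        # "tensor" (AI inference) or "branchy" (traditional)
    latency_sensitive: bool = False

def route(task: Task, npu_available: bool = True) -> str:
    """Pick a processor for a task.

    Tensor-heavy work goes to the NPU when one is present; anything
    latency-sensitive or branch-heavy stays on the CPU, which remains
    best at small data types and unpredictable control flow.
    """
    if task.kind == "tensor" and npu_available and not task.latency_sensitive:
        return "NPU"
    return "CPU"

for t in [Task("llm_decode", "tensor"),
          Task("ui_event_loop", "branchy", latency_sensitive=True),
          Task("background_embedding", "tensor")]:
    print(f"{t.name} -> {route(t)}")
```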
Mahesh Subramony adds an essential perspective on system-on-chip (SoC) design:
“We have to be very deliberate in how we design our [system-on-a-chip] to ensure that a larger [SoC] can perform to our requirements in a thin and light form factor.”
By consolidating multiple processing units within a single thermal envelope, manufacturers aim to manage power consumption effectively while maximizing performance.

