The Race for AI-Ready Laptops Heats Up


By Tina Reynolds


Artificial intelligence (AI) is one of the world’s most important and consequential technologies, and competition among manufacturers to build laptops with desktop-class NPUs is intensifying. Yet most modern PCs still rely on a separate memory architecture invented more than 25 years ago. To satisfy the insatiable appetite of AI workloads, hardware vendors such as Qualcomm, AMD, and Nvidia are racing to accelerate performance. This article explores the current state of NPU technology, the performance benchmarks set by leading manufacturers, and the implications for users.

This trend toward deep AI integration in consumer electronics is driven by the rapid adoption of large language models (LLMs) and small language models (SLMs). Most people use these models online through cloud-hosted services, which leaves them at the mercy of data-center outages that can make the models unavailable for hours at a time. To address these challenges, technology firms, including many at the TechCrunch conference, are developing tools that allow many AI tasks to be performed locally.

The Evolution of NPU Technology

Qualcomm’s AI 100 NPU is changing the way NPUs are being developed for Windows laptops, delivering 10 trillion operations per second (10 TOPS). The company’s Snapdragon X chip powers Microsoft’s Copilot+ features, showcasing Qualcomm’s commitment to enhancing AI capabilities in consumer devices.

Competition remains intense, with AMD and Intel launching NPUs to compete against Qualcomm’s platforms. AMD’s Ryzen AI Max combines CPU cores with Radeon-branded GPU cores and includes an NPU rated at 50 TOPS. Meanwhile, Nvidia’s new GeForce RTX 5090 desktop GPU delivers a jaw-dropping 3,352 TOPS of AI performance.
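To put these figures in perspective, a rough back-of-the-envelope calculation shows what the TOPS ratings above could mean for local LLM inference. The 7-billion-parameter model size and the "two operations per parameter per token" rule of thumb are illustrative assumptions, not figures from the vendors, and real-world utilization is far below the theoretical peak.

```python
# Illustrative only: theoretical token throughput for an assumed
# 7B-parameter model, using the peak TOPS figures cited in the article.
PARAMS = 7e9                  # assumed model size (not vendor data)
OPS_PER_TOKEN = 2 * PARAMS    # ~1 multiply + 1 add per parameter per token

chips_tops = {
    "Qualcomm AI 100 NPU": 10,
    "AMD Ryzen AI Max NPU": 50,
    "Nvidia GeForce RTX 5090": 3352,
}

for name, tops in chips_tops.items():
    ops_per_sec = tops * 1e12                 # TOPS = trillions of ops/s
    tokens_per_sec = ops_per_sec / OPS_PER_TOKEN
    print(f"{name}: ~{tokens_per_sec:,.0f} tokens/s at peak utilization")
```

Even granting these generous assumptions, the gap between a 10 TOPS laptop NPU and a 3,352 TOPS desktop GPU is more than two orders of magnitude, which is why the chips serve such different workloads.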

“With the NPU, the entire structure is really designed around the data type of tensors,” says Steven Bathiche.

Joe Macri highlights the challenges of traditional architectures:

“When I have a discrete GPU, I have a separate memory subsystem hanging off it. When I want to share data between our CPU and GPU, I’ve got to take the data out of my memory, slide it across the PCI Express bus, put it in the GPU memory, do my processing, then move it all back.”
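Macri’s point can be made concrete with some simple transfer-time arithmetic. The 32 GB/s figure below is the nominal peak of a PCIe 4.0 x16 link and the 2 GB tensor is an arbitrary example; both are assumptions for illustration, and effective bandwidth in practice is lower still.

```python
# Sketch of the round-trip cost Macri describes: copying a tensor to a
# discrete GPU's memory and back across PCI Express.
GB = 1e9
tensor_bytes = 2 * GB          # example tensor size (assumption)
pcie_bandwidth = 32 * GB       # nominal PCIe 4.0 x16, per direction

# Discrete-GPU path: one copy out, one copy back.
round_trip_s = (tensor_bytes / pcie_bandwidth) * 2
print(f"PCIe round trip for 2 GB: {round_trip_s * 1e3:.0f} ms")

# Unified-memory path: CPU, GPU, and NPU address the same DRAM,
# so in this simple model the bus transfer cost disappears entirely.
print("Unified memory: no PCIe transfer required")
```

A 125 ms stall per 2 GB round trip may sound small, but repeated across every frame or inference step it dwarfs the compute time, which is the motivation for the unified-memory designs the article describes.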

Local Execution vs. Cloud Dependency

The new Windows ML runtime represents an effort to create a common foundation for doing AI-intensive work locally. It intelligently routes workloads to the most effective processor, be it the CPU, GPU, or NPU, which both increases efficiency and reduces latency. As a result, users can run demanding AI applications on their devices more seamlessly and effectively.
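The routing idea can be sketched in a few lines. This is a hypothetical illustration of the concept, not the actual Windows ML API or its scheduling policy: the `route` function, the `Workload` fields, and the thresholds are all invented for the example.

```python
# Hypothetical sketch of workload routing in the spirit of Windows ML:
# send each task to the most effective available processor.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    sustained: bool     # long-running, always-on tasks favor the NPU
    parallel_ops: int   # rough scale of the tensor math involved

def route(workload: Workload, available: set) -> str:
    # NPUs excel at sustained, power-efficient tensor work.
    if workload.sustained and "npu" in available:
        return "npu"
    # Large bursts of parallel math go to the GPU when one is present.
    if workload.parallel_ops > 10**9 and "gpu" in available:
        return "gpu"
    # Everything else falls back to the CPU.
    return "cpu"

print(route(Workload("assistant", True, 10**8), {"cpu", "gpu", "npu"}))  # npu
print(route(Workload("image-gen", False, 10**10), {"cpu", "gpu"}))       # gpu
print(route(Workload("spellcheck", False, 10**6), {"cpu"}))              # cpu
```

The always-on assistant case in the first call is exactly the kind of sustained workload Rakesh Anigundi describes below, where the NPU’s power efficiency matters more than raw throughput.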

Even with these improvements, most users still access LLMs through browser-based chat interfaces. This heavy reliance on cloud services makes for frustrating stoppages when a data-center outage takes those services offline. As Rakesh Anigundi notes:

“You’ll want to be running this for a longer period of time, such as an AI personal assistant, which could be always active and listening for your command.”

NPUs capable of thousands of TOPS are just around the corner. They are not many years out, perhaps only a couple, and they will bring far more advanced local processing power.

The Future of AI in Consumer Devices

The promise is especially great for the next generation of laptops loaded with advanced NPUs. Companies like AMD and Nvidia are actively working on chips that combine CPU cores with GPU cores to further optimize performance. As one of the foremost collaborative pushes among industry leaders, this effort aims to reshape the entire approach to AI task management.

As NPUs have matured, they have become ubiquitous within system-on-a-chip (SoC) designs, and manufacturers are learning how to develop chips that are both performant and power efficient. Mahesh Subramony emphasizes the importance of deliberate design choices:

“We have to be very deliberate in how we design our system-on-a-chip to ensure that a larger SoC can perform to our requirements in a thin and light form factor.”

Ambitions on the Qualcomm side run even higher:

“I want a complete artificial general intelligence running on Qualcomm devices.”

Looking ahead, Steven Bathiche asserts that there remains considerable potential for improvement across various processors:

“It’s about being smart. It’s about using all the processors at hand, being efficient, and prioritizing workloads across the CPU, the NPU, and so on. There’s a lot of opportunity and runway to improve.”