While artificial intelligence is still maturing, the hardware driving it is undergoing a revolution. Most laptops in use today weren't built to fully harness these innovations: nearly all of today's PCs rely on a split memory architecture, a design approach set in stone by decisions made over 25 years ago that limits how quickly and efficiently they can process new, large AI models.
Recent news has underscored the competitive jockeying among tech titans such as AMD, Intel, and Qualcomm, each pushing its own proprietary NPU designs to the limit. AMD and Intel have announced NPUs capable of 40 to 50 TOPS, or trillions of operations per second. Qualcomm's Snapdragon X laptop chips deliver 45 TOPS. The competition is definitely heating up, and within a few years we may well see NPUs rated in the hundreds, even thousands, of TOPS.
The push for more powerful processing is crucial because laptops more than a year old already struggle to run useful AI models locally. Most users of today's large language models (LLMs) interact with them through online, browser-based chat interfaces, and that is mostly down to hardware constraints.
The Memory Architecture Challenge
The design of most modern PCs includes two separate pools of memory: system memory attached to the CPU and graphics memory attached to the discrete GPU. Because the two pools operate independently, processing AI tasks that touch both is inherently inefficient.
Joe Macri, a veteran AMD technologist, has dug into the challenges that are part and parcel of this architecture.
“When I have a discrete GPU, I have a separate memory subsystem hanging off it,” – Joe Macri
This physical separation requires transferring data between memory spaces, overhead that quickly adds up and lengthens processing time.
“When I want to share data between our [CPU] and GPU, I’ve got to take the data out of my memory, slide it across the PCI Express bus, put it in the GPU memory, do my processing, then move it all back,” – Joe Macri
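To make that round trip concrete, here is a minimal sketch of the pattern Macri describes, using PyTorch on a machine with a discrete GPU (the matrix size and the `cuda` device name are illustrative assumptions):

```python
import torch

# Data starts in system memory, the CPU's pool.
x = torch.randn(4096, 4096)

# Step 1: slide the tensor across the PCI Express bus into GPU memory.
x_gpu = x.to("cuda")

# Step 2: do the processing on the GPU.
y_gpu = x_gpu @ x_gpu

# Step 3: move the result back into system memory for the CPU to use.
y = y_gpu.cpu()
```

On a chip where CPU and GPU share a single memory pool, steps 1 and 3 simply disappear; that copying is the overhead Macri is pointing at.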
Now, with the advent of NPUs purpose-built for AI workloads, manufacturers are making significant strides toward addressing these issues. These chips are designed to accelerate tensor-heavy computation, boosting performance across a wide range of applications.
This change marks a major step toward adding processing muscle where it can do the most for AI performance and user experience.
“With the NPU, the entire structure is really designed around the data type of tensors,” – Steven Bathiche
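In practice, "designed around tensors" means the hardware is organized for the regular multiply-accumulate arithmetic that dominates neural networks. A toy illustration (the shapes below are arbitrary):

```python
import numpy as np

# One neural-network layer is essentially a single tensor operation:
# multiply an input batch by a weight matrix, then apply a nonlinearity.
inputs = np.random.rand(32, 768)     # a batch of 32 input vectors
weights = np.random.rand(768, 3072)  # the layer's weight matrix

activations = np.maximum(inputs @ weights, 0.0)  # matmul + ReLU

# That one line performs 32 * 768 * 3072 multiply-accumulates --
# exactly the kind of repetitive arithmetic an NPU is built to pipeline.
```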
Now that companies like AMD and Intel are pouring money into NPUs, users are beginning to see real performance gains on AI workloads. NPUs were still something of a rarity in 2023, though clearly on the rise; the designs of that era might manage 10 TOPS or so, which wasn't enough for high-performance markets.
The Rise of Specialized NPUs
More recent designs such as AMD's Ryzen AI Max show we have clearly turned a corner. This accelerated processing unit, or APU, merges AMD's Ryzen CPU cores with Radeon-branded GPU cores on the same chip, alongside an even more powerful NPU providing 50 TOPS. Such setups let more resource-intensive AI applications run smoothly on local hardware.
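A back-of-the-envelope calculation shows what a figure like 50 TOPS buys for local language models. The model size and the ops-per-token rule of thumb below are illustrative assumptions, not vendor specifications:

```python
# Rule of thumb: generating one token costs roughly 2 operations per
# model parameter (a multiply and an add for each weight).
params = 7e9                 # assume a 7B-parameter model
ops_per_token = 2 * params   # ~14 billion ops per token
npu_ops_per_sec = 50e12      # 50 TOPS

ceiling = npu_ops_per_sec / ops_per_token
print(f"compute-bound ceiling: {ceiling:,.0f} tokens/s")  # ~3,571

# Real throughput lands far below this ceiling, because every generated
# token also has to stream the model's weights through memory -- which
# is why the memory architecture discussed above matters so much.
```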
The rapid growth in this space reflects a broader trend toward ever more computing power in consumer electronics. These technologies are developing quickly to meet demand from nascent applications such as Microsoft's Copilot+ experiences, which depend on robust on-device AI; the Snapdragon X chips that powered the first Copilot+ PCs are just one sign of how competitive the market has become.
Vinesh Sukumar from Qualcomm emphasizes the importance of progress in this field:
That ambitious vision is motivating the fundamental research and innovation happening today to develop processing units powerful enough for existing applications and the more visionary workloads of tomorrow.
“I want a complete artificial general intelligence running on Qualcomm devices,” – Vinesh Sukumar
For all this hardware advancement, most users are still limited by the laptops they already own, which can't run large AI models locally. If a device is more than a year or so old, it may struggle even with capable small language models (SLMs); most such models can't run on older hardware without degrading the level of service users expect.
The Impact on User Experience
The Windows ML runtime makes it easy to run AI tasks locally once a model is integrated into an application. For now, though, these limitations mean that most users continue to engage with LLMs through online platforms rather than leveraging their local hardware.
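Windows ML is built on ONNX Runtime, so the developer-facing pattern looks roughly like this sketch, shown here with the onnxruntime Python package (the model file and input shape are placeholders; the WinRT APIs follow the same load-bind-evaluate shape):

```python
import numpy as np
import onnxruntime as ort

# Load a model once. The runtime selects an execution provider;
# CPU here, but on supported hardware a GPU or NPU provider
# would be chosen instead.
session = ort.InferenceSession(
    "model.onnx", providers=["CPUExecutionProvider"]
)

input_name = session.get_inputs()[0].name
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Inference runs entirely on the local machine -- no browser, no cloud.
outputs = session.run(None, {input_name: batch})
```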
Features like Windows Recall and Generative Erase in Windows Photos show what integrated AI capabilities can deliver. Windows Recall uses AI to analyze periodic snapshots of your screen and build an organized, searchable timeline of your activity, while Generative Erase makes it easy to remove backgrounds or unwanted objects from your images.
As developers push for improved performance metrics, such as faster token throughput on NPUs, users can expect a more fluid experience in AI interactions.
Steven Bathiche notes the importance of efficiency in design and performance:
This effort to prioritize optimization over brute-force expansion is a promising trend: personal laptops could be what finally democratizes access to powerful new lightweight AI models.
“It’s about being smart. It’s about using all the [processors] at hand, being efficient, and prioritizing workloads across the CPU, the NPU, and so on,” – Steven Bathiche
If that focus on efficiency holds, laptops may finally be able to harness the full potential of advanced AI models.

