The Rise of NPUs in Laptop Technology


By Tina Reynolds


Consumer demand for artificial intelligence in personal computing has gone through the roof. In turn, tech giants like Google, Apple, and Huawei are pushing NPU development forward. Qualcomm, an early leader in the space, released the first NPU to power Windows laptops, and competing NPUs from AMD and Intel are now giving its Snapdragon chips a run for their money. These new processors post eye-catching performance figures, hitting 40 to 50 trillion operations per second (TOPS). Qualcomm’s head start created a highly competitive landscape, prompting leading companies, including AMD and Intel, to fiercely compete to enhance their products.

NPUs capable of 10,000 TOPS are reportedly just around the corner. As AI continues to make an impact across industries, the need for optimized hardware architectures has never been more urgent.
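To put those TOPS figures in perspective, here is a back-of-envelope sketch of what a 50-TOPS NPU could mean for running a language model locally. The 7-billion-parameter model size, the 2-operations-per-parameter rule of thumb, and the utilization figure are all illustrative assumptions, not vendor specifications:

```python
# Back-of-envelope: what does 50 TOPS mean for on-device AI?
# Rough rule of thumb: a transformer forward pass costs about
# 2 * parameter_count operations per generated token.
# Model size and utilization below are illustrative assumptions.

def theoretical_tokens_per_sec(tops: float, params: float, utilization: float = 1.0) -> float:
    """Peak tokens/second if every operation went to the model."""
    ops_per_token = 2 * params  # one multiply + one accumulate per weight
    return tops * 1e12 * utilization / ops_per_token

# A 50-TOPS NPU running a hypothetical 7-billion-parameter model:
peak = theoretical_tokens_per_sec(50, 7e9)
print(f"{peak:.0f} tokens/s at theoretical peak")  # ~3571

# Real workloads rarely sustain peak throughput; assume 20% utilization:
realistic = theoretical_tokens_per_sec(50, 7e9, utilization=0.2)
print(f"{realistic:.0f} tokens/s at 20% utilization")  # ~714
```

Even with a conservative utilization assumption, the arithmetic suggests why chipmakers treat TOPS as the headline number for local AI.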

The Competitive Landscape of NPUs

Qualcomm’s leap into the NPU market has undeniably changed the competitive landscape among semiconductor makers. Its Snapdragon X chip features an NPU that powers Microsoft’s Copilot+ functionality, demonstrating how integrated AI can enhance the user experience on Windows devices. This early move sparked a performance arms race, prompting AMD and Intel to invest heavily in their own NPU technologies.

AMD’s Ryzen AI Max is one such example, delivering a seriously impressive 50 TOPS. Intel, likewise, has released NPUs with performance on the order of 40 to 50 TOPS. Each of these companies is locked in cut-throat competition for dominance in the emerging field of on-device AI. Their goal is not just to keep pace with Qualcomm’s innovation, but to forge their own distinctive strengths.

“With the NPU, the entire structure is really designed around the data type of tensors (a multidimensional array of numbers),” – Steven Bathiche
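Bathiche’s point can be made concrete: a tensor is simply a multidimensional array of numbers, and the core NPU workload is multiply-accumulate arithmetic over such arrays. A minimal, dependency-free Python sketch (the helper names are illustrative):

```python
# A tensor is a multidimensional array of numbers; an NPU's data paths
# are organized around operations over such arrays. Plain nested lists
# keep this sketch dependency-free.

# A rank-3 tensor: 2 x 3 x 4 (e.g., batch x rows x cols)
tensor = [[[r * 4 + c for c in range(4)] for r in range(3)] for _ in range(2)]

def shape(t):
    """Recover the shape of a nested-list tensor."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

print(shape(tensor))  # (2, 3, 4)

# The workhorse operation: multiply-accumulate over a 2-D slice.
def matvec(matrix, vector):
    """Matrix-vector product via per-row dot products."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

print(matvec(tensor[0], [1, 0, 0, 0]))  # first column of slice 0: [0, 4, 8]
```

An NPU hard-wires exactly this kind of multiply-accumulate loop at massive scale, which is why its whole structure is built around the tensor data type.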

The competition is intense, as every company tries to build the best possible architecture for running AI tasks. Design innovation will lead the way in this dramatic shift, and performance metrics will be just as important in determining who takes the lead in this fast-changing industry.

Challenges with Current PC Architectures

Even with these improvements in NPU technology, major hurdles remain in integrating these chips into current laptop architectures. Most contemporary PCs use a segregated memory design, an inefficient arrangement for AI workloads that complicates how data is collected, shared, and used. Moving data between a CPU and a discrete GPU means shuttling it across separate memory subsystems.

“When I have a discrete GPU, I have a separate memory subsystem hanging off it,” – Joe Macri

This limitation can cause latency problems and inefficient processing when running AI models. The current architecture often requires moving data out of system memory, across the PCI Express bus, and back again after processing. These inefficiencies make it hard to realize the potential benefits NPUs could bring.

“When I want to share data between our CPU and GPU, I’ve got to take the data out of my memory, slide it across the PCI Express bus, put it in the GPU memory, do my processing, then move it all back,” – Joe Macri
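The round trip Macri describes can be sketched in a few lines. Here the "bus transfer" is just a Python copy standing in for a PCIe transfer, and all names are illustrative:

```python
# A minimal sketch of the discrete-GPU round trip: data leaves system
# memory, crosses the PCI Express bus into the GPU's separate memory,
# gets processed, and is copied back. The "transfer" is modeled as a
# full Python copy; all names are illustrative.

copies_made = 0

def pcie_transfer(buffer):
    """Stand-in for a PCIe transfer into a separate memory subsystem."""
    global copies_made
    copies_made += 1
    return list(buffer)  # a full copy, as with real host<->device traffic

def run_on_discrete_gpu(host_data, kernel):
    device_data = pcie_transfer(host_data)             # host -> GPU memory
    device_result = [kernel(x) for x in device_data]   # process on device
    return pcie_transfer(device_result)                # GPU memory -> host

result = run_on_discrete_gpu(range(4), lambda x: x * x)
print(result, "after", copies_made, "bus transfers")  # [0, 1, 4, 9] after 2 bus transfers
```

Every AI inference pays for those two extra copies in a split-memory design, which is the overhead unified architectures aim to eliminate.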

Chip manufacturers are acutely aware of these limitations and are broadly pursuing solutions to make the integration of NPUs within laptops seamless.

The Future of AI-Ready Laptops

With the advent of NPUs, the future of laptops looks even more exciting. Fully unlocking their potential will require breakthroughs in both hardware design and software optimization. Experts say NPUs able to process thousands of TOPS should arrive within the next few years. That technological leap will deliver a huge step up in performance and broaden the range of AI apps that can run locally on personal devices.

As it stands, laptops more than a year or so old struggle to run commercially useful AI models locally because their hardware lacks dedicated AI acceleration. Small language models (SLMs) built for smaller, local systems frequently cut features, sometimes leaving out integral capabilities entirely. As NPUs continue to grow in number and power, however, this limitation may soon be a relic of the past.

“We must be good at low latency, at handling smaller data types, at branching code—traditional workloads. We can’t give that up, but we still want to be good at AI,” – Mike Clark

AMD has once again shown its drive for innovation with the Ryzen AI Max. This new design merges CPU cores, GPU cores, and an NPU into a single piece of silicon. Its shared memory architecture targets the performance and latency problems found in traditional split-memory designs.

“By bringing it all under a single thermal head, the entire power envelope becomes something that we can manage,” – Mahesh Subramony
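The contrast with the discrete-GPU copy dance can be sketched as well: with one memory pool on the chip, every compute engine works on the same buffer in place. The class names are illustrative, not an actual SoC API:

```python
# A sketch of shared memory in an SoC design like the Ryzen AI Max:
# CPU, GPU, and NPU all read and write one memory pool in place, so the
# PCIe-style copies of a split-memory design disappear. Names are
# illustrative.

class SharedMemory:
    """One pool visible to every compute engine on the chip."""
    def __init__(self, data):
        self.buffer = list(data)

class ComputeEngine:
    """CPU, GPU, or NPU -- each works directly on the shared buffer."""
    def __init__(self, name, pool):
        self.name, self.pool = name, pool

    def apply(self, kernel):
        # In-place processing: no transfer to a private memory subsystem.
        self.pool.buffer = [kernel(x) for x in self.pool.buffer]

pool = SharedMemory(range(4))
npu = ComputeEngine("NPU", pool)
cpu = ComputeEngine("CPU", pool)

npu.apply(lambda x: x * x)   # NPU squares the data in place
cpu.apply(lambda x: x + 1)   # CPU picks up the same buffer immediately
print(pool.buffer)  # [1, 2, 5, 10]
```

Because nothing ever crosses a bus, the handoff between engines is free, which is what makes the single-silicon, single-power-envelope approach attractive for AI workloads.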

As companies continue refining their designs for system-on-a-chip (SoC) configurations, the potential for more effective AI integration in laptops becomes increasingly realistic.