Qualcomm has done something remarkable in the course of computing: it became the first to bring a Neural Processing Unit (NPU) to Windows laptops. The tech landscape is changing incredibly fast. Competitors such as AMD and Intel are in the race too, showcasing NPUs that deliver up to 40-50 trillion operations per second (TOPS); as recently as 2023, AMD chips with NPUs were few and far between and topped out at roughly 10 TOPS. Nvidia takes things further still: its GeForce RTX 5090 offers up to 3,352 TOPS of AI performance.
Microsoft’s new Copilot+ features are largely powered by Qualcomm’s Snapdragon X chip, which puts a high-performance NPU at its center. AMD’s Ryzen AI Max combines the company’s powerful CPU cores with Radeon GPU cores and adds a neural processing unit (NPU) rated at 50 TOPS. Laptops such as the HP ZBook Ultra G1a and the Asus ROG Flow Z13 already use this technology, which is why it is creating such a stir in the tech sector. Looking beyond today, the industry expects NPUs that process thousands of TOPS in the very near future.
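To put those figures in context, a peak TOPS rating is usually simple arithmetic: the number of multiply-accumulate (MAC) units times the clock rate, counting each MAC as two operations. The sketch below walks through that math with made-up hardware numbers, not any vendor’s actual specification.

```python
# Back-of-envelope TOPS arithmetic (illustrative numbers, not vendor specs).
# Peak TOPS is typically quoted as: MAC units x clock rate x 2,
# since each multiply-accumulate counts as two operations.

def peak_tops(mac_units: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak trillions of operations per second for a hypothetical accelerator."""
    return mac_units * clock_hz * ops_per_mac / 1e12

# A hypothetical NPU with 16,384 INT8 MAC units running at 1.5 GHz:
print(f"{peak_tops(16_384, 1.5e9):.1f} TOPS")  # ~49.2 TOPS, in the 40-50 TOPS class
```

Peak figures like these assume every MAC unit is busy on every cycle. Real workloads rarely sustain that, so TOPS ratings are best read as upper bounds rather than delivered performance.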
The Evolution of Computing with NPUs
The adoption of NPUs represents a fundamental change in the way PCs will process AI workloads. Historically, GPUs have been the hardware of choice because they can rapidly execute massively parallel tasks. That parallel throughput is what made GPUs the de facto building blocks of AI data centers.
Steven Bathiche of Microsoft explains, “With the NPU, the entire structure is really designed around the data type of tensors [a multidimensional array of numbers].” This kind of specialization is what lets NPUs process AI workloads with far greater efficiency and speed than CPUs or even other accelerators.
He elaborates on the advantage: “NPUs are much more specialized for that workload. And so we go from a CPU that can handle three [trillion] operations per second (TOPS), to an NPU.” That jump underscores the need for hardware that can keep pace with the growing demands of AI workloads.
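Bathiche’s description is easier to see in code. This is a minimal NumPy sketch with arbitrary shapes and random data, illustrating the kind of dense tensor operation that dominates AI workloads and that NPU hardware is organized around.

```python
import numpy as np

# A tensor is just a multidimensional array. A batch of 8 feature maps,
# each 128x256, forms a rank-3 tensor:
activations = np.random.rand(8, 128, 256).astype(np.float32)
weights = np.random.rand(256, 64).astype(np.float32)

# The core AI workload is dense tensor math like this batched matrix
# multiply; NPUs are structured to stream exactly these operations.
outputs = activations @ weights
print(outputs.shape)  # (8, 128, 64)
```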
Challenges and Opportunities in Chip Design
Even with all the potential NPUs bring to the table, experts warn it is a mistake to depend on them exclusively for AI acceleration. Mike Clark emphasizes the importance of balancing traditional computing needs with AI capabilities: “We must be good at low latency, at handling smaller data types, at branching code—traditional workloads. We can’t give that up, but we still want to be good at AI.”
Joe Macri of AMD highlights another challenge inside modern PCs: “When I have a discrete GPU, I have a separate memory subsystem hanging off it.” That physical separation makes sharing data between the CPU and GPU expensive. He walks through the round trip: “When I want to share data between our [CPU] and GPU, I’ve got to take the data out of my memory, slide it across the PCI Express bus, put it in the GPU memory, do my processing, then move it all back.”
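A short PyTorch sketch makes each leg of that round trip explicit. It assumes a machine with a CUDA-capable discrete GPU and falls back to the CPU otherwise.

```python
import torch

# Each step mirrors Macri's description of a discrete-GPU round trip.
# On a unified-memory SoC, these explicit copies disappear.
x = torch.randn(4096, 4096)      # data starts in CPU (host) memory

if torch.cuda.is_available():
    x_gpu = x.to("cuda")         # 1. slide it across the PCI Express bus
    y_gpu = x_gpu @ x_gpu        # 2. do the processing in GPU memory
    y = y_gpu.cpu()              # 3. move the result back to host memory
else:
    y = x @ x                    # no discrete GPU: everything stays in one memory
```

Every one of those transfers costs time and power, which is exactly the overhead that unified designs aim to eliminate.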
This fragmented architecture can hinder performance, so chipmakers have to design for efficiency while still supporting both legacy workloads and AI.
The Future of AI in Laptops
The landscape of personal computing seems ripe for a revolution led by rapid advances in NPU technology. Qualcomm’s Snapdragon X chip illustrates the trend: with its on-device NPU, it lets PCs run AI functions locally, giving users access to sophisticated AI models without depending on the cloud.
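As a rough illustration of what “local” means in practice, the sketch below uses ONNX Runtime, whose QNN execution provider targets Qualcomm NPUs on Windows on Arm. The model file and input shape are placeholders, and the provider setup is an assumption to verify against your onnxruntime build.

```python
import numpy as np
import onnxruntime as ort

# Sketch: on-device inference via ONNX Runtime. The QNN execution provider
# targets the Qualcomm NPU; if it is unavailable, ONNX Runtime falls back
# to the CPU provider. "model.onnx" is a placeholder for your own model.
session = ort.InferenceSession(
    "model.onnx",
    providers=["QNNExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)  # assumed input shape

# The inference runs entirely on the local machine: no cloud round trip.
outputs = session.run(None, {input_name: dummy_input})
```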
Vinesh Sukumar shares his vision for this future: “I want a complete artificial general intelligence running on Qualcomm devices.” He continues, “That’s what we’re trying to push for.” His goal is to produce technology that can make its way more easily into our daily devices and lives.
AMD’s Mahesh Subramony describes the deliberate design decisions behind a successful system-on-chip (SoC): “We have to be very deliberate in how we design our [system-on-a-chip] to ensure that a larger [SoC] can perform to our requirements in a thin and light form factor.” Striking that balance is critical as laptops grow ever thinner while demanding more performance than ever.
Subramony highlights thermal management as a key consideration: “By bringing it all under a single thermal head, the entire power envelope becomes something that we can manage.” This fine-grained, tunable approach to power is key to getting the best possible performance within the constraints of portability.

