The Rise of NPUs in Personal Computing

By Tina Reynolds

Over the past few years, personal computing has changed dramatically, a shift fueled largely by the rise of Neural Processing Units (NPUs): chips designed to accelerate artificial intelligence workloads on personal computers. As companies like Qualcomm, AMD, and Intel compete, NPU performance continues to climb. Qualcomm’s Snapdragon X chip now powers Microsoft’s Copilot+ features, demonstrating the growing integration of AI capabilities into everyday applications.

Demand for faster, more efficient NPUs has driven a steady upward trend in performance. In 2023, NPUs were still relatively uncommon in AMD chips, and those that shipped reached approximately 10 TOPS (trillion operations per second). To handle future inference workloads, the industry predicts NPUs will scale into the thousands of TOPS within just a few years, roughly a hundredfold increase in processing power over those early parts.

AMD’s Ryzen AI Max has quickly become a third major contender in this contest, with its NPU rated at 50 TOPS. At that performance level it is highly competitive with Intel’s and Qualcomm’s products; all three companies offer comparable capabilities, with solutions starting around 40-50 TOPS. In contrast, Nvidia’s GeForce RTX 5090 showcases an impressive AI performance of up to 3,352 TOPS, highlighting the competitive nature of this technology sector.
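
To put those figures side by side, here is a minimal sketch in plain Python using the vendor peak numbers cited above; real-world throughput varies with precision and workload, so treat the ratios as rough, not definitive:

```python
# Back-of-envelope comparison of the peak TOPS figures cited above.
# These are vendor peak ratings, not measured benchmark results.
ratings_tops = {
    "2023-era AMD NPU": 10,
    "AMD Ryzen AI Max NPU": 50,
    "Nvidia GeForce RTX 5090 (GPU)": 3352,
}

baseline = ratings_tops["2023-era AMD NPU"]
for name, tops in ratings_tops.items():
    print(f"{name}: {tops:,} TOPS ({tops / baseline:.0f}x the 2023 baseline)")
```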

The Role of NPUs in AI Acceleration

NPUs, or neural processing units, are purpose-built chips for AI-related tasks. Their architecture is organized around the data types critical to AI workloads, above all tensors: the multi-dimensional arrays fundamental to deep learning. Steven Bathiche, a key figure in the development of these technologies, emphasizes the advantages of NPUs:

“With the NPU, the entire structure is really designed around the data type of tensors.” – Steven Bathiche
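
For readers unfamiliar with the term, here is a brief NumPy illustration of what a tensor is and the kind of operation that dominates deep learning. NumPy is used here only for clarity; NPUs operate on their own low-level formats, not NumPy arrays:

```python
import numpy as np

# A "tensor" is just a multi-dimensional array. A small batch of RGB
# images, for example, is naturally a rank-4 tensor:
# (batch, height, width, channels).
images = np.zeros((8, 224, 224, 3), dtype=np.float32)

# Deep-learning inference is dominated by operations over such arrays,
# e.g. a matrix multiply between activations and a weight matrix.
activations = np.random.rand(8, 512).astype(np.float32)
weights = np.random.rand(512, 1024).astype(np.float32)
outputs = activations @ weights  # result has shape (8, 1024)

print(images.ndim, outputs.shape)
```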

NPUs give manufacturers a strong starting point for meaningful change. Where conventional CPUs manage only a few trillion operations per second, these specialized chips dramatically raise the ceiling for AI computation. The transition represents a shift in how computing workloads are handled, and that is just the beginning: as AI continues to permeate our daily lives, the impact will only grow.

One major caveat: though NPUs are very promising, they should not be seen as a panacea for AI acceleration. At the same time, traditional processors can no longer be the sole workhorse for every workload. Mike Clark points out the necessity of maintaining efficiency across all processing units:

“We must be good at low latency, at handling smaller data types, at branching code—traditional workloads. We can’t give that up, but we still want to be good at AI.” – Mike Clark

This expectation of balanced performance across CPUs, NPUs, and GPUs highlights how varied the landscape of compute architecture has become.

Advancements in Unified Memory Architecture

Another challenge NPUs face is the memory architecture found in virtually all modern PCs. These systems usually employ separate memory pools for different processors, which can drag down NPU performance. AMD’s Ryzen AI Max goes a step further, employing a unified memory architecture that integrates CPU cores, GPU cores, and an NPU on a single silicon die. This design makes it easier to share data between components and improves performance as a whole.

By removing these bottlenecks with a unified memory architecture, manufacturers can greatly streamline the flow of processing tasks. Joe Macri describes the overhead of the traditional, split-memory approach:

“When I want to share data between our [CPU] and GPU, I’ve got to take the data out of my memory, slide it across the PCI Express bus, put it in the GPU memory, do my processing, then move it all back.” – Joe Macri

This approach also enables better power efficiency and thermal management, both critical in today’s power-sensitive, space-constrained computing devices.

“By bringing it all under a single thermal head, the entire power envelope becomes something that we can manage.” – Mahesh Subramony
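
A rough back-of-envelope model makes the cost of that round trip concrete. The bandwidth figure below is an assumption for illustration (roughly PCIe 4.0 x16 peak); the point is simply that a unified memory pool eliminates the copy by construction:

```python
# Rough model of the round trip Macri describes. The bandwidth figure
# is an assumption for illustration (~PCIe 4.0 x16 peak); real numbers
# depend on the platform.
working_set_bytes = 2 * 1024**3   # a 2 GiB working set
pcie_bytes_per_s = 32 * 1024**3   # assumed ~32 GiB/s peak transfer rate

# Discrete path: copy to device memory, process, copy results back.
round_trip_s = 2 * working_set_bytes / pcie_bytes_per_s
print(f"PCIe round trip for 2 GiB: ~{round_trip_s * 1000:.0f} ms")

# Unified path: CPU, GPU, and NPU address the same memory pool, so the
# copy cost disappears; only synchronization overhead remains.
```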

The Competitive Landscape of NPU Development

The race for the best NPU performance has intensified as vendors compete to provide the most capable solutions. The rivalry between AMD, Intel, and Qualcomm is supercharging innovation and pushing the envelope on NPU capabilities. As these companies build new architectures and technologies, each is looking to sharpen its offerings, outpace the competition, and open up new markets.

Vinesh Sukumar articulates the vision for Qualcomm’s future in AI processing:

“I want a complete artificial general intelligence running on Qualcomm devices.” – Vinesh Sukumar

These ambitions signal the industry’s drive to develop transformative AI capabilities that will reshape experiences on smartphones, computers, and beyond.

As NPUs become increasingly central to personal computing, their impact will extend beyond just performance metrics. Faster NPUs will enable systems to handle more tokens per second, enhancing user experiences when interacting with AI models. Companies are prioritizing efficiency and workload management across various processors to maximize performance potential.
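
As a back-of-envelope illustration, the compute ceiling on token throughput can be estimated from an NPU’s TOPS rating. The model size and the two-operations-per-parameter-per-token rule of thumb below are assumptions, and real systems are usually limited by memory bandwidth rather than raw compute, so read these as upper bounds, not benchmarks:

```python
# Crude, compute-bound estimate of decoding throughput. Assumed model
# size and ops-per-token rule of thumb; real systems are typically
# memory-bandwidth bound, so this is a ceiling, not a prediction.
params = 7e9                 # assumed 7-billion-parameter model
ops_per_token = 2 * params   # rule of thumb: ~2 ops per parameter per token

for tops in (10, 50, 1000):
    ops_per_second = tops * 1e12
    print(f"{tops:>5} TOPS -> ~{ops_per_second / ops_per_token:,.0f} tokens/s ceiling")
```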

Steven Bathiche concludes this discussion on future improvements:

“It’s about being smart. It’s about using all the [processors] at hand, being efficient, and prioritizing workloads across the CPU, the NPU, and so on. There’s a lot of opportunity and runway to improve.” – Steven Bathiche