Nvidia has introduced its Groq 3 LPU, a language processing unit that folds several state-of-the-art artificial intelligence (AI) capabilities into its lineup. The launch is all the more impressive coming only two and a half months after Nvidia made its Vera Rubin GPU available to the world. Nvidia licensed the underlying intellectual property from Groq for the jaw-dropping sum of US $20 billion, and the surprising but strategic deal signals its intent to strengthen its product ecosystem in the increasingly crowded AI space.
The Groq 3 LPU has a staggering 150 terabytes per second (TB/s) of memory bandwidth, which puts it among the best-equipped chips for the memory-bound token generation that dominates the final, decode stage of inference. The LPU’s structure and design play directly to that strength, making it a natural fit for the rapid data movement at the center of AI serving. Each Nvidia Groq 3 LPX compute tray contains eight Groq 3 LPUs, and each LPU is coupled with a Vera Rubin GPU that comes loaded with 288 gigabytes of high-bandwidth memory (HBM) and delivers 50 petaFLOPS of processing power in 4-bit calculation.
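To see why that bandwidth figure matters, consider a back-of-the-envelope sketch. It assumes single-stream decode is purely memory-bandwidth-bound, meaning each generated token requires reading roughly the full set of model weights once; the 70-billion-parameter model is a hypothetical example, not a figure from the announcement.

```python
# Rough roofline estimate: if decode is memory-bandwidth-bound, the
# ceiling on tokens/sec is bandwidth divided by the bytes read per
# generated token (approximately the size of the model weights).

def peak_decode_tokens_per_sec(params_billion: float,
                               bits_per_weight: int,
                               bandwidth_tb_s: float) -> float:
    """Upper bound on single-stream decode rate for a dense model."""
    bytes_per_token = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Hypothetical 70B-parameter model quantized to 4-bit weights.
lpu = peak_decode_tokens_per_sec(70, 4, 150)  # Groq 3 LPU: 150 TB/s
gpu = peak_decode_tokens_per_sec(70, 4, 22)   # Vera Rubin GPU: 22 TB/s
print(f"LPU ceiling: {lpu:,.0f} tok/s; GPU ceiling: {gpu:,.0f} tok/s")
# -> LPU ceiling: 4,286 tok/s; GPU ceiling: 629 tok/s
```

The absolute numbers are illustrative, but the ratio between the two ceilings is exactly the bandwidth ratio, which is why decode-heavy workloads gravitate toward the higher-bandwidth part.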
This pairing of the Groq 3 LPU and the Vera Rubin GPU reflects Nvidia’s deep commitment to inference disaggregation. The technique partitions the operations that make up AI processing into prefill and decode phases, assigning each to the hardware best suited to it in order to maximize performance and efficiency. The Rubin GPU’s memory bandwidth of 22 TB/s pales in comparison to the Groq 3 LPU’s, which is nearly seven times higher, allowing the pair to split data handling along those lines.
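As a rough illustration of what that split means in practice, the sketch below separates the two phases in plain Python. Every name here, from the KVCache hand-off to the dummy next-token rule, is a hypothetical stand-in for illustration and does not reflect Nvidia’s or Groq’s actual software stack.

```python
# Minimal sketch of disaggregated inference: the compute-heavy prefill
# phase runs on one device and the latency-sensitive decode phase on
# another, with the attention cache handed off between them.

from dataclasses import dataclass

@dataclass
class KVCache:
    """Key/value attention state passed from prefill to decode."""
    tokens: list[int]

def prefill(prompt_tokens: list[int]) -> KVCache:
    # Phase 1 (e.g., on the GPU): ingest the whole prompt in one
    # parallel, compute-bound pass and build the KV cache.
    return KVCache(tokens=list(prompt_tokens))

def decode(cache: KVCache, max_new: int) -> list[int]:
    # Phase 2 (e.g., on the LPU): emit one token at a time; each step
    # touches the full weights, so it is memory-bandwidth-bound.
    out = []
    for _ in range(max_new):
        next_tok = (cache.tokens[-1] + 1) % 50_000  # dummy "model"
        cache.tokens.append(next_tok)
        out.append(next_tok)
    return out

cache = prefill([101, 2023, 2003])  # prompt ingested in bulk
print(decode(cache, max_new=4))     # tokens streamed back one by one
```

The design point is the hand-off: once the cache exists, decode never needs the prefill hardware again, so each phase can run on the chip whose strengths match it.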
The details of Groq’s collaboration with Nvidia were finalized in a non-exclusive inference-technology licensing agreement signed on Christmas Eve. The continuing partnership lets Nvidia fold cutting-edge inference capabilities into its portfolio while still enjoying operational flexibility.
Mark Heaps of Groq discussed the effectiveness of the chip’s streamlined architectural choices. He stated, “When you look at a multi-core GPU, a lot of the instruction commands need to be sent off the chip, to get into memory and then come back in. We don’t have that. It all passes through in a linear order.” That linear flow keeps latency extremely low and turnaround times for AI tasks correspondingly fast.
In discussing the potential impact of the new technology, Nvidia’s Ian Buck noted that “the LPU is optimized strictly for that extreme low latency token generation,” emphasizing its role in producing the quick, accurate results essential to modern AI applications.
Nvidia CEO Jensen Huang expressed his enthusiasm for the Groq 3 LPU’s capabilities, declaring, “Finally, AI is able to do productive work, and therefore the inflection point of inference has arrived.” The statement underscores just how central inference technology is to expanding AI’s reach, and it could fundamentally change how AI systems are designed and deployed across sectors.
The incorporation of SRAM directly onto the Groq 3 LPU further optimizes the device for performance. Keeping data on chip speeds up access, ingestion, and processing, creating an environment well suited to intensive computational tasks.
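Heaps’s earlier point about avoiding off-chip round trips can be made concrete with a toy comparison. The latency figures and access count below are generic assumptions chosen for illustration, not published Groq or Nvidia specifications.

```python
# Illustrative only: the cost of serialized weight accesses when data
# lives in on-die SRAM versus external memory. Both latencies are
# ballpark assumptions, not measured or published figures.

ON_CHIP_SRAM_NS = 5     # assumed on-die access latency
OFF_CHIP_MEM_NS = 100   # assumed off-chip round-trip latency

def access_overhead_us(n_accesses: int, latency_ns: float) -> float:
    """Total overhead of dependent, serialized accesses, in microseconds."""
    return n_accesses * latency_ns / 1_000

accesses = 1_000  # hypothetical dependent accesses per token step
print(f"SRAM-resident: {access_overhead_us(accesses, ON_CHIP_SRAM_NS):.0f} us")
print(f"Off-chip:      {access_overhead_us(accesses, OFF_CHIP_MEM_NS):.0f} us")
# The ~20x gap is the kind of saving that on-die memory buys in the
# latency-critical decode loop.
```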
With this launch, Nvidia hopes to get out in front of AI inference technology and stay there. Pairing the Groq 3 LPU with the Vera Rubin GPU highlights a pioneering approach to meeting rising processing demands, and as AI integrates further into the economy, these technologies look more than equal to the task.

