Nvidia has unveiled its latest innovation, the Groq 3 LPU, a cutting-edge processor designed to accelerate artificial-intelligence inference. The announcement comes two and a half months after the reveal of the Rubin GPU, underscoring Nvidia’s commitment to advancing AI technology. At the center of the launch is a historic, game-changing licensing agreement between Nvidia and Groq: for $20 billion, Nvidia gains the intellectual property that Groq has built into its world-class language processing units.
Per Groq, the Groq 3 LPU is rated at a memory bandwidth of 150 terabytes per second, roughly seven times that of the Rubin GPU’s 22 terabytes per second. Stunning as that speed is, it is only the second most important ingredient in Nvidia’s recipe. The bigger one is inference disaggregation, implemented through a newly developed inference compute tray, the Nvidia Groq 3 LPX.
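As a quick sanity check on the “seven times” figure, the ratio of the two quoted bandwidths works out to just under seven:

```python
# Back-of-envelope check of the claimed speedup, using the two
# bandwidth figures quoted above (150 TB/s vs. 22 TB/s).
groq3_bw_tbps = 150  # Groq 3 LPU memory bandwidth, TB/s
rubin_bw_tbps = 22   # Rubin GPU memory bandwidth, TB/s

speedup = groq3_bw_tbps / rubin_bw_tbps
print(f"Bandwidth ratio: {speedup:.1f}x")  # ~6.8x, i.e. roughly seven times
```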
Each compute tray will contain eight Groq 3 LPUs and one Vera Rubin unit. This configuration handles AI workloads more efficiently by separating inference into two distinct phases: prefill and decode. The Vera Rubin unit is well suited to prefill and to the computationally intensive parts of decoding, while the Groq 3 LPUs take on the final decoding steps.
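To make the division of labor concrete, here is a minimal sketch of how a disaggregated pipeline might route the two phases to different processors. Everything in it, from the device names to the `DisaggregatedPipeline` class, is a hypothetical illustration rather than Nvidia’s actual software interface:

```python
from dataclasses import dataclass, field

@dataclass
class DisaggregatedPipeline:
    """Hypothetical sketch of prefill/decode disaggregation.

    'prefill_device' stands in for the Vera Rubin unit (compute-bound
    prompt processing); 'decode_devices' stands in for the eight Groq 3
    LPUs on a tray (bandwidth-bound token-by-token generation).
    """
    prefill_device: str = "vera_rubin:0"
    decode_devices: list = field(
        default_factory=lambda: [f"lpu:{i}" for i in range(8)]
    )

    def run_prefill(self, prompt_tokens):
        # Prefill processes the whole prompt in one pass of large matrix
        # multiplies, which favors raw compute -- the GPU's strength.
        print(f"[{self.prefill_device}] prefill over {len(prompt_tokens)} tokens")
        return {"kv_cache": "placeholder"}  # stands in for the attention KV cache

    def decode(self, state, max_new_tokens):
        # Decode emits one token at a time; every step re-reads the weights
        # and KV cache, so it is memory-bandwidth-bound -- the LPU's strength.
        tokens = []
        for step in range(max_new_tokens):
            device = self.decode_devices[step % len(self.decode_devices)]
            tokens.append(f"<token from {device}>")
        return tokens

pipe = DisaggregatedPipeline()
state = pipe.run_prefill(prompt_tokens=list(range(512)))
print(pipe.decode(state, max_new_tokens=4))
```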
Nvidia’s angle on this is a clear focus on low-latency token generation, a key metric for many AI applications. Ian Buck, a key figure in the development of the Groq 3 LPU, highlighted exactly that optimization.
“The LPU is optimized strictly for that extreme low latency token generation,” – Ian Buck
The Groq 3 LPU incorporates a novel architectural design: by integrating SRAM directly on the processor die, data stays on-chip instead of making round trips to external memory, raising effective bandwidth and overall performance. Mark Heaps, another well-known figure on the project, described how data moves through the architecture.
“The data actually flows directly through the SRAM,” – Mark Heaps
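That design choice matters most during decode, where generating each token requires streaming the model weights through the memory system, so the token rate is bounded by bandwidth. Here is a minimal back-of-envelope sketch, assuming an illustrative 70-billion-parameter model at 8-bit precision (a number chosen for the example, not taken from the announcement):

```python
# Memory-bound upper bound on decode throughput: each generated token
# streams the full model weights, so tokens/s ≈ bandwidth / model size.
# The 70B-parameter, 8-bit model below is an illustrative assumption only.
model_bytes = 70e9  # 70B parameters at 1 byte each (8-bit weights)

for name, bw_bytes_per_s in [("Groq 3 LPU", 150e12), ("Rubin GPU", 22e12)]:
    tokens_per_s = bw_bytes_per_s / model_bytes
    print(f"{name}: ~{tokens_per_s:,.0f} tokens/s upper bound")
# Groq 3 LPU: ~2,143 tokens/s; Rubin GPU: ~314 tokens/s -- same ~6.8x ratio
```

The absolute numbers are not predictions; the point is that a roughly sevenfold bandwidth advantage translates directly into a roughly sevenfold ceiling on per-token generation speed.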
Nvidia recently moved the Groq 3 LPU into full production, a decision that demonstrates its dedication to meeting growing demand for cutting-edge AI processing power. Jensen Huang, CEO of Nvidia, is enthusiastic about this new AI-powered epoch, believing the technology has finally matured enough to do genuinely productive work.
“Finally, AI is able to do productive work, and therefore the inflection point of inference has arrived,” – Jensen Huang
With the Groq 3 LPU in its lineup, Nvidia’s dominance of the semiconductor industry appears even more solid, and the move further demonstrates the company’s commitment to leading AI innovation. Sid Sheth, an expert in SRAM-based architectures, offered his perspective on the significance of the announcement.
“NVIDIA’s announcement validates the importance of SRAM-based architectures for large-scale inference, and no one has pushed SRAM density further than d-Matrix,” – Sid Sheth
Nvidia’s strategic deal for Groq’s technology not only strengthens its offerings but also secures its position as a leader in AI innovation. The pairing of the Groq 3 LPU with Vera Rubin marks a significant shift in how companies can approach AI processing.

