None of that would be possible without Nvidia, the company that has reshaped the artificial intelligence industry. Nvidia has just introduced its Groq 3 LPU, built on intellectual property licensed from Groq for an astounding US $20 billion. Though the licensing agreement was finalized sometime earlier, it was made public on Christmas Eve. The deal is a critical one for Nvidia as the graphics giant looks to strengthen its AI inference technology.
The Groq 3 LPU is purpose-built for inference workloads, in which a trained model processes a prompt and generates an output. The chip is designed to accelerate the final, token-by-token decode steps of that process, while Nvidia’s Vera Rubin GPU handles prefill and the other compute-intensive steps in the inference path. The two components are complementary: prefill is compute-bound, whereas decode is limited mainly by memory bandwidth.
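To make that split concrete, here is a minimal Python sketch of the two-phase pipeline; the function names and the toy “model” below are illustrative assumptions, not Nvidia or Groq APIs.

```python
# A minimal, purely illustrative sketch of the two-phase split described
# above; function names and the toy "model" are assumptions, not Nvidia
# or Groq APIs.

def prefill(prompt_tokens: list[int]) -> list[float]:
    """Process the whole prompt in one batched pass (compute-bound,
    the Vera Rubin GPU's role) and return a stand-in for the KV cache."""
    return [float(t) for t in prompt_tokens]

def decode(state: list[float], max_new_tokens: int) -> list[int]:
    """Generate tokens one at a time (memory-bound, the LPU's role).
    Each step rereads the model weights, so memory bandwidth dominates."""
    out = []
    for _ in range(max_new_tokens):
        next_token = int(sum(state)) % 50_000  # placeholder for sampling
        out.append(next_token)
        state.append(float(next_token))
    return out

state = prefill([101, 2009, 2003, 102])  # prefill the prompt once
print(decode(state, 5))                  # then stream out new tokens
```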
The Groq 3 LPU has a staggering 150 TB/s of memory bandwidth. By comparison, the Vera Rubin GPU manages only 22 TB/s, giving the Groq 3 LPU roughly seven times the memory bandwidth of Vera Rubin. That headroom puts Nvidia at the front of the pack in AI inference processing.
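A quick back-of-envelope calculation shows both the ratio and why the number matters for decode speed; the model size used here is a hypothetical picked for illustration.

```python
# Back-of-envelope check of the figures quoted above. The 70B-parameter
# model size is a hypothetical chosen for illustration, not a benchmark.

LPU_BW = 150e12   # Groq 3 LPU memory bandwidth in bytes/s, per the article
GPU_BW = 22e12    # Vera Rubin GPU memory bandwidth, per the article
print(f"bandwidth ratio: {LPU_BW / GPU_BW:.1f}x")  # ~6.8x, roughly seven

# Decode is memory-bound: each generated token rereads the model weights,
# so bandwidth / model size gives an upper bound on single-stream speed.
model_bytes = 70e9 * 2  # hypothetical 70B-parameter model at 2 bytes/weight
for name, bw in (("LPU", LPU_BW), ("GPU", GPU_BW)):
    print(f"{name}: at most {bw / model_bytes:.0f} tokens/s per stream")
```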
Data flows through the Groq 3 LPU directly via its integrated SRAM, maximizing parallel execution and minimizing latency. As Groq chief evangelist Mark Heaps explained, “The data literally passes right through the SRAM.” This architecture is what allows the Groq 3 LPU to tackle the demands of real-time AI use cases.
Each of Nvidia’s compute trays will fit eight Groq 3 LPUs alongside Vera Rubin GPUs and a Vera CPU, striking a balance between performance and power draw. Together they form a powerful AI compute building block at the heart of Nvidia’s new processing architecture.
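For the sake of illustration, the tray layout might be modeled as below; only the count of eight LPUs comes from the reporting, while the GPU and CPU counts are assumptions.

```python
# An illustrative model of the tray layout described above. Only the eight
# LPUs per tray come from the article; the GPU and CPU counts are assumed.
from dataclasses import dataclass

@dataclass
class ComputeTray:
    groq3_lpus: int = 8      # eight Groq 3 LPUs per tray, per the article
    rubin_gpus: int = 1      # assumed count; the article doesn't specify
    vera_cpus: int = 1       # assumed count; the article doesn't specify

    def describe(self) -> str:
        return (f"{self.groq3_lpus}x Groq 3 LPU (decode) + "
                f"{self.rubin_gpus}x Vera Rubin GPU (prefill) + "
                f"{self.vera_cpus}x Vera CPU (host)")

print(ComputeTray().describe())
```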
Nvidia remains heavily invested in the development of AI tools, and it plans to begin volume production of the Groq 3 LPU. Ian Buck, another top executive at Nvidia, underscored the chip’s deliberately narrow focus. As he put it, “The LPU is only optimized for that super extreme low latency token generation.” That attention to low latency matters for the growing number of applications that need near real-time responses from AI workloads.
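Those applications typically watch two latency metrics: time-to-first-token, set by prefill, and inter-token latency, set by decode. The sketch below simulates both with placeholder timings; nothing here reflects actual Groq 3 or Vera Rubin measurements.

```python
# Simulates the two latency numbers real-time applications watch:
# time-to-first-token (set by prefill) and inter-token latency (set by
# decode). The sleep() timings are placeholders, not hardware figures.
import time

def timed_generation(prefill_s: float, per_token_s: float, n_tokens: int):
    start = time.perf_counter()
    time.sleep(prefill_s)            # stand-in for the prefill pass
    ttft = time.perf_counter() - start
    for _ in range(n_tokens):
        time.sleep(per_token_s)      # stand-in for one decode step
    total = time.perf_counter() - start
    return ttft, (total - ttft) / n_tokens

ttft, itl = timed_generation(prefill_s=0.2, per_token_s=0.005, n_tokens=20)
print(f"time to first token: {ttft*1000:.0f} ms, "
      f"inter-token: {itl*1000:.1f} ms")
```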
The launch of the Groq 3 LPU comes at a time when AI systems are evolving rapidly, and industry experts believe this technology signals an inflection point in inference capabilities. Jensen Huang, CEO of Nvidia, remarked on this development: “Finally, AI is able to do productive work, and therefore the inflection point of inference has arrived.” His remarks underscore the transformative promise of this groundbreaking technology for government, industry and education sectors that are increasingly dependent on AI.
Nvidia’s licensing agreement with Groq broadens the company’s product offerings beyond batch-oriented GPU computing. It also amplifies the role SRAM-based architectures are taking on for next-generation large-scale inference workloads. As Sid Sheth of d-Matrix pointed out, SRAM-based architectures simplify large-scale inference, and he asserted that nobody has taken SRAM density as far as d-Matrix. Supporting that claim is the competitive edge SRAM technology lends in making inference as efficient and rapid as possible.

