This week, the Gemini team at Google apologized. After being inundated with complaints about the erratic pricing of its AI models, the company is now on record promising to change course. The team says it is sensitive to the financial challenges developers face, and criticism has centered in particular on Gemini 2.5 Pro, the company’s most expensive generative AI model to date.
The cost of using frontier models like Gemini has been climbing, and in recent months developers have grown increasingly vocal about surprise charges. That frustration prompted an announcement from Google’s Logan Kilpatrick: to help bring these costs down, the company has introduced implicit caching in the Gemini API.
“We just shipped implicit caching in the Gemini API, automatically enabling a 75% cost savings with the Gemini 2.5 models when your request hits a cache,” – Logan Kilpatrick
Implicit caching makes it possible to reuse frequently accessed, pre-computed data from the models, significantly reducing compute requirements and cost. The feature is well timed for the AI sector: building AI products keeps getting more expensive, and developers are looking for ways to curb rising operational costs.
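To see what the discount means in practice, here is a back-of-the-envelope sketch. The per-token price below is a hypothetical placeholder, not Google’s published rate; the 75% discount on cached tokens is the figure from Kilpatrick’s announcement above.

```python
# Back-of-the-envelope savings estimate for implicit caching.
# PRICE_PER_MTOK is a hypothetical placeholder, not Google's published rate;
# the 75% discount on cached prompt tokens is from Google's announcement.

PRICE_PER_MTOK = 1.25   # hypothetical input price, USD per 1M tokens
CACHED_DISCOUNT = 0.75  # cached tokens billed at a 75% discount

def prompt_cost(total_tokens: int, cached_tokens: int) -> float:
    """Cost of one request's prompt, splitting cached vs. uncached tokens."""
    uncached = total_tokens - cached_tokens
    rate = PRICE_PER_MTOK / 1_000_000
    return uncached * rate + cached_tokens * rate * (1 - CACHED_DISCOUNT)

# A 50,000-token prompt where 40,000 tokens hit the cache:
print(prompt_cost(50_000, 0))       # no cache hit: ~$0.0625
print(prompt_cost(50_000, 40_000))  # 40k cached tokens billed at 25%: ~$0.025
```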
To be eligible for implicit caching, requests to the Gemini 2.5 Flash model need a minimum of 1,024 prompt tokens, while the more sophisticated Gemini 2.5 Pro requires at least 2,048. (Tokens are the chunks of data that models compute on; about 1,000 tokens is roughly equal to 750 words.)
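A quick way to check whether a prompt clears those thresholds is the SDK’s token counter. The sketch below uses the google-genai Python SDK’s count_tokens call; the model IDs are illustrative, and production names may carry version suffixes.

```python
# Check whether a prompt is long enough to qualify for implicit caching.
# Minimums per Google's announcement: 1,024 tokens for Gemini 2.5 Flash,
# 2,048 for Gemini 2.5 Pro. Requires: pip install google-genai
from google import genai

client = genai.Client()  # reads the API key from the environment

# Illustrative model IDs; actual names may include version suffixes.
MIN_TOKENS = {"gemini-2.5-flash": 1024, "gemini-2.5-pro": 2048}

def meets_cache_minimum(model: str, prompt: str) -> bool:
    """True if the prompt meets the implicit-caching token minimum."""
    count = client.models.count_tokens(model=model, contents=prompt)
    return count.total_tokens >= MIN_TOKENS[model]

print(meets_cache_minimum("gemini-2.5-flash", "A long shared prefix..."))
```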
To maximize the benefits of implicit caching, Google recommends that developers keep repetitive context at the beginning of their requests and append the content that changes from request to request at the end. This makes cache hits more likely, driving even greater cost efficiency, as the sketch below illustrates.
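A minimal sketch of that pattern, again with the google-genai SDK; the file and model names are illustrative, and the usage_metadata field cached_content_token_count is how the SDK reports cache hits (it may be None when nothing was cached).

```python
# Keep the shared prefix identical across calls and put the variable
# part last, so implicit caching can match the common prefix.
from google import genai

client = genai.Client()

# Hypothetical large document reused across many requests.
SHARED_CONTEXT = open("reference_doc.txt").read()

def ask(question: str) -> str:
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # illustrative model ID
        # Identical prefix first, per-request question last.
        contents=SHARED_CONTEXT + "\n\nQuestion: " + question,
    )
    # cached_content_token_count reports prompt tokens served from cache
    # (None or 0 when the request did not hit a cache).
    print("cached tokens:", response.usage_metadata.cached_content_token_count)
    return response.text

ask("Summarize section 2.")  # first call computes and caches the prefix
ask("List the key dates.")   # later calls can reuse the cached prefix
```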
Complaints about pricing had been gaining traction on social media, and Kilpatrick addressed the concerns directly on X (formerly Twitter) on May 8, 2025. The anxiety isn’t unique to Gemini: developers have also warned that OpenAI’s o3 model can be more expensive than anticipated, with one analysis estimating its cost at as much as $30,000 for a single task.
In a blog post, Google explained that the new implicit caching feature works with its Gemini 2.5 models, and the company’s developer documentation offers in-depth guidance on caching strategies for the Gemini API.
As developers continue to wrestle with the cost of building on AI, Google’s willingness to act on feedback and ship cost-saving measures may help ease some of those concerns.