Google is facing mounting criticism for its safety reporting practices around its artificial intelligence models. Critics say the company's recent reports lack transparency and detail. The controversy intensified following the announcement of Google's Gemini 2.5 Flash model, which has not yet been accompanied by a safety report, raising questions about the rigor of Google's internal safety evaluations.
Two years ago, Google informed the U.S. government that it would publish safety reports for all “significant” public AI models “within scope.” The company recently published a technical report detailing its internal safety evaluations of the Gemini 2.5 Pro model. Surprisingly, the report doesn’t even mention its Frontier Safety Framework (FSF). That framework, released last year, is meant to identify future AI capabilities that could cause severe harm.
Experts note that Google’s safety reporting strategy differs from that of its competitors: the company publishes technical reports only after a model has graduated from the “experimental” stage. Some other labs, by comparison, have been compressing their pre-release testing. Kevin Bankston, a senior adviser on AI governance at the Center for Democracy and Technology, has raised alarms about this trend.
“Combined with reports that competing labs like OpenAI have shaved their safety testing time before release from months to days, this meager documentation for Google’s top AI model tells a troubling story of a race to the bottom on AI safety and transparency as companies rush their models to market.” – Kevin Bankston
Google says it conducts safety testing and “adversarial red teaming” on its models prior to their public release. Critics, ourselves included, argue that those claims cannot be verified without timely and robust reports. Peter Wildeford, another voice in the AI policy community, commented on the inadequacy of Google’s new report.
“This [report] is very sparse, contains minimal information, and came out weeks after the model was already made available to the public.” – Peter Wildeford
Wildeford stressed how hard it is to verify whether Google is actually honoring its public commitments on safety. As a result, he argues, it is equally impossible to judge the safety and security of its models.
A spokesperson for Google responded to the criticism, telling TechCrunch that a report for Gemini 2.5 Flash is “coming soon.” That promise, however, sidesteps the broader call from AI safety experts that transparency is the key to AI safety.
Thomas Woodside, another observer of the field, told us he is optimistic that Google will improve its reporting going forward.
“I hope this is a promise from Google to start publishing more frequent updates,” – Thomas Woodside
He added that such updates should include assessments of models that have not yet been publicly deployed, since those models could also introduce significant harm.
As of this writing, Google last published results of its dangerous capability tests in June 2024, for a model announced in February of that same year. Such a long gap in reporting makes the company’s commitment to transparency around AI safety evaluations all the more questionable.


