We hope you’ll join us for TC Sessions: Mobility 2025 in San Francisco, October 27–29, 2025. The event will bring together some of the industry’s brightest leaders, researchers, and innovators to address the most urgent questions in artificial intelligence, with a particular focus on the persistent problem of AI hallucinations. Recent studies have shed light on how language models, including popular chatbots, can generate inaccurate information, raising questions about the reliability of AI outputs.
In a recent investigation, researchers found that a widely used chatbot gave three different, incorrect answers when asked for the title of Adam Tauman Kalai’s Ph.D. dissertation. This real-world inconsistency underscores the challenges AI systems are up against. OpenAI itself has acknowledged the problem, citing the generation of “plausible but false claims” among its models’ most common failure modes. These inaccuracies expose a critical issue with LLMs: rather than expressing uncertainty, they frequently present speculation, hype, or wishful thinking as truth.
The researchers compared these evaluations to standardized multiple-choice tests, on which students can earn points by guessing blindly. They emphasized that if “the main scoreboards keep rewarding lucky guesses,” models will keep learning to guess rather than to answer accurately. This realization has significant implications, challenging us to reconsider how we test and train AI systems.
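To make that incentive concrete, here is a minimal sketch in Python, using hypothetical numbers rather than figures from the paper, comparing the expected score of a model that guesses when unsure against one that abstains under plain accuracy grading.

```python
# Minimal sketch of the guessing incentive described above.
# Assumptions (illustrative, not from the paper): a 4-option
# multiple-choice benchmark scored purely on accuracy, with no
# penalty for wrong answers and no credit for abstaining.

N_OPTIONS = 4

def expected_score(p_know: float, guesses_when_unsure: bool) -> float:
    """Expected per-question score under plain accuracy grading."""
    score = p_know * 1.0  # questions the model actually knows
    if guesses_when_unsure:
        # A blind guess is right 1/N_OPTIONS of the time, and a
        # wrong guess costs nothing under accuracy-only scoring.
        score += (1 - p_know) * (1 / N_OPTIONS)
    return score  # abstaining contributes nothing either way

for p in (0.3, 0.6, 0.9):
    print(f"p_know={p:.1f}  "
          f"guesser={expected_score(p, True):.3f}  "
          f"abstainer={expected_score(p, False):.3f}")
```

Under accuracy-only scoring, the guesser’s expected score is never lower than the abstainer’s, which is exactly the “rewarding lucky guesses” dynamic the researchers describe.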
With these questions on the agenda, Adam Tauman Kalai, one of the paper’s authors, brings a wealth of research experience to the discussion and is especially well positioned to offer critical insights at next month’s TechCrunch event.

Anthony Ha, TechCrunch’s weekend editor based in New York City, expressed anticipation for the discussions that will unfold there. Ha previously worked as a tech blogger at Adweek and as a senior editor at VentureBeat, and before that as a local government reporter at the Hollister Free Lance and as vice president of content at a venture capital firm, a background at the intersection of journalism and technology. The tech community has rightly sounded alarms over this issue, as the inaccuracy and misinformation AI creates spread rapidly, and hallucinations have become a focal point of those concerns.
The researchers observed that certain error types, such as spelling and parenthesis mistakes, tend to vanish at scale: the more data a language model trains on, the fewer of these patterns it gets wrong. Hallucinations, however, persist even as models grow.
The researchers argue that implementing new standards would require rethinking how AI models are judged. Current accuracy-based evaluations, they claim, must be updated to penalize guessing: models should lose more points for confident incorrect answers than for expressions of uncertainty, and they should earn partial credit for correctly conveying what they do not know.
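As a rough illustration of the kind of scoring rule the researchers describe, the sketch below penalizes confident errors and grants partial credit for abstaining; the specific weights are assumptions chosen for illustration, not values from the paper.

```python
# Rough sketch of an uncertainty-aware grading rule along the lines
# the researchers describe. The specific weights are illustrative
# assumptions, not values taken from the paper.

CORRECT_SCORE = 1.0
WRONG_PENALTY = -1.0   # confident errors cost more than admitting doubt
ABSTAIN_CREDIT = 0.25  # partial credit for a calibrated "I don't know"

def score_answer(answer: str | None, truth: str) -> float:
    """Score one response; None means the model abstained."""
    if answer is None:
        return ABSTAIN_CREDIT
    return CORRECT_SCORE if answer == truth else WRONG_PENALTY

# Example: one correct answer, one abstention, one confident error.
responses = [("Paris", "Paris"), (None, "Bern"), ("Oslo", "Helsinki")]
print(sum(score_answer(a, t) for a, t in responses))  # 1.0 + 0.25 - 1.0 = 0.25
```

With a rule like this, guessing pays off only when the model’s confidence exceeds the break-even point implied by the penalty (here, a correct-answer probability above 0.625), so calibrated hedging is rewarded rather than punished.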