In a study submitted for publication, researcher Brendan Foody recently tested leading artificial intelligence (AI) agents on a new benchmark. The benchmark raises the question: how ready is AI for workplace deployment? Called Apex-Agents, it measures AI models’ ability to answer questions asked by actual practitioners from multiple professions. The findings come as little surprise, given that AI models have repeatedly failed to meet even basic standards on such tasks. Even under the most favorable assumptions, every tested model scored poorly.
The Apex-Agents benchmark highlights a critical stumbling point for AI models: their ability to track down information across multiple domains. This challenge has led to some embarrassing failures for state-of-the-art AI systems. For example, Opus 4.5, Gemini 3 Pro, and GPT-5 each achieved only around 18% on the benchmark, meaning they answered fewer than one in four inquiries correctly. Gemini 3 Flash performed somewhat better, with a one-shot accuracy of 24%, followed by GPT-5.2 at 23%.
Foody’s work comes at a critical moment. According to prominent figures including Microsoft CEO Satya Nadella, AI is poised to change virtually every knowledge-based field. As Nadella tells it, AI could take over jobs in deep-knowledge professions like law, investment banking, and accounting. The Apex-Agents results offer a stark counterpoint, signaling that AI technology does not yet live up to these exceptionally high hopes.
“The way we do our jobs isn’t with one individual giving us all the context in one place,” Foody remarked, emphasizing the complexity of information retrieval in professional environments.
Foody hopes these findings will spur AI labs to make their models more accurate. He is determined to see the weaknesses that the Apex-Agents benchmark exposes addressed, and he views that as a critical first step toward determining the future of workplace AI. “I think this is probably the most important topic in the economy,” he added, underscoring the urgency for advancements in AI capabilities.
At the same time, organizations everywhere are trying to figure out how to implement AI technologies responsibly in their work. The performance metrics reported in Apex-Agents are a call to action for developers and researchers. The benchmark is an initial step toward assessing the current state of the art in artificial intelligence, and it is meant to spur innovation and push the field forward.
This research is about more than technical specs. It casts a spotlight on seismic shifts in the broader economy, and on how the roles and responsibilities of jobs may change in their wake. AI has tremendous potential to increase productivity and efficiency. These results show that substantial improvements are still needed before it can truly meet that potential and aid professionals in their daily work.

