This week, Anthropic released Opus 4.6. The update significantly improves AI agents' ability to excel in complex professional domains, and it has created a buzz in the technology community. Opus 4.6 has reshaped Mercor's leaderboard, which measures AI capabilities across dozens of professional domains.
One-shot performance

Opus 4.6 performed extraordinarily well in one-shot trials, scoring nearly 30%. When allowed more than one attempt, its mean score jumped to a remarkable 45%. These results are a dramatic break with past performance: until now, no major lab had managed to score above the 25% mark.
Brendan Foody, CEO of Mercor, remarked on the surprising size of the performance jump, stating, “jumping from 18.4% to 29.8% in a few months is insane.” AI development has been climbing an exponential curve at breathtaking speed, nowhere more so than in professional fields like law and corporate analysis.
The new benchmark established by Mercor specifically measures AI agents' ability to handle intricate professional tasks, a crucial factor in determining their viability for real-world applications.

Long leads, high scores

Before Opus 4.6 arrived, scores were abysmal. That gap left millions of dollars on the table and created a huge opportunity for improvement.
The impact of Opus 4.6 goes well beyond dollars and cents: it represents an impressive breakthrough in the abilities of AI agents. As these models get better and better at completing complex professional tasks, the race for AI technologies remains a closely contested one.
Mercor's leaderboard has been instrumental in shining a spotlight on these advances, helping industry watchers and stakeholders alike understand the remarkable progress being made. With every new release like Opus 4.6, another loop-the-loop on AI's rollercoaster, the landscape grows more exciting and more complex.

