On Monday, Anthropic released Opus 4.5, the latest iteration of the company’s flagship AI model. It is the first model to pass 80 percent on SWE-Bench Verified, a popular coding benchmark. Opus 4.5 also promises state-of-the-art performance on cross-modality and commonsense benchmarks.
Beyond its results on coding benchmarks such as SWE-Bench and Terminal-bench, Opus 4.5 is strong in several other areas. It promises state-of-the-art performance on tool-use benchmarks such as tau2-bench and MCP Atlas, and on general problem-solving benchmarks such as ARC-AGI 2 and GPQA Diamond.
One of the biggest improvements in Opus 4.5 is memory, which has been optimized specifically for long-context use cases. The gains required significant changes to how the model manages memory, and it is now better able to retain and reference pertinent facts.
Dianne Na Penn, Anthropic’s head of product management for research, discussed the advancements in Opus 4.5 in an interview with TechCrunch. She said that a larger context window is not enough on its own, and that the model also needs effective memory to perform well.
“There are improvements we made on general long context quality in training with Opus 4.5, but context windows are not going to be sufficient by themselves.” – Dianne Na Penn
Penn added that performance also depends on the model knowing which information to retain.
“Knowing the right details to remember is really important in complement to just having a longer context window.” – Dianne Na Penn
She stressed that fundamentals such as memory are key to extending Opus 4.5’s capabilities.
“This is where fundamentals like memory become really important.” – Dianne Na Penn
Alongside these technical improvements, a primary focus for Opus 4.5 is agentic use cases. The model is designed to function as a lead agent, coordinating a team of Haiku-powered sub-agents. This approach is intended to broaden the model’s possible uses across industries.
Anthropic also plans to widen access to its Claude for Chrome and Claude for Excel extensions. The expanded availability coincides with the release of Opus 4.5 and is intended to improve the experience and accessibility for users across platforms.
TechCrunch Disrupt will be held in San Francisco on October 13-15, 2026.

