
Anthropic just released Claude Opus 4.5 – here’s how it stacks up against other leading models

Photo Illustration by Thomas Fuller/SOPA Images/LightRocket via Getty Images



ZDNET’s key takeaways

  • Anthropic’s new AI model, Claude Opus 4.5, has arrived.
  • The model reportedly excels at creative problem-solving.
  • It also excels at agentic tasks, according to Anthropic.

AI startup Anthropic released its latest model, Claude Opus 4.5, on Monday, describing it in a company blog post as “a step forward in what AI systems can do, and a preview of changes to how work gets done.”


The new model outperforms other industry-leading models, such as Google’s Gemini 3 Pro and OpenAI’s GPT-5.1, on coding tasks, according to Anthropic.


The company also wrote that the model “scored higher than any human candidate ever” on the “notoriously difficult” exam it gives to prospective engineering employees. The result “raises questions about how AI will change engineering as a profession,” Anthropic wrote in its blog post. A version of Gemini 2.5 also recently scored top marks in the International Collegiate Programming Contest (ICPC), an internationally renowned coding competition.

Claude Opus 4.5 outperforms previous Anthropic models in vision, reasoning, and math, according to the company, and achieves state-of-the-art performance in tasks such as agentic tool use and computer use.

Anthropic added in its blog post that its latest model reached new heights in its ability to reason through and flexibly adapt to complex problems.


In one test scenario, the model had to serve as an automated airline agent helping a customer who had requested a change to their basic economy flight. Since the fictitious airline doesn’t allow changes to basic economy bookings, the test is designed to measure how well the automated agent denies the request and handles the disgruntled customer. Claude Opus 4.5, however, found a creative loophole: It first upgraded the customer’s cabin, then changed their flight, since flight changes are allowed on fares above basic economy.

“This would cost more money, but it’s a legitimate path within the policy,” Claude Opus 4.5 said during the transaction, according to an image provided by Anthropic in the new blog post.

“The benchmark technically scored this as a failure because Claude’s way of helping the customer was unanticipated,” Anthropic wrote. “But this kind of creative problem solving is exactly what we’ve heard about from our testers and customers – it’s what makes Claude Opus 4.5 feel like a meaningful step forward.”

Claude Opus 4.5 also scored better than its predecessors and other frontier models in Anthropic’s evaluation of “concerning behavior,” which the company defines as “both cooperation with human misuse and undesirable actions that the model takes at its own initiative.”


Claude Opus 4.5 is available now in the Claude apps, via the API, and through the three major cloud platforms (Azure, Amazon Web Services, and Google Cloud). Anthropic says it’s priced accessibly, and we’ve reached out to the company for more pricing details.

Anthropic reported a $183 billion valuation in September following its latest funding round, a figure largely made possible by Claude’s popularity among enterprise customers. The company also announced earlier this month that it would invest $50 billion in its own data centers across the US to power the training of new AI models.
