Anthropic's Claude Sonnet 5 Is "Near-Opus Intelligence" For All Plans

Anthropic launched Claude Sonnet 5, the latest Sonnet-class model. Though it’s not a frontier-model breakthrough, Sonnet 5 meaningfully upgrades performance over earlier models to deliver stronger coding capabilities, better agentic performance, and more efficient token usage.

Anthropic’s announcement emphasized agentic performance, particularly the model’s ability to carry out multi-step work with less direct human guidance. Anthropic says Sonnet 5 can make plans, use tools such as browsers and terminals, and operate autonomously at a level that recently required larger, more expensive models.

Sonnet 5 Is More Economical With Tokens

Anthropic shows that Sonnet 5 improves over Sonnet 4.6 with lower-cost options and higher quality. Opus 4.8 still beats Sonnet 5 for accuracy, but Anthropic says that the effort level can be adjusted to find the best balance between cost and performance. There’s also an introductory price for Sonnet 5 of $2/MTok input and $10/MTok output through August 31.
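To make the per-million-token (MTok) pricing concrete, here is a minimal Python sketch of the arithmetic. It uses only the two introductory prices quoted above; the function name and the example workload are hypothetical and are not taken from Anthropic’s announcement.

```python
# Introductory Sonnet 5 prices quoted in the article (USD per million tokens).
INPUT_USD_PER_MTOK = 2.00
OUTPUT_USD_PER_MTOK = 10.00
TOKENS_PER_MTOK = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD for a given number of input and output tokens."""
    input_cost = input_tokens / TOKENS_PER_MTOK * INPUT_USD_PER_MTOK
    output_cost = output_tokens / TOKENS_PER_MTOK * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


# Hypothetical workload: 3 million input tokens and 500,000 output tokens.
example_cost = estimate_cost_usd(3_000_000, 500_000)
print(f"${example_cost:.2f}")  # prints "$11.00"
```

Because output tokens cost five times as much as input tokens at these prices, a model that reaches the same result with fewer output tokens lowers the bill more than the headline input price suggests.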

Sonnet 5 Performance Benchmarks

Sonnet 5 beats Sonnet 4.6 and Gemini 3.5 Flash on every benchmark below where they are listed, and beats GPT-5.5 on three of the four; GPT-5.5 leads on Terminal-Bench 2.1.

BrowseComp tests how well an AI agent can locate difficult-to-find information on the web.

BrowseComp scores:

Claude Sonnet 5: 84.7 (single agent)
Claude Sonnet 4.6: 76.2
GPT-5.5: 84.4

Terminal-Bench 2.1 is a test of an AI model’s ability to handle coding tasks in terminal and CLI environments.

Terminal-Bench 2.1 scores:

Claude Sonnet 5: 80.4
Claude Sonnet 4.6: 67.0
GPT-5.5: 83.4 (Codex CLI)
Gemini 3.5 Flash: 76.2

SWE-bench Pro is a software engineering benchmark on which Sonnet 5 outperformed other comparable LLMs.

SWE-bench Pro scores:

Claude Sonnet 5: 63.2
Claude Sonnet 4.6: 58.1
GPT-5.5: 58.6
Gemini 3.5 Flash: 55.1

FrontierCode is a benchmark for agentic coding across 150 tasks, on which Sonnet 5 significantly outperformed GPT-5.5.

The Claude Sonnet 5 System Card explains:

“Each task gives the agent a checked-out repository and a single issue description; the agent then works autonomously in a containerized environment to produce a final patch, with no human intervention and no timeout information.
Patches are graded against blocking functional criteria (primarily held-out unit tests) plus weighted rubric criteria, including model-graded checks for required test coverage and prohibited implementation patterns. Tasks were authored by maintainers of the underlying repositories and individually reviewed by Cognition researchers, with a random subset manually solved to verify fairness.”
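The excerpt does not say how the blocking and weighted rubric criteria combine into one score. As an illustration only, the Python sketch below shows one plausible reading: a patch that fails any blocking criterion scores zero, and otherwise earns the weighted fraction of rubric criteria it passes. The class, function, criterion names, and weights are all hypothetical and should not be read as the benchmark’s actual grader.

```python
from dataclasses import dataclass


@dataclass
class RubricCriterion:
    name: str      # hypothetical label for a rubric item
    weight: float  # relative importance of the item
    passed: bool   # whether the patch satisfied it


def grade_patch(blocking_passed: list[bool], rubric: list[RubricCriterion]) -> float:
    """Assumed scheme: 0.0 if any blocking check fails, else the weighted pass fraction."""
    if not all(blocking_passed):
        return 0.0
    total_weight = sum(c.weight for c in rubric)
    if total_weight == 0:
        # No rubric items: passing the blocking checks is the whole score.
        return 1.0
    earned = sum(c.weight for c in rubric if c.passed)
    return earned / total_weight


# Hypothetical patch: all held-out tests pass, one of three rubric items fails.
rubric = [
    RubricCriterion("required test coverage", 2.0, True),
    RubricCriterion("no prohibited implementation patterns", 1.0, False),
    RubricCriterion("follows repository conventions", 1.0, True),
]
print(grade_patch([True, True], rubric))   # prints 0.75
print(grade_patch([True, False], rubric))  # prints 0.0
```

Under this assumed scheme a patch cannot make up for a failed held-out test by scoring well on the rubric, which is what "blocking" suggests.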

The FrontierCode scores:

Claude Sonnet 5: 38.8
Claude Sonnet 4.6: 15.1
GPT-5.5: 25.5

Sonnet 5 Is “Near-Opus Intelligence”

Anthropic doesn’t claim that Sonnet 5 is a frontier-model breakthrough, though it does say that it’s their most capable Sonnet-class model. The system card explains that it’s less capable than Anthropic’s Opus and Mythos models. Yet Anthropic does claim that it’s “near-Opus intelligence at Sonnet pricing for coding, agents, and everyday professional work.”

Read the full announcement at Anthropic.

Featured Picture by Shutterstock/jackpress
