Anthropic’s Ambition This Time

On February 17, 2026, Anthropic released Claude Sonnet 4.6.

One-line summary: This is the most capable Sonnet model yet, with comprehensive upgrades across coding, long-context reasoning, agent planning, computer use, and more.

The key part? The price hasn’t changed. Still $3/$15 per million tokens.

Free and Pro users opening claude.ai get it as the default model.

What does this mean? Anthropic has taken capabilities that once required Opus and pushed them down to Sonnet. Regular users get near-flagship intelligence without paying a penny more.

What the Benchmarks Say

Sonnet 4.6 approaches or matches Opus across multiple core benchmarks.

SWE-bench Verified (software engineering) hit 80.2%, a score that only top-tier models could achieve last year.

ARC-AGI-2 (abstract reasoning) reached 60.4%, a score achieved with high-effort extended thinking.

In Claude Code, users preferred Sonnet 4.6 roughly 70% of the time over the previous Sonnet 4.5.

Even more interesting: compared to last year’s Opus 4.5, users preferred Sonnet 4.6 59% of the time. That’s a Sonnet-class model actually beating the previous-generation Opus on user experience.

Why? Because it’s less prone to overengineering, less lazy, and produces fewer hallucinations and fewer false claims of success. Put simply, it’s more reliable.

Coding: Beyond Just Good Benchmark Scores

Coding is the most obvious upgrade direction this time.

User feedback says Sonnet 4.6 reads context more carefully before modifying code, instead of jumping straight into changes. It also proactively consolidates duplicated logic rather than copy-pasting everywhere.

These sound like small changes, but anyone who’s used AI coding assistants knows these are exactly the most frustrating problems. A model that modifies code without reading context, or writes the same logic three times in three places, creates a maintenance nightmare.

Partner feedback tells the same story:

GitHub’s Joe Binder says it “excels at complex code fixes with strong resolution rates.” Cursor’s Michael Truell cites “notable improvement on difficult problems.” Cognition’s Scott Wu says it “meaningfully closed the gap with Opus on bug detection.”

Replit’s Michele Catasta puts it bluntly: “The performance-to-cost ratio is extraordinary.”

Computer Use: Operating a Computer Like a Human

This is an easy-to-overlook but critically important direction.

Many enterprise applications lack APIs and can’t be automated through code. Claude Sonnet 4.6’s solution is straightforward: operate the computer like a human, clicking with a mouse and typing with a keyboard.

Since first launching computer use in October 2024, Anthropic has iterated for 16 months. Progress is measured on the OSWorld benchmark, which requires the AI to operate real software: Chrome, LibreOffice, VS Code, and more.

How does it perform now? On tasks like navigating complex spreadsheets and filling out multi-step web forms, it’s approaching human-level capability.
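
Conceptually, a computer-use agent runs an observe-act loop: the model sees the current screen, emits one action (click, type, done, and so on), and the harness executes it and captures a fresh screenshot. The sketch below is illustrative only; the action names and the `model_step`/`execute_action` helpers are hypothetical stand-ins, not Anthropic's actual tool-use API.

```python
# Illustrative observe-act loop for a computer-use style agent.
# The model repeatedly sees the screen and emits one action at a time;
# the harness executes each action and returns the next screenshot.
# All names here are hypothetical, not Anthropic's real API surface.

def run_agent(model_step, execute_action, screenshot, max_steps=10):
    """Drive the loop until the model signals it is done.

    model_step(screenshot) -> action dict, e.g. {"type": "click", "x": 10, "y": 20}
    execute_action(action) -> next screenshot (any opaque state)
    """
    history = []
    for _ in range(max_steps):
        action = model_step(screenshot)
        history.append(action)
        if action["type"] == "done":
            break
        screenshot = execute_action(action)
    return history
```

In a real deployment, `model_step` would be an API call carrying the screenshot, and `execute_action` would drive a virtual display; the `max_steps` cap is the usual guard against runaway loops.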

There’s still a gap compared to the most skilled human operators, but the rate of progress is remarkable.

Safety hasn’t been neglected either. Resistance to prompt injection attacks has been enhanced to match Opus 4.6 levels.

Million-Token Context Window

Sonnet 4.6 supports a 1 million token context window (in beta).

This means you can feed in an entire codebase, or dozens of research papers, all at once. The crucial point is that it reasons effectively across all that context, not just “seeing” the information but truly understanding and connecting it.
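
For a rough sense of scale: using the common heuristic of about four characters per token for English text and code, a 1M-token window holds on the order of 4 MB of source. A quick back-of-envelope check (the 4-chars-per-token ratio is an approximation, not an official tokenizer):

```python
# Rough estimate of whether a codebase fits in a 1M-token context window.
# Uses the common ~4 characters per token heuristic for English/code;
# real token counts come from the model's tokenizer and will differ.

CHARS_PER_TOKEN = 4          # coarse approximation
CONTEXT_WINDOW = 1_000_000   # Sonnet 4.6's beta context size, in tokens

def estimated_tokens(num_chars: int) -> int:
    return num_chars // CHARS_PER_TOKEN

def fits_in_context(num_chars: int, reserve: int = 50_000) -> bool:
    """Leave `reserve` tokens of headroom for the prompt and the reply."""
    return estimated_tokens(num_chars) <= CONTEXT_WINDOW - reserve

# A 2 MB codebase estimates to ~500k tokens, comfortably within budget.
print(fits_in_context(2_000_000))
```
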

This matters especially for long-horizon agent tasks. In the Vending-Bench Arena test, Sonnet 4.6 demonstrated impressive long-term strategic ability: it invested heavily in capacity building for the first ten simulated months, then pivoted sharply to focus on profitability. This kind of strategic thinking doesn’t come from just stacking parameters.

Design Sense Leveled Up Too

An unexpected highlight: external partners reported that Sonnet 4.6’s visual outputs are notably more polished.

Better layouts, animations, and design sensibility. Triple Whale’s AJ Orbach even said it has “perfect design taste when building frontend pages.”

For teams that need to rapidly prototype, this improvement is highly practical.

Developer Platform Updates

Beyond the model itself, Anthropic rolled out a round of developer tool updates:

  • Adaptive thinking and extended thinking: the model automatically adjusts reasoning depth based on problem complexity
  • Context compaction: older turns get condensed so long conversations no longer overflow the context window
  • Web search tool upgrade: automatically writes and executes code to filter search results
  • Code execution, memory, tool search and more features now generally available
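
To illustrate the idea behind context compaction (a toy version, not Anthropic's implementation): once the running conversation exceeds a token budget, the oldest turns are collapsed into a single summary message while recent turns stay verbatim.

```python
# Toy context compaction: collapse the oldest turns into one summary
# message once the conversation exceeds a token budget. Anthropic's
# server-side compaction is more sophisticated; this only shows the idea.

def compact(messages, budget, count_tokens, summarize, keep_recent=4):
    """messages: list of {"role": ..., "content": ...} dicts.

    count_tokens(content) -> int, summarize(old_messages) -> str
    are caller-supplied (e.g. a tokenizer and a summarization call).
    """
    total = sum(count_tokens(m["content"]) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "user",
               "content": "[Summary of earlier conversation] " + summarize(old)}
    return [summary] + recent
```

In practice `summarize` would itself be a model call, and the trigger would fire automatically as the conversation grows.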

Claude in Excel now supports MCP connectors for financial data services like S&P Global, LSEG, PitchBook, Moody’s, and FactSet. This is a major win for finance industry users.

My Take

The most noteworthy thing about this release isn’t any single benchmark score. It’s a trend: Sonnet is becoming the optimal choice for most people.

Opus remains the strongest reasoning model, ideal for hardcore tasks like deep codebase refactoring. But for the vast majority of daily use cases, Sonnet 4.6 is more than enough, and it’s faster and cheaper.

This is also Anthropic’s strategy: continuously push Opus capabilities down to Sonnet, letting more people access top-tier AI at lower cost.

For developers, the model ID is claude-sonnet-4-6, available across all Claude plans. Free users finally get access to file creation, connectors, skills, and context compaction.
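
A minimal request sketch using that model ID (the payload shape follows the Anthropic Messages API; double-check the current docs before relying on it):

```python
# Minimal Messages API payload for Sonnet 4.6. Actually sending it
# requires the `anthropic` SDK and an API key, roughly:
#   client = anthropic.Anthropic()
#   response = client.messages.create(**payload)

payload = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Summarize this repo's architecture."}
    ],
}
print(payload["model"])
```
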

This upgrade genuinely delivers.