Benchmark

Opus 4.6 vs Codex 5.3: We've Officially Entered the 'Benchmarks Don't Matter' Era

Opus 4.6 and Codex 5.3 launched on the same day, but this time nobody’s looking at benchmarks. The AI coding assistant race has shifted from ‘who’s stronger’ to ‘who’s more usable’: we’ve officially entered an era where system experience matters more than model capability.

Tencent's New Test Reveals: Even the Strongest AI Scores Only 23%

Tencent Hunyuan’s CL-bench reveals that even GPT-5.1 can only solve 23.7% of context learning tasks, exposing AI’s massive blind spot in ‘learning on the fly.’