Anthropic's Claude 4 AI models excel in multi-step reasoning
Anthropic launches advanced AI models Claude Opus 4 and Sonnet 4 with top benchmark scores.

Anthropic introduced the Claude 4 family, two new additions to its AI lineup, at a developer conference. The models, Claude Opus 4 and Claude Sonnet 4, are designed for multi-step reasoning: analyzing large datasets, carrying out long, complex tasks, and tackling intricate programming problems. According to Anthropic, both are tuned for coding, which makes them well suited to writing and editing code.
Only paying users have access to Claude Opus 4, while Claude Sonnet 4 will also be offered to users of the free chatbot apps. Anthropic has set the pricing for Opus 4 at $15 per million input tokens and $75 per million output tokens, while Sonnet 4 is priced lower, at $3 per million input tokens and $15 per million output tokens. Tokens are the units of text a model processes; a million tokens corresponds to roughly 750,000 words.
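To put the per-million-token prices in concrete terms, here is a minimal sketch of the arithmetic. The function name and the dictionary keys ("opus-4", "sonnet-4") are hypothetical labels chosen for illustration, not official API model identifiers, and the figures are only the list prices quoted in this article.

```python
# Per-million-token list prices in USD, as quoted in the article.
# The keys are illustrative labels, not official API model identifiers.
PRICES_PER_MILLION = {
    "opus-4": {"input": 15.0, "output": 75.0},
    "sonnet-4": {"input": 3.0, "output": 15.0},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the quoted prices."""
    rates = PRICES_PER_MILLION[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: 200,000 input tokens and 50,000 output tokens.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 200_000, 50_000):.2f}")
```

For that example workload the sketch gives $6.75 for Opus 4 and $1.35 for Sonnet 4, a fivefold difference that follows directly from the price ratio.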
Claude Opus 4, the more capable model, is said to sustain focused effort across many steps of a workflow, which makes it more useful for long-running tasks. Sonnet 4, meanwhile, is a direct improvement over its predecessor, Claude 3.7 Sonnet, especially in following instructions and handling programming and mathematics queries. Anthropic also says the Claude 4 models are less prone to reward hacking, also called specification gaming, in which a model exploits shortcuts or loopholes to complete a task.
Despite these advances, the models are not the best on every benchmark. Opus 4 outperforms competitors such as Google's Gemini 2.5 Pro and OpenAI's o3 on SWE-bench Verified, a benchmark designed to evaluate coding ability. However, it does not surpass o3 on the multimodal evaluation MMMU or on GPQA Diamond, a set of complex academic questions spanning physics, biology, and chemistry.
Anthropic's efforts are backed by substantial financing, including a $2.5 billion credit facility and significant investments from Amazon and other investors. The funding is intended to cover the rising cost of developing cutting-edge AI models. Separately, Claude Code, part of the Claude family, integrates with a range of development tools, including IDEs and GitHub, reflecting Anthropic's focus on serving developers.
Sources: TechCrunch, Reuters, CNBC