Claude Sonnet 4
Claude Sonnet 4 supports a context window of 1M tokens on Vercel AI Gateway, enabling full-codebase analysis of 75,000+ lines or large document sets in a single request. It scores 72.7% on SWE-bench Verified and adds hybrid extended thinking and enhanced steerability.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'anthropic/claude-sonnet-4',
  prompt: 'Why is the sky blue?',
})
```

What To Consider When Choosing a Provider
- Configuration: The context window of 1M tokens significantly increases per-request token volumes. Monitor cost per request carefully when processing full codebases or large document collections, as a single request can consume tokens equivalent to many standard calls.
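To make the cost point concrete, here is a minimal sketch comparing a near-window request to a typical call. The per-million-token rates below are illustrative placeholders, not current AI Gateway pricing; check the pricing panel for actual rates.

```typescript
// Rough per-request cost estimate for large-context calls.
// These rates are illustrative assumptions, NOT current pricing.
const INPUT_COST_PER_MTOK = 3.0   // assumed USD per million input tokens
const OUTPUT_COST_PER_MTOK = 15.0 // assumed USD per million output tokens

function estimateRequestCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_COST_PER_MTOK +
    (outputTokens / 1_000_000) * OUTPUT_COST_PER_MTOK
  )
}

// One full-codebase request near the 1M-token window vs. a typical call:
const fullCodebase = estimateRequestCostUSD(900_000, 4_000)
const typicalCall = estimateRequestCostUSD(3_000, 1_000)
console.log(fullCodebase, fullCodebase / typicalCall)
```

At these assumed rates, a single near-window request costs on the order of a hundred typical calls, which is why per-request monitoring matters here more than usual.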
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests (BYOK is not included). See the AI Gateway documentation for configuration details.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Claude Sonnet 4
Best For
- Full-codebase analysis and understanding: With the context window of 1M tokens, pass an entire large repository in a single request
- Agentic coding at scale: SWE-bench 72.7% and GitHub Copilot's selection as its coding agent model speak directly to this use case
- Multi-file refactoring and architectural reasoning: The model needs to hold the whole picture simultaneously
- Instruction-precise applications: Steerability matters here; the model was specifically improved to follow complex, nuanced instructions more accurately
- Agent workflows at Sonnet pricing: The 5x cost difference from Opus 4 makes Sonnet 4 the right choice when the benchmark results are comparable
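A quick sketch of how that cost gap compounds at scale, assuming the published launch list prices (Sonnet 4 at $3/$15 and Opus 4 at $15/$75 per million input/output tokens; verify against the pricing panel before relying on these numbers):

```typescript
// Assumed launch list prices per million tokens -- verify current rates.
interface Rates { inputPerMTok: number; outputPerMTok: number }

const sonnet4: Rates = { inputPerMTok: 3, outputPerMTok: 15 }
const opus4: Rates = { inputPerMTok: 15, outputPerMTok: 75 }

// Monthly bill for a given volume, in millions of tokens.
function monthlyCostUSD(r: Rates, inputMTok: number, outputMTok: number): number {
  return r.inputPerMTok * inputMTok + r.outputPerMTok * outputMTok
}

// A hypothetical agent fleet consuming 500M input / 50M output tokens per month:
const sonnetBill = monthlyCostUSD(sonnet4, 500, 50)
const opusBill = monthlyCostUSD(opus4, 500, 50)
console.log(sonnetBill, opusBill, opusBill / sonnetBill)
```

At these assumed rates the 5x ratio holds across the entire bill, so comparable benchmark scores translate directly into a 5x difference in monthly spend.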
Consider Alternatives When
- Provider flexibility: If you don't want to pin requests to Anthropic, check whether later model versions support the 1M-token context window across all providers
- Sonnet 4.5 improvements: OSWorld computer use performance, 30+ hour agentic task duration, and domain-specific reasoning advances
- Haiku 4.5 capability match: Lower-cost option when Haiku 4.5 covers the capability requirements
- Opus-level reasoning depth: Sonnet benchmarks don't capture the full difficulty of some problems
Conclusion
Claude Sonnet 4 pairs strong coding benchmark performance with a context window of 1M tokens that makes entire codebases processable in one shot, a combination that changes what's architecturally feasible for software engineering agents. At Sonnet pricing, it's the default choice for teams building on the Claude 4 generation until their workloads specifically require Opus or the later Sonnet improvements.
Frequently Asked Questions
How do I enable the context window of 1M tokens for Claude Sonnet 4 on AI Gateway?
Add the `anthropic-beta: context-1m-2025-08-07` header to your request. Under `providerOptions.gateway`, set `only` to `['anthropic']` so the request routes through the Anthropic provider, which supports the feature.
What does the context window of 1M tokens enable in practice?
The context window of 1M tokens lets you process entire codebases, long documents, or extended conversation histories in a single request. This is particularly useful for code review across multiple files, document analysis, and agentic workflows that accumulate context over many steps.
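The header-plus-routing setup described in the answer above can be sketched as a plain options object (the shape you would pass to the AI SDK's streamText or generateText; shown standalone here so it is easy to inspect):

```typescript
// Request options for the 1M-token beta, per the configuration above.
const longContextOptions = {
  model: 'anthropic/claude-sonnet-4',
  // Opt in to the 1M-token context window via the beta header.
  headers: { 'anthropic-beta': 'context-1m-2025-08-07' },
  // Pin routing to the Anthropic provider, which supports the beta.
  providerOptions: { gateway: { only: ['anthropic'] } },
}
```

In a real call these fields would be spread into the streamText arguments alongside your prompt or messages.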
How did Claude Sonnet 4 perform on SWE-bench Verified?
Claude Sonnet 4 scored 72.7% on SWE-bench Verified, slightly ahead of Claude Opus 4's 72.5% on that specific benchmark.
What is enhanced steerability in Claude Sonnet 4?
Sonnet 4 responds more precisely to instructions, reducing misinterpretation of complex or nuanced prompts. Anthropic highlighted steerability as an explicit design improvement for applications where exact specification of behavior matters.
Does Claude Sonnet 4 support extended thinking?
Yes. Sonnet 4 is a hybrid model that supports both near-instant responses and extended thinking. Extended thinking with tool use, where the model alternates between reasoning and calling tools, is also available in beta.
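As a hedged sketch, extended thinking is typically enabled through Anthropic provider options in the AI SDK; the exact `thinking` shape and the 12,000-token budget below are assumptions to verify against the current SDK documentation:

```typescript
// Hedged sketch: enabling extended thinking via Anthropic provider options.
// budgetTokens caps how many tokens the model may spend reasoning before
// it produces the final answer; 12,000 is an assumed example value.
const thinkingOptions = {
  model: 'anthropic/claude-sonnet-4',
  providerOptions: {
    anthropic: {
      thinking: { type: 'enabled', budgetTokens: 12_000 },
    },
  },
}
```

Omitting the `thinking` option leaves the model in its near-instant response mode, which is the other half of the hybrid design.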
What is 1-hour prompt caching and does Sonnet 4 support it?
Yes. The Claude 4 launch introduced one-hour prompt caching as a new API capability, compared to shorter-lived caching in previous generations. This is particularly useful for codebases or large system prompts that appear in many requests.
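A hedged sketch of what a one-hour cache breakpoint looks like in Anthropic's raw request shape; the `ttl` field and the `extended-cache-ttl-2025-04-11` beta header name are assumptions to verify against current Anthropic documentation:

```typescript
// Hedged sketch: a content block marked for 1-hour prompt caching.
// The ttl value and beta header below are assumptions, not confirmed API.
const cachedSystemBlock = {
  type: 'text',
  text: '<large system prompt or codebase excerpt reused across requests>',
  cache_control: { type: 'ephemeral', ttl: '1h' },
}
const betaHeaders = { 'anthropic-beta': 'extended-cache-ttl-2025-04-11' }
```

The idea is that the large shared prefix is written to the cache once and then read at a discount on every subsequent request within the hour.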
Why would I use Sonnet 4 instead of Opus 4 given the SWE-bench scores are similar?
Claude Sonnet 4 is priced at the Sonnet tier, while Opus 4 is priced at the Opus tier. When benchmark results are comparable, the cost gap determines the choice at scale. Check the pricing panel on this page for current rates.