
Claude Sonnet 4

Claude Sonnet 4 supports a 1M-token context window on Vercel AI Gateway, enough for full analysis of codebases of 75,000+ lines or large document sets. It scored 72.7% on SWE-bench Verified and combines hybrid extended thinking with enhanced steerability.

Capabilities: File Input, Reasoning, Tool Use, Vision (Image), Explicit Caching
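index.ts
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'anthropic/claude-sonnet-4',
  prompt: 'Why is the sky blue?',
})

// Stream the response to stdout as it arrives.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}
```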

What To Consider When Choosing a Provider

  • Configuration: Using the full 1M-token context window greatly increases per-request token volume. Monitor cost per request carefully when processing full codebases or large document collections, since a single request can consume as many tokens as many standard calls combined.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests; bring-your-own-key (BYOK) requests are not covered. See the AI Gateway documentation to configure it.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
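
To monitor cost per request, one option is to compute it from the token usage each call reports. This is a minimal sketch, assuming the usage object exposes inputTokens and outputTokens (as in AI SDK 5; earlier versions used promptTokens and completionTokens). The rates are placeholders to fill in from the pricing panel on this page, and estimateRequestCost and isOverBudget are hypothetical helpers, not part of the AI SDK.

```typescript
// Placeholder per-million-token rates in USD; fill these in from the pricing panel.
interface TokenRates {
  inputPerMillion: number
  outputPerMillion: number
}

// Mirrors the token fields of the AI SDK 5 usage object; either may be
// undefined when the provider does not report it.
interface UsageLike {
  inputTokens?: number
  outputTokens?: number
}

// Estimate the USD cost of one request from its reported token usage.
function estimateRequestCost(usage: UsageLike, rates: TokenRates): number {
  const input = usage.inputTokens ?? 0
  const output = usage.outputTokens ?? 0
  return (input * rates.inputPerMillion + output * rates.outputPerMillion) / 1_000_000
}

// True when a single request costs more than budgetUsd, so unusually
// large long-context calls can be logged or alerted on.
function isOverBudget(usage: UsageLike, rates: TokenRates, budgetUsd: number): boolean {
  return estimateRequestCost(usage, rates) > budgetUsd
}
```

For example, after awaiting result.usage on a streamText result, isOverBudget(usage, rates, 1.0) could trigger a log entry for any request that costs more than one dollar.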

When to Use Claude Sonnet 4

Best For

  • Full-codebase analysis and understanding: With the 1M-token context window, you can pass an entire large repository in a single request
  • Agentic coding at scale: Its 72.7% SWE-bench Verified score, and GitHub's choice of Sonnet 4 to power the coding agent in GitHub Copilot, speak directly to this use case
  • Multi-file refactoring and architectural reasoning: Tasks where the model must hold the whole codebase in view at once
  • Instruction-precise applications: Where steerability matters; the model was specifically improved to follow complex, nuanced instructions more accurately
  • Agent workflows at Sonnet pricing: Sonnet 4 costs about one-fifth as much as Opus 4, which makes it the better choice when benchmark results are comparable
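
For full-codebase analysis, one way to pass an entire repository in a single request is to concatenate its source files into one prompt, labeling each with its path so the model can refer to files by name. This is a minimal Node.js sketch; collectFiles, packRepository, and the SKIP_DIRS list are hypothetical, and the code does not count tokens, so check the packed size against the 1M-token limit before sending it.

```typescript
import { readdirSync, readFileSync } from 'node:fs'
import { join, relative } from 'node:path'

// Directories that rarely belong in a prompt (assumed list; adjust per repo).
const SKIP_DIRS = new Set(['node_modules', '.git', 'dist'])

// Recursively collect files under dir whose names end in one of exts.
function collectFiles(dir: string, exts: string[], out: string[] = []): string[] {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name)
    if (entry.isDirectory()) {
      if (!SKIP_DIRS.has(entry.name)) collectFiles(full, exts, out)
    } else if (exts.some((ext) => entry.name.endsWith(ext))) {
      out.push(full)
    }
  }
  return out
}

// Concatenate matching files into one prompt string, sorted by path. Each file
// is preceded by a header line with its path relative to root.
function packRepository(root: string, exts: string[]): string {
  return collectFiles(root, exts)
    .sort()
    .map((file) => `=== ${relative(root, file)} ===\n${readFileSync(file, 'utf8')}`)
    .join('\n\n')
}
```

The path headers are a design choice: they let follow-up instructions such as "refactor the function in src/a.ts" point at a specific file inside the single packed prompt.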

Consider Alternatives When

  • Provider flexibility: On AI Gateway, Sonnet 4's 1M-token context window requires routing through the Anthropic provider; if you don't want to pin to Anthropic, check whether later model versions support 1M tokens across all providers
  • Sonnet 4.5 improvements: When you need Sonnet 4.5's stronger OSWorld computer-use performance, its ability to sustain agentic tasks for 30+ hours, or its advances in domain-specific reasoning
  • Haiku 4.5 capability match: When Haiku 4.5 covers your capability requirements, it is the lower-cost option
  • Opus-level reasoning depth: Some problems are harder than Sonnet's benchmark scores reflect and call for Opus

Conclusion

Claude Sonnet 4 pairs strong coding benchmark performance with a 1M-token context window that lets you process entire codebases in a single request, a combination that changes what's architecturally feasible for software engineering agents. At Sonnet pricing, it's the default choice for teams building on the Claude 4 generation until their workloads specifically require Opus or the improvements in later Sonnet versions.

Frequently Asked Questions

  • How do I enable the 1M-token context window for Claude Sonnet 4 on AI Gateway?

    Add the anthropic-beta: context-1m-2025-08-07 header to your request, and set only: ['anthropic'] under providerOptions.gateway so the request routes through the Anthropic provider, which supports the feature.

  • What does the 1M-token context window enable in practice?

    It lets you process entire codebases, long documents, or extended conversation histories in a single request. This is particularly useful for code review across multiple files, document analysis, and agentic workflows that accumulate context over many steps.

  • How did Claude Sonnet 4 perform on SWE-bench Verified?

    Claude Sonnet 4 scored 72.7% on SWE-bench Verified, slightly above Claude Opus 4's 72.5% on that benchmark.

  • What is enhanced steerability in Claude Sonnet 4?

    Sonnet 4 responds more precisely to instructions, reducing misinterpretation of complex or nuanced prompts. Anthropic highlighted steerability as an explicit design improvement for applications where exact specification of behavior matters.

  • Does Claude Sonnet 4 support extended thinking?

    Yes. Sonnet 4 is a hybrid model that supports both near-instant responses and extended thinking. Extended thinking with tool use, where the model alternates between reasoning and calling tools, is also available in beta.

  • What is 1-hour prompt caching and does Sonnet 4 support it?

    Yes. The Claude 4 launch introduced one-hour prompt caching as a new API capability: a cached prompt prefix stays reusable for an hour instead of the default 5 minutes. This is particularly useful for codebases or large system prompts that appear in many requests.

  • Why would I use Sonnet 4 instead of Opus 4 given the SWE-bench scores are similar?

    Claude Sonnet 4 is priced at the Sonnet tier, about one-fifth the cost of Opus 4. When benchmark results are comparable, that cost gap determines the choice at scale. Check the pricing panel on this page for current rates.
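
The FAQ steps for enabling the 1M-token context window (the anthropic-beta header plus only: ['anthropic'] under providerOptions.gateway) could be wrapped in a helper like the one below. longContextOptions is hypothetical: it returns call options meant to be passed to streamText or generateText, and it assumes your AI SDK version accepts a per-call headers option and that AI Gateway forwards both the header and providerOptions.gateway as described in that answer.

```typescript
// Beta header value that opts in to the 1M-token context window.
const CONTEXT_1M_BETA = 'context-1m-2025-08-07'

// Hypothetical helper: call options for a long-context Claude Sonnet 4 request.
function longContextOptions(prompt: string) {
  return {
    model: 'anthropic/claude-sonnet-4',
    prompt,
    // Opt in to the 1M-token context beta.
    headers: { 'anthropic-beta': CONTEXT_1M_BETA },
    // Restrict routing to the Anthropic provider, which supports the beta.
    providerOptions: {
      gateway: { only: ['anthropic'] },
    },
  }
}
```

A call would then look like streamText(longContextOptions(largeInput)), where largeInput stands in for your own codebase dump or document set.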
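
For the extended-thinking answer, a sketch of how thinking might be switched on through AI Gateway: the helper sets the Anthropic provider's thinking option with a token budget, and can add the interleaved-thinking-2025-05-14 beta header that Anthropic uses for thinking between tool calls. extendedThinkingOptions and its default budget are hypothetical, and the sketch assumes the gateway passes providerOptions.anthropic and the anthropic-beta header through to Anthropic; verify against the AI Gateway reasoning documentation.

```typescript
// Anthropic requires a thinking budget of at least 1,024 tokens.
const MIN_THINKING_BUDGET = 1024

// Hypothetical helper: call options that enable extended thinking for
// Claude Sonnet 4, optionally with interleaved thinking between tool calls.
function extendedThinkingOptions(prompt: string, budgetTokens = 12_000, interleaved = false) {
  if (budgetTokens < MIN_THINKING_BUDGET) {
    throw new RangeError(`budgetTokens must be at least ${MIN_THINKING_BUDGET}`)
  }
  // Beta header for interleaved thinking; left empty unless requested.
  const headers: Record<string, string> = interleaved
    ? { 'anthropic-beta': 'interleaved-thinking-2025-05-14' }
    : {}
  return {
    model: 'anthropic/claude-sonnet-4',
    prompt,
    headers,
    // Anthropic provider option: turn on thinking with a token budget.
    providerOptions: {
      anthropic: { thinking: { type: 'enabled', budgetTokens } },
    },
  }
}
```

Leaving interleaved off keeps the request on the generally available thinking path; turning it on only matters when the call also defines tools.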
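
For the one-hour prompt caching answer, a sketch of marking a large, reused system prompt for caching with the AI SDK: cacheControl with type 'ephemeral' is the Anthropic provider's cache marker, while the ttl: '1h' field is assumed to be supported by your AI SDK version (older versions only offer the default 5-minute cache). cachedSystemMessages is a hypothetical helper.

```typescript
// Hypothetical helper: messages whose large, reused system prompt is marked
// for Anthropic's 1-hour prompt cache via AI SDK provider options.
function cachedSystemMessages(systemPrompt: string, question: string) {
  return [
    {
      role: 'system' as const,
      content: systemPrompt,
      // 'ephemeral' is Anthropic's cache type; ttl '1h' (assumed supported by
      // your AI SDK version) asks for the one-hour lifetime instead of 5 minutes.
      providerOptions: {
        anthropic: { cacheControl: { type: 'ephemeral', ttl: '1h' } },
      },
    },
    // The per-request question follows the cached prefix, so it can change
    // between requests without invalidating the cache.
    { role: 'user' as const, content: question },
  ]
}
```

The result would be passed as messages to streamText. Anthropic only caches prefixes above a minimum length (1,024 tokens for Sonnet models), so this pays off for large prompts reused several times within the hour.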