Claude Sonnet 4
Claude Sonnet 4 supports a context window of 1M tokens on Vercel AI Gateway, enabling full-codebase analysis of 75,000+ lines or large document sets in a single request. It scores 72.7% on SWE-bench Verified and adds hybrid extended thinking and enhanced steerability.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'anthropic/claude-sonnet-4',
  prompt: 'Why is the sky blue?',
})
```

What To Consider When Choosing a Provider
- Configuration: The context window of 1M tokens significantly increases per-request token volumes. Monitor cost per request carefully when processing full codebases or large document collections, as a single request can consume tokens equivalent to many standard calls.
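To make the cost point concrete, here is a minimal sketch comparing a near-window request to a typical call. The per-million-token rates below are illustrative placeholders, not current AI Gateway pricing; check the pricing panel for actual rates.

```typescript
// Rough per-request cost estimate for large-context calls.
// These rates are illustrative assumptions, NOT current pricing.
const INPUT_COST_PER_MTOK = 3.0   // assumed USD per million input tokens
const OUTPUT_COST_PER_MTOK = 15.0 // assumed USD per million output tokens

function estimateRequestCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_COST_PER_MTOK +
    (outputTokens / 1_000_000) * OUTPUT_COST_PER_MTOK
  )
}

// One full-codebase request near the 1M-token window vs. a typical call:
const fullCodebase = estimateRequestCostUSD(900_000, 4_000)
const typicalCall = estimateRequestCostUSD(3_000, 1_000)
console.log(fullCodebase, fullCodebase / typicalCall)
```

At these assumed rates, a single near-window request costs on the order of a hundred typical calls, which is why per-request monitoring matters here more than usual.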
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests (BYOK is not included). See the AI Gateway documentation for configuration details.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Claude Sonnet 4
Best For
- Full-codebase analysis and understanding: With the context window of 1M tokens, pass an entire large repository in a single request
- Agentic coding at scale: SWE-bench 72.7% and GitHub Copilot's selection as its coding agent model speak directly to this use case
- Multi-file refactoring and architectural reasoning: The model needs to hold the whole picture simultaneously
- Instruction-precise applications: Steerability matters here; the model was specifically improved to follow complex, nuanced instructions more accurately
- Agent workflows at Sonnet pricing: The 5x cost difference from Opus 4 makes Sonnet 4 the right choice when the benchmark results are comparable
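A quick sketch of how that cost gap compounds at scale, assuming the published launch list prices (Sonnet 4 at $3/$15 and Opus 4 at $15/$75 per million input/output tokens; verify against the pricing panel before relying on these numbers):

```typescript
// Assumed launch list prices per million tokens -- verify current rates.
interface Rates { inputPerMTok: number; outputPerMTok: number }

const sonnet4: Rates = { inputPerMTok: 3, outputPerMTok: 15 }
const opus4: Rates = { inputPerMTok: 15, outputPerMTok: 75 }

// Monthly bill for a given volume, in millions of tokens.
function monthlyCostUSD(r: Rates, inputMTok: number, outputMTok: number): number {
  return r.inputPerMTok * inputMTok + r.outputPerMTok * outputMTok
}

// A hypothetical agent fleet consuming 500M input / 50M output tokens per month:
const sonnetBill = monthlyCostUSD(sonnet4, 500, 50)
const opusBill = monthlyCostUSD(opus4, 500, 50)
console.log(sonnetBill, opusBill, opusBill / sonnetBill)
```

At these assumed rates the 5x ratio holds across the entire bill, so comparable benchmark scores translate directly into a 5x difference in monthly spend.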
Consider Alternatives When
- Provider flexibility: If you don't want to pin requests to Anthropic, check whether later model versions support the 1M-token context window across all providers
- Sonnet 4.5 improvements: OSWorld computer use performance, 30+ hour agentic task duration, and domain-specific reasoning advances
- Haiku 4.5 capability match: Lower-cost option when Haiku 4.5 covers the capability requirements
- Opus-level reasoning depth: Sonnet benchmarks don't capture the full difficulty of some problems
Conclusion
Claude Sonnet 4 pairs strong coding benchmark performance with a context window of 1M tokens that makes entire codebases processable in one shot, a combination that changes what's architecturally feasible for software engineering agents. At Sonnet pricing, it's the default choice for teams building on the Claude 4 generation until their workloads specifically require Opus or the later Sonnet improvements.
Frequently Asked Questions
How do I enable the context window of 1M tokens for Claude Sonnet 4 on AI Gateway?
Add the `anthropic-beta: context-1m-2025-08-07` header to your request. Under `providerOptions.gateway`, set `only` to `['anthropic']` so the request routes through the Anthropic provider, which supports the feature.
What does the context window of 1M tokens enable in practice?
The context window of 1M tokens lets you process entire codebases, long documents, or extended conversation histories in a single request. This is particularly useful for code review across multiple files, document analysis, and agentic workflows that accumulate context over many steps.
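The header-plus-routing setup described in the answer above can be sketched as a plain options object (the shape you would pass to the AI SDK's streamText or generateText; shown standalone here so it is easy to inspect):

```typescript
// Request options for the 1M-token beta, per the configuration above.
const longContextOptions = {
  model: 'anthropic/claude-sonnet-4',
  // Opt in to the 1M-token context window via the beta header.
  headers: { 'anthropic-beta': 'context-1m-2025-08-07' },
  // Pin routing to the Anthropic provider, which supports the beta.
  providerOptions: { gateway: { only: ['anthropic'] } },
}
```

In a real call these fields would be spread into the streamText arguments alongside your prompt or messages.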
How did Claude Sonnet 4 perform on SWE-bench Verified?
Claude Sonnet 4 scored 72.7% on SWE-bench Verified, slightly ahead of Claude Opus 4's 72.5% on that specific benchmark.
What is enhanced steerability in Claude Sonnet 4?
Sonnet 4 responds more precisely to instructions, reducing misinterpretation of complex or nuanced prompts. Anthropic highlighted steerability as an explicit design improvement for applications where exact specification of behavior matters.
Does Claude Sonnet 4 support extended thinking?
Yes. Sonnet 4 is a hybrid model that supports both near-instant responses and extended thinking. Extended thinking with tool use, where the model alternates between reasoning and calling tools, is also available in beta.
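As a hedged sketch, extended thinking is typically enabled through Anthropic provider options in the AI SDK; the exact `thinking` shape and the 12,000-token budget below are assumptions to verify against the current SDK documentation:

```typescript
// Hedged sketch: enabling extended thinking via Anthropic provider options.
// budgetTokens caps how many tokens the model may spend reasoning before
// it produces the final answer; 12,000 is an assumed example value.
const thinkingOptions = {
  model: 'anthropic/claude-sonnet-4',
  providerOptions: {
    anthropic: {
      thinking: { type: 'enabled', budgetTokens: 12_000 },
    },
  },
}
```

Omitting the `thinking` option leaves the model in its near-instant response mode, which is the other half of the hybrid design.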
What is 1-hour prompt caching and does Sonnet 4 support it?
Yes. The Claude 4 launch introduced one-hour prompt caching as a new API capability, compared to shorter-lived caching in previous generations. This is particularly useful for codebases or large system prompts that appear in many requests.
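A hedged sketch of what a one-hour cache breakpoint looks like in Anthropic's raw request shape; the `ttl` field and the `extended-cache-ttl-2025-04-11` beta header name are assumptions to verify against current Anthropic documentation:

```typescript
// Hedged sketch: a content block marked for 1-hour prompt caching.
// The ttl value and beta header below are assumptions, not confirmed API.
const cachedSystemBlock = {
  type: 'text',
  text: '<large system prompt or codebase excerpt reused across requests>',
  cache_control: { type: 'ephemeral', ttl: '1h' },
}
const betaHeaders = { 'anthropic-beta': 'extended-cache-ttl-2025-04-11' }
```

The idea is that the large shared prefix is written to the cache once and then read at a discount on every subsequent request within the hour.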
Why would I use Sonnet 4 instead of Opus 4 given the SWE-bench scores are similar?
Claude Sonnet 4 is priced at the Sonnet tier, while Opus 4 is priced at the Opus tier. When benchmark results are comparable, the cost gap determines the choice at scale. Check the pricing panel on this page for current rates.