Anthropic released 1-million-token context windows for Claude Opus 4.6 and Sonnet 4.6 on March 13, 2026. Both models now process up to 1M tokens at standard pricing, with no premium multiplier and no rate-limit penalties.
## What changed for Claude 1M context GA on March 13, 2026?
Anthropic removed the 2x input price multiplier on Sonnet's long-context requests and extended 1M context to Opus 4.6. Both models now process 1M tokens at standard rates with no premium pricing.
During the public beta, prompts exceeding 200K tokens on Sonnet's 1M context were billed at twice the standard input rate, and Opus 4.6 lacked 1M context entirely.
Both flagship models now offer the full 1M window at their standard rates:
| Model | Input Price | Output Price | Context Window |
|---|---|---|---|
| Opus 4.6 | $5/MTok | $25/MTok | 1,000,000 tokens |
| Sonnet 4.6 | $3/MTok | $15/MTok | 1,000,000 tokens |
Users pay the same per-token rate whether their request uses 9,000 tokens or 900,000 tokens. This eliminates cost penalties for processing large codebases, legal documents, or extended agent sessions.
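Because pricing is flat, per-request cost reduces to a single linear formula. A minimal sketch in Python, using the rates from the table above (the function and dictionary names are just illustrative):

```python
# Flat per-token pricing: cost scales linearly with usage, with no
# surcharge above 200K tokens.
PRICES = {  # USD per million tokens, from the table above
    "claude-opus-4-6": {"input": 5.00, "output": 25.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at standard rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 900K-token Sonnet prompt is billed at the same per-token rate as a 9K one:
print(request_cost("claude-sonnet-4-6", 900_000, 4_000))   # 2.76
print(request_cost("claude-sonnet-4-6", 9_000, 4_000))     # 0.087
```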
## How do Claude Opus and Sonnet perform on long-context benchmarks?
Opus 4.6 scores 78.3% on MRCR v2 at 1M tokens and Sonnet 4.6 scores 68.4% on GraphWalks BFS at 1M tokens. Both achieve the highest scores in their respective model classes.
Anthropic expanded the context window while maintaining accuracy across long sequences. MRCR (Multi-Round Coreference Resolution) tests whether models track entities and relationships across massive contexts. This capability proves essential when processing entire codebases or 500-page contracts.
GraphWalks BFS measures a model's ability to navigate complex information structures within long contexts. Sonnet 4.6's 68.4% score demonstrates reliable performance on structured data analysis tasks.
## What does 1 million tokens represent in practical terms?
1 million tokens equals approximately 750,000 words of text, 75,000+ lines of code, or 600 images/PDF pages in a single request. This represents a 6x increase from the previous 100-page media limit.
The token capacity translates to specific use cases:
- 10 full-length novels' worth of text
- Complete codebases with cross-file dependency tracking
- Entire contract sets or research paper collections
- Thousands of pages of documentation
The media capacity increase from 100 to 600 pages per request significantly impacts enterprise document analysis workflows.
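To check whether a particular text corpus actually fits, you can count tokens server-side before committing to a full request. A minimal sketch using the Anthropic Python SDK's token-counting endpoint; the file path is an illustrative stand-in for your own documents:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative path; substitute your own corpus.
with open("contracts/full_contract_set.txt") as f:
    corpus = f.read()

# Ask the API how many tokens the prompt would consume, without running it.
count = client.messages.count_tokens(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": corpus}],
)
print(f"{count.input_tokens:,} of the 1,000,000-token window")
```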
## How does 1M context improve Claude Code performance?
Claude Code users experience 15% fewer compaction events with 1M context. Users can search, re-search, and refactor entire codebases without losing conversation history or model context.
Claude Code previously consumed 100K+ tokens searching codebases, triggering compaction processes that summarized earlier conversations. This compression caused the model to lose nuance from earlier searches and forget edge cases discovered minutes earlier.
The 1M context window allows continuous exploration of entire codebases without compaction interruptions. Users load complete repositories, explore thoroughly, and receive comprehensive fixes while the model maintains full awareness of all discovered patterns and edge cases.
## Where can developers access Claude 1M context today?
1M context is available on Claude Platform API, Microsoft Azure Foundry, Google Cloud Vertex AI, and Claude Code for Max/Team/Enterprise plans. No beta headers or special model versions are required.
Platform availability includes:
- Claude Platform API with standard endpoints
- Microsoft Azure Foundry integration
- Google Cloud Vertex AI deployment
- Claude Code with automatic enablement on paid plans
Full throughput rate limits apply at every context length, so long-context requests are not throttled more aggressively than shorter ones.
## How does Claude's pricing compare to other AI providers?
Claude offers the only frontier model family with 1M context across both flagship models at flat pricing. GPT-5 tops out at 256K tokens, while Gemini 2.5 Pro applies tiered pricing to longer contexts.
| Provider | Max Context | Long-Context Premium |
|---|---|---|
| Claude Opus 4.6 | 1M tokens | None (standard pricing) |
| Claude Sonnet 4.6 | 1M tokens | None (standard pricing) |
| GPT-5 | 256K tokens | N/A |
| Gemini 2.5 Pro | 1M tokens | Tiered pricing applies |
Gemini 2.5 Pro matches Claude's 1M token window but implements tiered pricing structures that increase costs for longer contexts. Claude maintains consistent per-token rates regardless of context length.
## What cost optimization features work with 1M context?
Prompt Caching reduces costs for repeated queries over the same large context, while Batch Processing cuts costs by 50% for non-time-sensitive workloads. Combining the two yields the largest savings.
Prompt Caching stores frequently accessed large contexts like codebases. The first request loads the full context while subsequent requests reuse the cached version, reducing both latency and costs.
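A minimal caching sketch, assuming a concatenated codebase dump as the large stable prefix (the file path is illustrative). The `cache_control` marker tells the API to cache everything up to and including that block, so follow-up questions reuse it instead of re-processing it at full price:

```python
import anthropic

client = anthropic.Anthropic()

with open("repo_dump.txt") as f:  # illustrative: a concatenated codebase
    codebase = f.read()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {"type": "text", "text": "You are a careful code reviewer."},
        {
            "type": "text",
            "text": codebase,
            # Cache the large prefix: subsequent requests that repeat it
            # read it from cache at a reduced rate instead of full price.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Where is authentication handled?"}],
)
print(response.content[0].text)
```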
Batch Processing mode processes non-urgent workloads at 50% of standard pricing. Users can combine caching with batching to process massive document sets at significantly reduced costs.
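A companion sketch for the same SDK's Message Batches API, which runs requests asynchronously at the 50% rate (`documents` stands in for your own corpus):

```python
import anthropic

client = anthropic.Anthropic()

# Illustrative stand-ins for your own large documents.
documents = ["First long document ...", "Second long document ..."]

# Queue the whole set; batched requests complete asynchronously and are
# billed at half the standard per-token rates.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",  # your identifier for matching results
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": f"Summarize:\n\n{doc}"}],
            },
        }
        for i, doc in enumerate(documents)
    ]
)
print(batch.id, batch.processing_status)  # poll until this reads "ended"
```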
## How should developers adapt their AI architectures for 1M context?
Developers can eliminate document chunking, build longer-running agents, and simplify RAG architectures. Datasets under 750K words may not require RAG systems when full context loading becomes feasible.
The expanded context window enables architectural simplifications:
- Process entire repositories, contract sets, or research collections in single requests
- Maintain full execution traces in agents without summarization losses
- Load complete datasets directly instead of building complex retrieval pipelines
When 600 PDF pages or 75,000 lines of code fit in a single request at standard pricing, traditional chunking and retrieval strategies become unnecessary for moderate-sized datasets.
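A minimal sketch of that simplification, assuming the corpus fits in the window: read the files, concatenate them into a single prompt, and skip the embedding-and-retrieval pipeline entirely. The `my_repo` path and the `load_config` function are hypothetical:

```python
import pathlib

import anthropic

client = anthropic.Anthropic()

# Hypothetical repository; concatenate every source file into one prompt
# instead of chunking it into a vector store.
corpus = "\n\n".join(
    f"=== {path} ===\n{path.read_text()}"
    for path in sorted(pathlib.Path("my_repo").rglob("*.py"))
)

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=[{
        "role": "user",
        # `load_config` is a hypothetical function in the hypothetical repo.
        "content": corpus + "\n\nTrace every caller of `load_config` and "
                            "list the edge cases each one handles.",
    }],
)
print(response.content[0].text)
```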
## How do developers start using Claude 1M context?
API users send requests up to 1M tokens without special headers. Claude Code users update to the latest version for automatic enablement. Cloud users access standard endpoints on Azure Foundry and Vertex AI.
Implementation steps:
- API users: Send requests up to 1M tokens using standard endpoints
- Claude Code users: Update to the latest version on Max/Team/Enterprise plans
- Cloud users: Access via Azure Foundry and Vertex AI standard model endpoints
The model IDs are `claude-opus-4-6` and `claude-sonnet-4-6`. No new model versions or special variants are required for 1M context access.
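A minimal first request against the standard endpoint, with no beta header (the SDK reads `ANTHROPIC_API_KEY` from the environment):

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",  # or "claude-sonnet-4-6"
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello! Confirm you received this."}],
)
print(response.content[0].text)
```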