Anthropic just made one of the most developer-friendly moves in the AI space: 1 million token context is now generally available for Claude Opus 4.6 and Sonnet 4.6 — at standard pricing. No premium multiplier, no beta headers, no rate limit penalties.
If you've been working with long-context workloads, this changes everything about how you build with Claude.
What Changed on March 13, 2026
Previously, Sonnet 4's 1M context was in public beta with a 2x input price multiplier for prompts exceeding 200K tokens. Opus didn't have 1M context at all.
Now both flagship models get the full 1M window at their standard rates:
| Model | Input Price | Output Price | Context Window |
|---|---|---|---|
| Opus 4.6 | $5/MTok | $25/MTok | 1,000,000 tokens |
| Sonnet 4.6 | $3/MTok | $15/MTok | 1,000,000 tokens |
The long-context premium is gone. Whether your request uses 9,000 tokens or 900,000, you pay the same per-token rate. That's a massive cost reduction for anyone processing large codebases, legal documents, or running extended agent sessions.
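Flat pricing makes the cost math trivial. A quick back-of-the-envelope sketch using the rates from the table above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in USD; rates are dollars per million tokens (MTok)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Sonnet 4.6 ($3/MTok in, $15/MTok out): a 900K-token prompt, 4K-token reply
print(f"${request_cost(900_000, 4_000, 3.0, 15.0):.2f}")  # $2.76
# Same per-token rate at 9K tokens -- no long-context multiplier
print(f"${request_cost(9_000, 4_000, 3.0, 15.0):.2f}")  # $0.09
```

Nearly a full-window Sonnet request comes in under three dollars of input, at the same rate as a tiny one.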
The Benchmarks Tell the Story
Anthropic isn't just expanding the window — they're leading on long-context accuracy:
- Opus 4.6 scores 78.3% on MRCR v2 at 1M tokens, the highest among frontier models
- Sonnet 4.6 scores 68.4% on GraphWalks BFS at 1M tokens, also the highest in its class
These aren't simple needle-in-a-haystack tests. MRCR (Multi-Round Coreference Resolution) tests whether a model can track entities and relationships across massive contexts — exactly what you need when processing an entire codebase or a 500-page contract.
What 1M Tokens Actually Looks Like
To put 1 million tokens in perspective:
- ~750,000 words of text (roughly 10 full-length novels)
- 75,000+ lines of code with cross-file dependency tracking
- 600 images or PDF pages in a single request (up from the previous 100-page limit)
- Thousands of pages of contracts, research papers, or documentation
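If you want to sanity-check whether a corpus fits, the common rule of thumb is ~4 characters or ~0.75 English words per token. This is only a heuristic, not Claude's actual tokenizer:

```python
def tokens_from_words(word_count: int) -> int:
    """Rough estimate: ~0.75 English words per token."""
    return round(word_count / 0.75)

def tokens_from_chars(char_count: int) -> int:
    """Rough estimate: ~4 characters per token."""
    return char_count // 4

print(tokens_from_words(750_000))  # 1000000 -- the ~750K-word figure above
```

Before committing to a single-pass design, verify with real numbers: the API reports exact token usage on every response.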
The 6x increase in media capacity (100 → 600 pages) is particularly significant for enterprise workflows involving document analysis.
Why Claude Code Users Should Care Most
If you use Claude Code, this is arguably the biggest quality-of-life improvement since the tool launched. Here's why:
Before 1M context: Claude Code would burn through 100K+ tokens searching your codebase, then hit compaction — a process that summarizes earlier conversation to free up context space. You'd lose nuance from earlier searches, and the model would sometimes forget edge cases it discovered 10 minutes ago.
After 1M context: You can search, re-search, aggregate edge cases, refactor, and propose fixes — all in one continuous window without compaction kicking in.
Anthropic reports a 15% decrease in compaction events across Claude Code sessions since enabling 1M context on Max, Team, and Enterprise plans.
As one software engineer put it: with 1M context, you can load an entire codebase, explore it thoroughly, and propose comprehensive fixes without the model losing track of what it found on file one while reading file fifty.
Platform Availability
1M context is available today on:
- Claude Platform (API) — no beta header required
- Microsoft Azure Foundry
- Google Cloud Vertex AI
- Claude Code — included in Max, Team, and Enterprise plans
Full rate limits apply at every context length, with no reduced throughput for long-context requests.
How This Compares to the Competition
Let's put this in context:
| Provider | Max Context | Long-Context Premium |
|---|---|---|
| Claude Opus 4.6 | 1M tokens | None (standard pricing) |
| Claude Sonnet 4.6 | 1M tokens | None (standard pricing) |
| GPT-5 | 256K tokens | N/A |
| Gemini 2.5 Pro | 1M tokens | Tiered pricing applies |
Claude is now the only frontier model family offering 1M context across both its flagship models with completely flat pricing. Gemini 2.5 Pro matches on window size but uses tiered pricing for longer contexts.
Cost Optimization: Caching + Batching
Even at standard rates, processing 1M tokens isn't cheap. Two features help:
Prompt Caching: If you're repeatedly querying the same large context (like a codebase), cached prompts dramatically reduce both latency and cost. The first request loads the full context; subsequent requests reuse the cached version.
Batch Processing: For non-time-sensitive workloads, batch mode gives you an additional 50% cost savings. Combine this with caching and you can process massive document sets at a fraction of the list price.
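Here's a sketch of what caching looks like in a Messages API request body: the large, stable prefix (a hypothetical codebase dump) is marked with `cache_control` so follow-up questions reuse it. Field names follow Anthropic's prompt-caching format; nothing is sent here:

```python
codebase_dump = "...entire repository concatenated here..."  # hypothetical placeholder

request_body = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 2048,
    "system": [
        {
            "type": "text",
            "text": codebase_dump,
            # Cache everything up to and including this block; later requests
            # sharing this prefix pay the much cheaper cache-read rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Where is the retry logic implemented?"}
    ],
}
```

Only the user question changes between requests, so the cached codebase prefix keeps getting hits.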
What This Means for AI Development
The removal of the long-context pricing premium signals a broader industry shift. As models get better at handling large contexts efficiently, the cost penalty for using them is disappearing.
For developers, this means:
- Stop chunking documents — you can now process entire repositories, contract sets, or research collections in a single pass
- Build longer-running agents — agents can maintain full execution traces without summarization losses
- Simplify RAG architectures — for datasets under 750K words, you might not need RAG at all; just load the full context
The era of carefully managing context windows and building complex retrieval pipelines for moderate-sized datasets may be ending. When you can load 600 PDF pages or 75,000 lines of code in a single request at standard pricing, the architecture simplifies dramatically.
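The "skip RAG" pattern really does reduce to concatenation. A minimal sketch, assuming a hypothetical directory of Markdown docs that fits under the window:

```python
from pathlib import Path

def build_full_context(doc_dir: str, question: str) -> list[dict]:
    """Concatenate every document into one prompt instead of retrieving chunks."""
    parts = []
    for path in sorted(Path(doc_dir).glob("**/*.md")):
        parts.append(f"=== {path.name} ===\n{path.read_text()}")
    corpus = "\n\n".join(parts)
    return [{"role": "user", "content": f"{corpus}\n\nQuestion: {question}"}]
```

No embedding store, no chunker, no retriever; just one prompt, and the model sees every cross-document reference intact.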
Getting Started
To use 1M context today:
- API users: Just send requests up to 1M tokens — no special headers or flags needed
- Claude Code users: Update to the latest version; 1M context is enabled automatically on Max/Team/Enterprise
- Cloud users: Available on Azure Foundry and Vertex AI with standard model endpoints
The model IDs are `claude-opus-4-6` and `claude-sonnet-4-6`. No new model versions or special variants required.
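For illustration, here's a standard-library sketch that assembles (but doesn't send) a Messages API request. Note the headers: `anthropic-version`, your API key, and no beta flag anywhere. With the official SDK this collapses to `anthropic.Anthropic().messages.create(...)`.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"

def build_request(api_key: str, prompt: str,
                  model: str = "claude-opus-4-6") -> urllib.request.Request:
    """Assemble a Messages API request -- no beta header needed for 1M context."""
    body = json.dumps({
        "model": model,
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
```

Sending it is one line, `urllib.request.urlopen(req)`, though in practice you'd use the `anthropic` SDK.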
Bottom Line
This is Anthropic delivering on the promise of truly useful long context. Not a gimmick, not a beta with gotchas — a production-ready 1M token window at standard pricing across two frontier models, with the benchmarks to back it up.
For anyone building AI-powered code analysis, document processing, or agentic workflows, March 13 just became a very good day.
About the Author
Rai Ansar
Founder of AIToolRanked • AI Researcher • 200+ Tools Tested
I've been obsessed with AI since ChatGPT launched in November 2022. What started as curiosity turned into a mission: testing every AI tool to find what actually works. I spend $5,000+ monthly on AI subscriptions so you don't have to. Every review comes from hands-on experience, not marketing claims.



