Models
Claude’s model family as of early 2026: Opus 4.6, Sonnet 4.6, and Haiku 4.5.
Model Comparison
| | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|
| Context window | 1M tokens | 1M tokens | 200K tokens |
| Max output | 32K tokens | 16K tokens | 8K tokens |
| Speed | Slowest | Fast | Fastest |
| Cost (input/output) | $15 / $75 per 1M | $3 / $15 per 1M | $0.80 / $4 per 1M |
| Best for | Complex reasoning, hard agentic tasks | Daily coding, balanced speed/quality | Classification, routing, simple tasks |
| Model ID | claude-opus-4-6 | claude-sonnet-4-6 | claude-haiku-4-5-20251001 |
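The pricing row above maps directly to per-request cost. A minimal sketch (prices hardcoded from the table; the helper name is illustrative, not part of any SDK):

```python
# Per-1M-token (input, output) prices in USD, from the comparison table.
PRICES = {
    "claude-opus-4-6": (15.00, 75.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5-20251001": (0.80, 4.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 100K-input / 4K-output request on Sonnet costs $0.36; on Opus, $1.80.
```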
Opus 4.6
The most capable model in the Claude family.
Key capabilities:
- 1M token context - processes entire codebases, long documents, extensive conversation histories
- Adaptive thinking - four effort levels that adjust reasoning depth based on task complexity
- Agent teams - can coordinate multiple Claude instances working in parallel
- 76% on MRCR v2 - state-of-the-art on the multi-round coreference long-context benchmark
- Context compaction - automatically summarizes earlier context when approaching limits
When to use: Complex multi-file refactors, architectural analysis, security reviews, research tasks requiring deep reasoning, any task where accuracy matters more than speed.
Sonnet 4.6
The first Sonnet model to be preferred over the previous-generation Opus in coding benchmarks.
Key capabilities:
- 1M token context - same context window as Opus
- Strong coding - preferred over the previous-generation Opus in head-to-head coding evaluations
- Fast - 3-5x faster than Opus for most tasks
- Cost-effective - 5x cheaper than Opus per token
When to use: Daily development work, code generation, PR reviews, most agentic coding tasks. The default choice unless you specifically need Opus-level reasoning.
Haiku 4.5
Lightweight model optimized for speed and cost.
Key capabilities:
- 200K token context - smaller than Opus/Sonnet but sufficient for most tasks
- Fastest response times - ideal for interactive and real-time applications
- Cheapest - nearly 20x cheaper than Opus
When to use: Classification, routing, simple extraction, chatbot responses, any task where latency matters more than depth. Often used as a “triage” model that decides which more expensive model to invoke.
Choosing the Right Model
```
Is the task complex (multi-step reasoning, large codebase)?
├── Yes → Opus 4.6
└── No → Is speed/cost important?
    ├── Yes, and the task is simple → Haiku 4.5
    └── Yes, but I need good quality → Sonnet 4.6
```
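The decision tree above can be sketched as a small helper (the function name and boolean flags are illustrative):

```python
def choose_model(complex_task: bool, simple_task: bool = False) -> str:
    """Pick a model ID following the decision tree above."""
    if complex_task:
        # Multi-step reasoning or a large codebase: pay for Opus.
        return "claude-opus-4-6"
    if simple_task:
        # Speed/cost matters and the task is simple: Haiku.
        return "claude-haiku-4-5-20251001"
    # Default: balanced speed and quality.
    return "claude-sonnet-4-6"
```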
In Claude Code, you can override the model per-session or per-subagent:
```json
{
  "model": "claude-opus-4-6"
}
```
In the Agent SDK:
```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    async for msg in query(
        prompt="Analyze this architecture",
        options=ClaudeAgentOptions(model="claude-opus-4-6"),
    ):
        print(msg)

asyncio.run(main())
```
Adaptive Thinking (Opus 4.6)
Opus 4.6 introduced adaptive thinking with four effort levels:
| Level | Use case | Token overhead |
|---|---|---|
| Low | Simple lookups, classification | Minimal |
| Medium | Standard coding tasks | Moderate |
| High | Complex debugging, multi-step reasoning | Significant |
| Max | Hard math, novel architecture decisions | Maximum |
The model auto-selects the appropriate level based on perceived task complexity. You can also set it explicitly via the API.
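As a sketch of setting the level explicitly, assuming a hypothetical top-level `effort` request field (the field name is an assumption, not confirmed API; check the current API reference):

```python
EFFORT_LEVELS = {"low", "medium", "high", "max"}

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a Messages-style request payload with an explicit effort level.

    NOTE: "effort" here is a hypothetical field used for illustration.
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 4096,
        "effort": effort,  # hypothetical field
        "messages": [{"role": "user", "content": prompt}],
    }
```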
Context Compaction
All models support context compaction - automatic summarization of earlier conversation turns when context fills up. This is critical for long agentic sessions where tool calls accumulate rapidly.
The `compact-2026-01-12` strategy provides:
- Configurable threshold (default: 80% of the context window)
- A `pause_after_compaction` option to let the user review before continuing
- Preservation of critical information (code changes, decisions) while intermediate steps are compressed
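A configuration for this might look like the following sketch. Only the strategy name, the 80% default, and the `pause_after_compaction` option come from this document; the surrounding field layout is an assumption:

```python
def compaction_config(threshold: float = 0.80, pause: bool = False) -> dict:
    """Sketch of compaction settings; field layout is illustrative."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be a fraction of the context window")
    return {
        "strategy": "compact-2026-01-12",
        "threshold": threshold,           # fraction of context window
        "pause_after_compaction": pause,  # let the user review before continuing
    }
```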
Next
- API deep dive - compaction configuration, advanced tool use, computer use
- Claude Code features - how features use different models