Models

Claude’s model family as of early 2026: Opus 4.6, Sonnet 4.6, and Haiku 4.5.

Model Comparison

|                       | Opus 4.6                              | Sonnet 4.6                           | Haiku 4.5                              |
|-----------------------|---------------------------------------|--------------------------------------|----------------------------------------|
| Context window        | 1M tokens                             | 1M tokens                            | 200K tokens                            |
| Max output            | 32K tokens                            | 16K tokens                           | 8K tokens                              |
| Speed                 | Slowest                               | Fast                                 | Fastest                                |
| Cost (input / output) | $15 / $75 per 1M                      | $3 / $15 per 1M                      | $0.80 / $4 per 1M                      |
| Best for              | Complex reasoning, hard agentic tasks | Daily coding, balanced speed/quality | Classification, routing, simple tasks  |
| Model ID              | claude-opus-4-6                       | claude-sonnet-4-6                    | claude-haiku-4-5-20251001              |
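The pricing above translates directly into per-request cost estimates. A minimal sketch (prices hardcoded from the table; the helper is illustrative and not part of any SDK):

```python
# Per-million-token list prices (USD) from the comparison table above.
PRICING = {
    "claude-opus-4-6": {"input": 15.00, "output": 75.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5-20251001": {"input": 0.80, "output": 4.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost for one request at the table's list prices."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 50K-token prompt with a 2K-token reply on Sonnet 4.6:
# 50_000 * 3/1e6 + 2_000 * 15/1e6 = 0.15 + 0.03 = 0.18 USD
```

At these prices, the same request costs 5x more on Opus and roughly 3.75x less on Haiku, which is why routing by task complexity matters at volume.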

Opus 4.6

The most capable model in the Claude family.

Key capabilities:

  • 1M token context - processes entire codebases, long documents, extensive conversation histories
  • Adaptive thinking - four effort levels that adjust reasoning depth based on task complexity
  • Agent teams - can coordinate multiple Claude instances working in parallel
  • 76% on MRCR v2 - state-of-the-art on this long-context retrieval benchmark
  • Context compaction - automatically summarizes earlier context when approaching limits

When to use: Complex multi-file refactors, architectural analysis, security reviews, research tasks requiring deep reasoning, any task where accuracy matters more than speed.
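To gauge whether an entire codebase fits in the 1M-token window before sending it, the common rule of thumb of roughly 4 characters per token can be used. This is only an approximation (actual counts depend on the tokenizer; a token-counting endpoint gives exact numbers):

```python
def fits_in_context(total_chars: int,
                    context_tokens: int = 1_000_000,
                    chars_per_token: float = 4.0) -> bool:
    """Approximate check using the ~4-chars-per-token heuristic."""
    return total_chars / chars_per_token <= context_tokens

# ~3 MB of source text is roughly 750K tokens, which fits in a 1M window;
# ~5 MB is roughly 1.25M tokens, which does not.
```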

Sonnet 4.6

The first Sonnet model to be preferred over the previous-generation Opus in coding benchmarks.

Key capabilities:

  • 1M token context - same context window as Opus
  • Strong coding - preferred over the previous-generation Opus in head-to-head coding evaluations
  • Fast - 3-5x faster than Opus for most tasks
  • Cost-effective - 5x cheaper than Opus per token

When to use: Daily development work, code generation, PR reviews, most agentic coding tasks. The default choice unless you specifically need Opus-level reasoning.

Haiku 4.5

Lightweight model optimized for speed and cost.

Key capabilities:

  • 200K token context - smaller than Opus/Sonnet but sufficient for most tasks
  • Fastest response times - ideal for interactive and real-time applications
  • Cheapest - nearly 20x cheaper than Opus

When to use: Classification, routing, simple extraction, chatbot responses, any task where latency matters more than depth. Often used as a “triage” model that decides which more expensive model to invoke.

Choosing the Right Model

Is the task complex (multi-step reasoning, large codebase)?
├── Yes → Opus 4.6
└── No
    Is speed/cost important?
    ├── Yes, and the task is simple → Haiku 4.5
    └── Yes, but I need good quality → Sonnet 4.6
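The decision tree above can be encoded as a small routing function (the flags and function name are illustrative, not an SDK API):

```python
def pick_model(complex_task: bool, simple_task: bool = False) -> str:
    """Mirror the decision tree: Opus for complex work, Haiku for simple
    latency-sensitive tasks, Sonnet as the default middle ground."""
    if complex_task:
        return "claude-opus-4-6"
    if simple_task:
        return "claude-haiku-4-5-20251001"
    return "claude-sonnet-4-6"
```

In practice this kind of router often runs on Haiku itself: a cheap first call classifies the request, then the chosen model handles it.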

In Claude Code, you can override the model per-session or per-subagent:

```json
{
  "model": "claude-opus-4-6"
}
```

In the Agent SDK:

```python
from claude_agent_sdk import query, ClaudeAgentOptions

async for msg in query(
    prompt="Analyze this architecture",
    options=ClaudeAgentOptions(model="claude-opus-4-6"),
):
    print(msg)
```

Adaptive Thinking (Opus 4.6)

Opus 4.6 introduced adaptive thinking with four effort levels:

| Level  | Use case                                | Token overhead |
|--------|-----------------------------------------|----------------|
| Low    | Simple lookups, classification          | Minimal        |
| Medium | Standard coding tasks                   | Moderate       |
| High   | Complex debugging, multi-step reasoning | Significant    |
| Max    | Hard math, novel architecture decisions | Maximum        |

The model auto-selects the appropriate level based on perceived task complexity. You can also set it explicitly via the API.
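A request that pins the level explicitly might look like the sketch below. The `effort` field name and placement are assumptions for illustration only; check the current Messages API reference for the real parameter shape.

```python
# Hypothetical request body pinning the effort level rather than
# letting the model auto-select it.
request = {
    "model": "claude-opus-4-6",
    "max_tokens": 32_000,
    "effort": "high",  # assumed values: "low", "medium", "high", "max"
    "messages": [
        {"role": "user", "content": "Debug this intermittent race condition..."}
    ],
}
```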

Context Compaction

All models support context compaction - automatic summarization of earlier conversation turns when context fills up. This is critical for long agentic sessions where tool calls accumulate rapidly.

The compact-2026-01-12 strategy provides:

  • Configurable threshold (default: 80% of context window)
  • pause_after_compaction option to let the user review before continuing
  • Preserves critical information (code changes, decisions) while compressing intermediate steps
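Enabling the strategy might look like the configuration sketch below. Apart from the strategy id and `pause_after_compaction`, which the list above names, the field names are assumptions; the exact configuration surface should be checked against the API documentation.

```python
# Hypothetical context-management config for the compaction strategy above.
context_management = {
    "strategy": "compact-2026-01-12",
    "trigger_threshold": 0.8,        # compact at 80% of the context window
    "pause_after_compaction": True,  # let the user review before continuing
}
```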
