Models

Claude’s model family as of early 2026: Opus 4.6, Sonnet 4.6, and Haiku 4.5.

Model Comparison

|                       | Opus 4.6                              | Sonnet 4.6                           | Haiku 4.5                              |
|-----------------------|---------------------------------------|--------------------------------------|----------------------------------------|
| Context window        | 1M tokens                             | 1M tokens                            | 200K tokens                            |
| Max output            | 32K tokens                            | 16K tokens                           | 8K tokens                              |
| Speed                 | Slowest                               | Fast                                 | Fastest                                |
| Cost (input / output) | $15 / $75 per 1M                      | $3 / $15 per 1M                      | $0.80 / $4 per 1M                      |
| Best for              | Complex reasoning, hard agentic tasks | Daily coding, balanced speed/quality | Classification, routing, simple tasks  |
| Model ID              | claude-opus-4-6                       | claude-sonnet-4-6                    | claude-haiku-4-5-20251001              |
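The pricing above translates directly into per-request cost estimates. A minimal sketch (prices hardcoded from the table; the helper is illustrative and not part of any SDK):

```python
# Per-million-token list prices (USD) from the comparison table above.
PRICING = {
    "claude-opus-4-6": {"input": 15.00, "output": 75.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5-20251001": {"input": 0.80, "output": 4.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost for one request at the table's list prices."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 50K-token prompt with a 2K-token reply on Sonnet 4.6:
# 50_000 * 3/1e6 + 2_000 * 15/1e6 = 0.15 + 0.03 = 0.18 USD
```

At these prices, the same request costs 5x more on Opus and roughly 3.75x less on Haiku, which is why routing by task complexity matters at volume.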

Opus 4.6

The most capable model in the Claude family.

Key capabilities:

  • 1M token context - processes entire codebases, long documents, extensive conversation histories
  • Adaptive thinking - four effort levels that adjust reasoning depth based on task complexity
  • Agent teams - can coordinate multiple Claude instances working in parallel
  • 76% on MRCR v2 - state-of-the-art on this long-context retrieval benchmark
  • Context compaction - automatically summarizes earlier context when approaching limits

When to use: Complex multi-file refactors, architectural analysis, security reviews, research tasks requiring deep reasoning, any task where accuracy matters more than speed.
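To gauge whether an entire codebase fits in the 1M-token window before sending it, the common rule of thumb of roughly 4 characters per token can be used. This is only an approximation (actual counts depend on the tokenizer; a token-counting endpoint gives exact numbers):

```python
def fits_in_context(total_chars: int,
                    context_tokens: int = 1_000_000,
                    chars_per_token: float = 4.0) -> bool:
    """Approximate check using the ~4-chars-per-token heuristic."""
    return total_chars / chars_per_token <= context_tokens

# ~3 MB of source text is roughly 750K tokens, which fits in a 1M window;
# ~5 MB is roughly 1.25M tokens, which does not.
```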

Sonnet 4.6

The first Sonnet model to be preferred over the previous-generation Opus in coding benchmarks.

Key capabilities:

  • 1M token context - same context window as Opus
  • Strong coding - preferred over the previous-generation Opus in head-to-head coding evaluations
  • Fast - 3-5x faster than Opus for most tasks
  • Cost-effective - 5x cheaper than Opus per token

When to use: Daily development work, code generation, PR reviews, most agentic coding tasks. The default choice unless you specifically need Opus-level reasoning.

Haiku 4.5

Lightweight model optimized for speed and cost.

Key capabilities:

  • 200K token context - smaller than Opus/Sonnet but sufficient for most tasks
  • Fastest response times - ideal for interactive and real-time applications
  • Cheapest - nearly 20x cheaper than Opus

When to use: Classification, routing, simple extraction, chatbot responses, any task where latency matters more than depth. Often used as a “triage” model that decides which more expensive model to invoke.

Choosing the Right Model

Is the task complex (multi-step reasoning, large codebase)?
├── Yes → Opus 4.6
└── No
    Is speed/cost important?
    ├── Yes, and the task is simple → Haiku 4.5
    └── Yes, but I need good quality → Sonnet 4.6
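The decision tree above can be encoded as a small routing function (the flags and function name are illustrative, not an SDK API):

```python
def pick_model(complex_task: bool, simple_task: bool = False) -> str:
    """Mirror the decision tree: Opus for complex work, Haiku for simple
    latency-sensitive tasks, Sonnet as the default middle ground."""
    if complex_task:
        return "claude-opus-4-6"
    if simple_task:
        return "claude-haiku-4-5-20251001"
    return "claude-sonnet-4-6"
```

In practice this kind of router often runs on Haiku itself: a cheap first call classifies the request, then the chosen model handles it.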

In Claude Code, you can override the model per-session or per-subagent:

```json
{
  "model": "claude-opus-4-6"
}
```

In the Agent SDK:

```python
from claude_agent_sdk import query, ClaudeAgentOptions

async for msg in query(
    prompt="Analyze this architecture",
    options=ClaudeAgentOptions(model="claude-opus-4-6"),
):
    print(msg)
```

Adaptive Thinking (Opus 4.6)

Opus 4.6 introduced adaptive thinking with four effort levels:

| Level  | Use case                                | Token overhead |
|--------|-----------------------------------------|----------------|
| Low    | Simple lookups, classification          | Minimal        |
| Medium | Standard coding tasks                   | Moderate       |
| High   | Complex debugging, multi-step reasoning | Significant    |
| Max    | Hard math, novel architecture decisions | Maximum        |

The model auto-selects the appropriate level based on perceived task complexity. You can also set it explicitly via the API.
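A request that pins the level explicitly might look like the sketch below. The `effort` field name and placement are assumptions for illustration only; check the current Messages API reference for the real parameter shape.

```python
# Hypothetical request body pinning the effort level rather than
# letting the model auto-select it.
request = {
    "model": "claude-opus-4-6",
    "max_tokens": 32_000,
    "effort": "high",  # assumed values: "low", "medium", "high", "max"
    "messages": [
        {"role": "user", "content": "Debug this intermittent race condition..."}
    ],
}
```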

Context Compaction

All models support context compaction - automatic summarization of earlier conversation turns when context fills up. This is critical for long agentic sessions where tool calls accumulate rapidly.

The compact-2026-01-12 strategy provides:

  • Configurable threshold (default: 80% of context window)
  • pause_after_compaction option to let the user review before continuing
  • Preserves critical information (code changes, decisions) while compressing intermediate steps
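Enabling the strategy might look like the configuration sketch below. Apart from the strategy id and `pause_after_compaction`, which the list above names, the field names are assumptions; the exact configuration surface should be checked against the API documentation.

```python
# Hypothetical context-management config for the compaction strategy above.
context_management = {
    "strategy": "compact-2026-01-12",
    "trigger_threshold": 0.8,        # compact at 80% of the context window
    "pause_after_compaction": True,  # let the user review before continuing
}
```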
