Multi-Agent Orchestration
Patterns for coordinating multiple AI agents - handoffs, delegation, parallel execution, and supervision.
Why Multi-Agent?
A single agent hits limits fast:
- Context overflow - one agent can’t hold an entire codebase in context
- Specialization - a security reviewer needs different instructions than a code generator
- Parallelism - reviewing 10 files sequentially is slow; 10 agents in parallel is fast
- Isolation - a failed subtask shouldn’t crash the whole system
Multi-agent systems solve this by splitting work across specialized agents.
Pattern 1: Handoffs (Sequential Delegation)
One agent transfers control to another. The receiving agent gets the full conversation context.
```
User -> Triage Agent -> Security Agent -> (back to Triage) -> User
```
Used by: OpenAI Agents SDK (first-class), Claude Agent SDK (via tools)
```python
# OpenAI Agents SDK
triage = Agent(
    name="triage",
    handoffs=[
        handoff(security_agent, tool_description_override="For security concerns"),
        handoff(perf_agent, tool_description_override="For performance concerns"),
    ],
)
# The triage agent decides at runtime who to delegate to.
# Control returns after the specialist finishes.
```
When to use: Customer support routing, multi-step workflows where order matters, specialist delegation.
Trade-off: Sequential - each handoff adds latency. Good for workflows where Step B depends on Step A.
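Stripped of any SDK, the control flow behind a handoff is just a routing step followed by a return. A minimal sketch with stubbed agents (`triage`, `SPECIALISTS`, and `handle` are hypothetical names, not any SDK's API; a real router would be an LLM call rather than keyword matching):

```python
# Minimal handoff loop: triage picks a specialist, the specialist
# runs, then control returns to the caller.
def triage(message: str) -> str:
    """Stub router: in practice an LLM call decides the route."""
    if "injection" in message or "vulnerab" in message:
        return "security"
    if "slow" in message or "latency" in message:
        return "performance"
    return "general"

SPECIALISTS = {
    "security": lambda m: f"[security] reviewed: {m}",
    "performance": lambda m: f"[performance] profiled: {m}",
    "general": lambda m: f"[general] answered: {m}",
}

def handle(message: str) -> str:
    route = triage(message)             # triage decides at runtime
    return SPECIALISTS[route](message)  # control returns when done
```

The latency cost is visible in the shape: `triage` must finish before the specialist starts.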
Pattern 2: Parallel Subagents
Spawn multiple agents simultaneously, each with its own context window.
```
Coordinator
    |--- Agent A (security review) --->|
    |--- Agent B (quality review)  --->|-- Coordinator synthesizes
    |--- Agent C (perf review)     --->|
```
Used by: Claude Code (subagents), custom implementations
```python
# Claude Code style - parallel subagents
import asyncio

async def parallel_review(diff):
    agents = [
        ("security", "Review for security vulnerabilities"),
        ("quality", "Review for code quality issues"),
        ("performance", "Review for performance problems"),
    ]
    tasks = [
        run_agent(instructions=prompt, input=diff)
        for name, prompt in agents
    ]
    results = await asyncio.gather(*tasks)
    # Each agent had its own context window
    # Now synthesize
    return await run_agent(
        instructions="Synthesize these review findings into one report",
        input="\n".join(results),
    )
```
When to use: Independent tasks that can run simultaneously, large-scale analysis, reducing latency.
Trade-off: No communication between parallel agents. Each works in isolation.
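Isolation cuts both ways: one crashed subagent shouldn't take the whole fan-out down. With `asyncio.gather`, passing `return_exceptions=True` collects failures as values instead of cancelling the siblings. A runnable sketch with stub reviewers (`review` and `fan_out` are illustrative names, not a real SDK):

```python
import asyncio

# Fault-isolated parallel fan-out: one failing subagent does not
# cancel the others; its exception is returned as a value.
async def review(name: str, diff: str) -> str:
    if name == "security":
        raise RuntimeError("security agent timed out")  # simulated failure
    return f"{name}: looks fine"

async def fan_out(diff: str) -> list[str]:
    tasks = [review(n, diff) for n in ("security", "quality", "performance")]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Keep successes; in production, log the exceptions instead.
    return [r for r in results if isinstance(r, str)]

findings = asyncio.run(fan_out("diff..."))
```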
Pattern 3: Supervisor
A “boss” agent that delegates tasks, monitors progress, and can intervene.
```
Supervisor
    |
    |-- "Agent A, research X"
    |-- (checks Agent A's output)
    |-- "Agent B, implement based on A's findings"
    |-- (checks Agent B's output)
    |-- "Agent A, verify B's implementation"
    |-- Done
```
Used by: LangGraph, custom implementations
```python
# Supervisor pattern (pseudocode)
supervisor = Agent(
    instructions="""You are a project manager.
    Delegate research to the researcher.
    Delegate coding to the developer.
    Delegate testing to the tester.
    Verify each step before proceeding.""",
    tools=[
        delegate_to("researcher", researcher_agent),
        delegate_to("developer", developer_agent),
        delegate_to("tester", tester_agent),
    ],
)
```
When to use: Complex workflows requiring oversight, quality gates between steps, iterative refinement.
Trade-off: Supervisor becomes a bottleneck. Every task flows through it.
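One way to make `delegate_to` concrete: wrap each worker as a callable the supervisor can invoke, with a verification check on the way back. This is a hypothetical sketch (the `Agent` dataclass and `delegate_to` helper are stubs, not any framework's implementation):

```python
from dataclasses import dataclass
from typing import Callable

# Stub agent: a real one would wrap an LLM call.
@dataclass
class Agent:
    name: str
    run: Callable[[str], str]

def delegate_to(agent: Agent) -> Callable[[str], str]:
    """Wrap a worker agent as a tool the supervisor can call."""
    def tool(task: str) -> str:
        result = agent.run(task)
        # Supervisor-side quality gate before accepting the result.
        if not result:
            raise ValueError(f"{agent.name} returned nothing")
        return result
    return tool

researcher = Agent("researcher", lambda t: f"notes on {t}")
developer = Agent("developer", lambda t: f"patch for {t}")

tools = {a.name: delegate_to(a) for a in (researcher, developer)}
plan = tools["researcher"]("rate limiting")   # step 1: research
code = tools["developer"](plan)               # step 2: build on step 1
```

The bottleneck is visible here too: every result flows back through the supervisor before the next step starts.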
Pattern 4: Swarm (Peer-to-Peer)
Agents communicate directly with each other without a central coordinator.
```
Agent A <---> Agent B
   ^             ^
   |             |
   v             v
Agent C <---> Agent D
```
Used by: OpenAI Swarm (experimental), A2A protocol
```python
# A2A-based swarm
# Each agent publishes an Agent Card
# Agents discover each other and delegate directly
research_card = {
    "name": "researcher",
    "url": "https://agents.example.com/researcher",
    "skills": [{"id": "web-research", "name": "Web Research"}],
}

# Any agent can send a task to any other agent
task = a2a_client.send_task(
    agent_url="https://agents.example.com/researcher",
    message="Research rate limiting patterns for REST APIs",
)

# Poll or stream for results
result = a2a_client.get_task(task.id)
```
When to use: Systems where agents are built by different teams/vendors, cross-organization collaboration, microservice-like agent architectures.
Trade-off: Hard to debug. No central place to see what’s happening.
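Protocol aside, the peer-to-peer topology reduces to agents holding direct references to each other, with no coordinator in the call path. A minimal in-process sketch (the `PeerAgent` class and its methods are illustrative, not the A2A API):

```python
# Peer-to-peer sketch: agents link to each other directly and
# delegate without any central coordinator.
class PeerAgent:
    def __init__(self, name: str, skill):
        self.name = name
        self.skill = skill   # callable: task -> result (stub for an LLM)
        self.peers = {}

    def connect(self, other: "PeerAgent") -> None:
        # Bidirectional link; discovery is manual in this sketch.
        self.peers[other.name] = other
        other.peers[self.name] = self

    def send_task(self, peer_name: str, task: str) -> str:
        # Direct delegation: no supervisor in the path.
        return self.peers[peer_name].skill(task)

researcher = PeerAgent("researcher", lambda t: f"findings: {t}")
writer = PeerAgent("writer", lambda t: f"draft using {t}")
researcher.connect(writer)

findings = writer.send_task("researcher", "rate limiting patterns")
draft = writer.skill(findings)
```

The debugging trade-off shows up immediately: the only record of who asked whom for what is whatever each peer logs itself.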
Pattern 5: Pipeline
Agents process work in sequence, each transforming the output for the next.
```
Input -> Parse Agent -> Validate Agent -> Enrich Agent -> Format Agent -> Result
```
```python
# Pipeline pattern
pipeline = [
    Agent(name="parser", instructions="Parse the raw input into structured data"),
    Agent(name="validator", instructions="Validate the structured data"),
    Agent(name="enricher", instructions="Enrich with additional context"),
    Agent(name="formatter", instructions="Format for the target system"),
]

result = input_data
for agent in pipeline:
    result = run_agent(agent, result)
```
When to use: ETL-like workflows, content processing, data transformation chains.
Trade-off: Linear - one slow agent blocks the whole pipeline.
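The same shape with plain functions makes the data flow concrete and runnable; a real system would swap each stage for an agent call. All stage names below are illustrative:

```python
# Pipeline with pure-function stages standing in for agents.
def parse(raw: str) -> dict:
    key, _, value = raw.partition("=")
    return {"key": key.strip(), "value": value.strip()}

def validate(rec: dict) -> dict:
    if not rec["key"]:
        raise ValueError("missing key")
    return rec

def enrich(rec: dict) -> dict:
    return {**rec, "source": "pipeline"}

def fmt(rec: dict) -> str:
    return f'{rec["key"]}: {rec["value"]} ({rec["source"]})'

PIPELINE = [parse, validate, enrich, fmt]

def run_pipeline(data):
    for stage in PIPELINE:
        data = stage(data)   # each stage's output feeds the next
    return data
```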
Real-World Examples
Claude Code’s Architecture
Claude Code uses Pattern 2 (parallel subagents) internally:
```
Main Agent (your conversation)
    |
    |-- Read tools (auto-approved)
    |-- Write tools (need approval)
    |-- Bash tools (need approval)
    |
    |-- Subagent: "Explore" (read-only, fast, for codebase search)
    |-- Subagent: "Plan" (read-only, for design work)
    |-- Subagent: "General" (full tools, for complex subtasks)
```
Each subagent gets a fresh context window and returns a summary. The main agent integrates results.
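The contract can be sketched in a few lines: a subagent starts with an empty context, and only its summary crosses back into the parent. This is a stub illustration of the idea, not Claude Code's implementation (`run_llm`, `spawn_subagent`, and `main_context` are invented names):

```python
# Subagent contract: fresh context in, one summary line out.
def run_llm(system: str, context: list[str]) -> str:
    # Stub: a real implementation would call a model with `context`.
    return f"summary({system}: {len(context)} items)"

def spawn_subagent(kind: str, task: str) -> str:
    fresh_context = [task]               # nothing inherited from the parent
    return run_llm(kind, fresh_context)  # only the summary crosses back

main_context = ["user: find auth bugs"]
main_context.append(spawn_subagent("explore", "search for auth code"))
# The parent context grows by one summary line, not by the
# subagent's full transcript.
```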
GitHub Copilot Coding Agent
Copilot uses Pattern 3 (supervisor) in its coding agent:
```
GitHub Issue assigned to Copilot
    |
    v
Copilot Agent (supervisor)
    |-- Reads codebase (MCP tools)
    |-- Plans changes
    |-- Implements changes
    |-- Runs tests
    |-- Self-corrects on failure (loops back)
    |-- Creates PR
```
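The "self-corrects on failure" step is a bounded retry loop: implement, test, and feed the failure back as context for the next attempt. A generic sketch, not Copilot's actual code (`implement` and `run_tests` are toy stubs standing in for LLM and CI calls):

```python
# Bounded self-correction loop: test failures become input for
# the next implementation attempt, up to a retry limit.
def self_correcting_run(implement, run_tests, max_attempts: int = 3):
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        change = implement(feedback)   # stub for the LLM coding step
        ok, error = run_tests(change)
        if ok:
            return change, attempt
        feedback = error               # loop back with the failure
    raise RuntimeError("gave up after max_attempts")

# Toy stubs: the "agent" succeeds once it has seen the error message.
def implement(feedback: str) -> str:
    return "fixed" if feedback else "buggy"

def run_tests(change: str):
    return (True, "") if change == "fixed" else (False, "test_x failed")

change, attempts = self_correcting_run(implement, run_tests)
```

The `max_attempts` cap matters: without it, an agent that never converges loops (and bills) forever.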
OpenAI Customer Support Example
Classic handoff pattern from OpenAI’s docs:
```python
billing = Agent(name="Billing", tools=[lookup_invoice, process_refund])
technical = Agent(name="Technical", tools=[check_status, restart_service])
general = Agent(name="General", tools=[search_faq])
triage = Agent(name="Triage", handoffs=[billing, technical, general])
```
Choosing a Pattern
| Pattern | Latency | Complexity | Best for |
|---|---|---|---|
| Handoff | Medium | Low | Routing, specialist delegation |
| Parallel | Low | Medium | Independent subtasks |
| Supervisor | High | High | Quality-critical workflows |
| Swarm | Variable | Very High | Cross-vendor agent networks |
| Pipeline | High | Low | Sequential transformations |
Common Pitfalls
- Over-engineering - start with one agent. Add more only when you hit limits.
- Context loss - handoffs and subagents lose context. Be explicit about what to pass.
- Infinite loops - Agent A delegates to Agent B, which delegates back to Agent A. Always set recursion limits.
- Cost explosion - 10 parallel agents each making 5 LLM calls = 50 API calls per user request. Monitor costs.
- Debugging - multi-agent systems are hard to debug. Use tracing (e.g., the OpenAI Agents SDK's built-in traces) or structured logging from day one.
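The recursion-limit pitfall above can be guarded with a depth counter passed along every delegation. A hypothetical sketch (no SDK; `delegate` and the two agents are invented for illustration):

```python
# Guard against A->B->A delegation loops: refuse past a depth limit.
MAX_DEPTH = 3

def delegate(agent, task: str, depth: int = 0) -> str:
    if depth >= MAX_DEPTH:
        raise RecursionError(f"delegation depth {depth} exceeds limit")
    return agent(task, depth + 1)

def agent_a(task: str, depth: int) -> str:
    # Pathological: A always bounces the task to B.
    return delegate(agent_b, task, depth)

def agent_b(task: str, depth: int) -> str:
    # ...and B bounces it straight back to A.
    return delegate(agent_a, task, depth)

try:
    delegate(agent_a, "loop")
except RecursionError as e:
    outcome = str(e)   # loop detected and stopped, not infinite
```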