Multi-Agent Orchestration
Patterns for coordinating multiple AI agents - handoffs, delegation, parallel execution, and supervision.
Why Multi-Agent?
A single agent hits limits fast:
- Context overflow - one agent can’t hold an entire codebase in context
- Specialization - a security reviewer needs different instructions than a code generator
- Parallelism - reviewing 10 files sequentially is slow; 10 agents in parallel is fast
- Isolation - a failed subtask shouldn’t crash the whole system
Multi-agent systems solve this by splitting work across specialized agents.
Pattern 1: Handoffs (Sequential Delegation)
One agent transfers control to another. The receiving agent gets the full conversation context.
```
User -> Triage Agent -> Security Agent -> (back to Triage) -> User
```
Used by: OpenAI Agents SDK (first-class), Claude Agent SDK (via tools)
```python
# OpenAI Agents SDK
triage = Agent(
    name="triage",
    handoffs=[
        handoff(security_agent, tool_description_override="For security concerns"),
        handoff(perf_agent, tool_description_override="For performance concerns"),
    ],
)
# The triage agent decides at runtime who to delegate to.
# Control returns after the specialist finishes.
```
When to use: Customer support routing, multi-step workflows where order matters, specialist delegation.
Trade-off: Sequential - each handoff adds latency. Good for workflows where Step B depends on Step A.
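Stripped of any SDK, the control flow behind a handoff is just a routing step followed by a return. A minimal sketch with stubbed agents (`triage`, `SPECIALISTS`, and `handle` are hypothetical names, not any SDK's API; a real router would be an LLM call rather than keyword matching):

```python
# Minimal handoff loop: triage picks a specialist, the specialist
# runs, then control returns to the caller.
def triage(message: str) -> str:
    """Stub router: in practice an LLM call decides the route."""
    if "injection" in message or "vulnerab" in message:
        return "security"
    if "slow" in message or "latency" in message:
        return "performance"
    return "general"

SPECIALISTS = {
    "security": lambda m: f"[security] reviewed: {m}",
    "performance": lambda m: f"[performance] profiled: {m}",
    "general": lambda m: f"[general] answered: {m}",
}

def handle(message: str) -> str:
    route = triage(message)             # triage decides at runtime
    return SPECIALISTS[route](message)  # control returns when done
```

The latency cost is visible in the shape: `triage` must finish before the specialist starts.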
Pattern 2: Parallel Subagents
Spawn multiple agents simultaneously, each with its own context window.
```
Coordinator
    |--- Agent A (security review) --->|
    |--- Agent B (quality review)  --->|-- Coordinator synthesizes
    |--- Agent C (perf review)     --->|
```
Used by: Claude Code (subagents), custom implementations
```python
# Claude Code style - parallel subagents
import asyncio

async def parallel_review(diff):
    agents = [
        ("security", "Review for security vulnerabilities"),
        ("quality", "Review for code quality issues"),
        ("performance", "Review for performance problems"),
    ]
    tasks = [
        run_agent(instructions=prompt, input=diff)
        for name, prompt in agents
    ]
    results = await asyncio.gather(*tasks)
    # Each agent had its own context window
    # Now synthesize
    return await run_agent(
        instructions="Synthesize these review findings into one report",
        input="\n".join(results),
    )
```
When to use: Independent tasks that can run simultaneously, large-scale analysis, reducing latency.
Trade-off: No communication between parallel agents. Each works in isolation.
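Isolation cuts both ways: one crashed subagent shouldn't take the whole fan-out down. With `asyncio.gather`, passing `return_exceptions=True` collects failures as values instead of cancelling the siblings. A runnable sketch with stub reviewers (`review` and `fan_out` are illustrative names, not a real SDK):

```python
import asyncio

# Fault-isolated parallel fan-out: one failing subagent does not
# cancel the others; its exception is returned as a value.
async def review(name: str, diff: str) -> str:
    if name == "security":
        raise RuntimeError("security agent timed out")  # simulated failure
    return f"{name}: looks fine"

async def fan_out(diff: str) -> list[str]:
    tasks = [review(n, diff) for n in ("security", "quality", "performance")]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Keep successes; in production, log the exceptions instead.
    return [r for r in results if isinstance(r, str)]

findings = asyncio.run(fan_out("diff..."))
```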
Pattern 3: Supervisor
A “boss” agent that delegates tasks, monitors progress, and can intervene.
```
Supervisor
    |
    |-- "Agent A, research X"
    |-- (checks Agent A's output)
    |-- "Agent B, implement based on A's findings"
    |-- (checks Agent B's output)
    |-- "Agent A, verify B's implementation"
    |-- Done
```
Used by: LangGraph, custom implementations
```python
# Supervisor pattern (pseudocode)
supervisor = Agent(
    instructions="""You are a project manager.
    Delegate research to the researcher.
    Delegate coding to the developer.
    Delegate testing to the tester.
    Verify each step before proceeding.""",
    tools=[
        delegate_to("researcher", researcher_agent),
        delegate_to("developer", developer_agent),
        delegate_to("tester", tester_agent),
    ],
)
```
When to use: Complex workflows requiring oversight, quality gates between steps, iterative refinement.
Trade-off: Supervisor becomes a bottleneck. Every task flows through it.
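One way to make `delegate_to` concrete: wrap each worker as a callable the supervisor can invoke, with a verification check on the way back. This is a hypothetical sketch (the `Agent` dataclass and `delegate_to` helper are stubs, not any framework's implementation):

```python
from dataclasses import dataclass
from typing import Callable

# Stub agent: a real one would wrap an LLM call.
@dataclass
class Agent:
    name: str
    run: Callable[[str], str]

def delegate_to(agent: Agent) -> Callable[[str], str]:
    """Wrap a worker agent as a tool the supervisor can call."""
    def tool(task: str) -> str:
        result = agent.run(task)
        # Supervisor-side quality gate before accepting the result.
        if not result:
            raise ValueError(f"{agent.name} returned nothing")
        return result
    return tool

researcher = Agent("researcher", lambda t: f"notes on {t}")
developer = Agent("developer", lambda t: f"patch for {t}")

tools = {a.name: delegate_to(a) for a in (researcher, developer)}
plan = tools["researcher"]("rate limiting")   # step 1: research
code = tools["developer"](plan)               # step 2: build on step 1
```

The bottleneck is visible here too: every result flows back through the supervisor before the next step starts.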
Pattern 4: Swarm (Peer-to-Peer)
Agents communicate directly with each other without a central coordinator.
```
Agent A <---> Agent B
   ^             ^
   |             |
   v             v
Agent C <---> Agent D
```
Used by: OpenAI Swarm (experimental), A2A protocol
```python
# A2A-based swarm
# Each agent publishes an Agent Card
# Agents discover each other and delegate directly
research_card = {
    "name": "researcher",
    "url": "https://agents.example.com/researcher",
    "skills": [{"id": "web-research", "name": "Web Research"}],
}

# Any agent can send a task to any other agent
task = a2a_client.send_task(
    agent_url="https://agents.example.com/researcher",
    message="Research rate limiting patterns for REST APIs",
)

# Poll or stream for results
result = a2a_client.get_task(task.id)
```
When to use: Systems where agents are built by different teams/vendors, cross-organization collaboration, microservice-like agent architectures.
Trade-off: Hard to debug. No central place to see what’s happening.
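Protocol aside, the peer-to-peer topology reduces to agents holding direct references to each other, with no coordinator in the call path. A minimal in-process sketch (the `PeerAgent` class and its methods are illustrative, not the A2A API):

```python
# Peer-to-peer sketch: agents link to each other directly and
# delegate without any central coordinator.
class PeerAgent:
    def __init__(self, name: str, skill):
        self.name = name
        self.skill = skill   # callable: task -> result (stub for an LLM)
        self.peers = {}

    def connect(self, other: "PeerAgent") -> None:
        # Bidirectional link; discovery is manual in this sketch.
        self.peers[other.name] = other
        other.peers[self.name] = self

    def send_task(self, peer_name: str, task: str) -> str:
        # Direct delegation: no supervisor in the path.
        return self.peers[peer_name].skill(task)

researcher = PeerAgent("researcher", lambda t: f"findings: {t}")
writer = PeerAgent("writer", lambda t: f"draft using {t}")
researcher.connect(writer)

findings = writer.send_task("researcher", "rate limiting patterns")
draft = writer.skill(findings)
```

The debugging trade-off shows up immediately: the only record of who asked whom for what is whatever each peer logs itself.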
Pattern 5: Pipeline
Agents process work in sequence, each transforming the output for the next.
```
Input -> Parse Agent -> Validate Agent -> Enrich Agent -> Format Agent -> Result
```
```python
# Pipeline pattern
pipeline = [
    Agent(name="parser", instructions="Parse the raw input into structured data"),
    Agent(name="validator", instructions="Validate the structured data"),
    Agent(name="enricher", instructions="Enrich with additional context"),
    Agent(name="formatter", instructions="Format for the target system"),
]

result = input_data
for agent in pipeline:
    result = run_agent(agent, result)
```
When to use: ETL-like workflows, content processing, data transformation chains.
Trade-off: Linear - one slow agent blocks the whole pipeline.
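The same shape with plain functions makes the data flow concrete and runnable; a real system would swap each stage for an agent call. All stage names below are illustrative:

```python
# Pipeline with pure-function stages standing in for agents.
def parse(raw: str) -> dict:
    key, _, value = raw.partition("=")
    return {"key": key.strip(), "value": value.strip()}

def validate(rec: dict) -> dict:
    if not rec["key"]:
        raise ValueError("missing key")
    return rec

def enrich(rec: dict) -> dict:
    return {**rec, "source": "pipeline"}

def fmt(rec: dict) -> str:
    return f'{rec["key"]}: {rec["value"]} ({rec["source"]})'

PIPELINE = [parse, validate, enrich, fmt]

def run_pipeline(data):
    for stage in PIPELINE:
        data = stage(data)   # each stage's output feeds the next
    return data
```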
Real-World Examples
Claude Code’s Architecture
Claude Code uses Pattern 2 (parallel subagents) internally:
```
Main Agent (your conversation)
    |
    |-- Read tools (auto-approved)
    |-- Write tools (need approval)
    |-- Bash tools (need approval)
    |
    |-- Subagent: "Explore" (read-only, fast, for codebase search)
    |-- Subagent: "Plan" (read-only, for design work)
    |-- Subagent: "General" (full tools, for complex subtasks)
```
Each subagent gets a fresh context window and returns a summary. The main agent integrates results.
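The contract can be sketched in a few lines: a subagent starts with an empty context, and only its summary crosses back into the parent. This is a stub illustration of the idea, not Claude Code's implementation (`run_llm`, `spawn_subagent`, and `main_context` are invented names):

```python
# Subagent contract: fresh context in, one summary line out.
def run_llm(system: str, context: list[str]) -> str:
    # Stub: a real implementation would call a model with `context`.
    return f"summary({system}: {len(context)} items)"

def spawn_subagent(kind: str, task: str) -> str:
    fresh_context = [task]               # nothing inherited from the parent
    return run_llm(kind, fresh_context)  # only the summary crosses back

main_context = ["user: find auth bugs"]
main_context.append(spawn_subagent("explore", "search for auth code"))
# The parent context grows by one summary line, not by the
# subagent's full transcript.
```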
GitHub Copilot Coding Agent
Copilot uses Pattern 3 (supervisor) in its coding agent:
```
GitHub Issue assigned to Copilot
    |
    v
Copilot Agent (supervisor)
    |-- Reads codebase (MCP tools)
    |-- Plans changes
    |-- Implements changes
    |-- Runs tests
    |-- Self-corrects on failure (loops back)
    |-- Creates PR
```
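The "self-corrects on failure" step is a bounded retry loop: implement, test, and feed the failure back as context for the next attempt. A generic sketch, not Copilot's actual code (`implement` and `run_tests` are toy stubs standing in for LLM and CI calls):

```python
# Bounded self-correction loop: test failures become input for
# the next implementation attempt, up to a retry limit.
def self_correcting_run(implement, run_tests, max_attempts: int = 3):
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        change = implement(feedback)   # stub for the LLM coding step
        ok, error = run_tests(change)
        if ok:
            return change, attempt
        feedback = error               # loop back with the failure
    raise RuntimeError("gave up after max_attempts")

# Toy stubs: the "agent" succeeds once it has seen the error message.
def implement(feedback: str) -> str:
    return "fixed" if feedback else "buggy"

def run_tests(change: str):
    return (True, "") if change == "fixed" else (False, "test_x failed")

change, attempts = self_correcting_run(implement, run_tests)
```

The `max_attempts` cap matters: without it, an agent that never converges loops (and bills) forever.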
OpenAI Customer Support Example
Classic handoff pattern from OpenAI’s docs:
```python
billing = Agent(name="Billing", tools=[lookup_invoice, process_refund])
technical = Agent(name="Technical", tools=[check_status, restart_service])
general = Agent(name="General", tools=[search_faq])
triage = Agent(name="Triage", handoffs=[billing, technical, general])
```
Choosing a Pattern
| Pattern | Latency | Complexity | Best for |
|---|---|---|---|
| Handoff | Medium | Low | Routing, specialist delegation |
| Parallel | Low | Medium | Independent subtasks |
| Supervisor | High | High | Quality-critical workflows |
| Swarm | Variable | Very High | Cross-vendor agent networks |
| Pipeline | High | Low | Sequential transformations |
Common Pitfalls
- Over-engineering - start with one agent. Add more only when you hit limits.
- Context loss - handoffs and subagents lose context. Be explicit about what to pass.
- Infinite loops - Agent A delegates to Agent B, which delegates back to Agent A. Always set recursion limits.
- Cost explosion - 10 parallel agents each making 5 LLM calls = 50 API calls per user request. Monitor costs.
- Debugging - multi-agent systems are hard to debug. Use tracing (e.g., the OpenAI Agents SDK's built-in traces) or structured logging from day one.
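The recursion-limit pitfall above can be guarded with a depth counter passed along every delegation. A hypothetical sketch (no SDK; `delegate` and the two agents are invented for illustration):

```python
# Guard against A->B->A delegation loops: refuse past a depth limit.
MAX_DEPTH = 3

def delegate(agent, task: str, depth: int = 0) -> str:
    if depth >= MAX_DEPTH:
        raise RecursionError(f"delegation depth {depth} exceeds limit")
    return agent(task, depth + 1)

def agent_a(task: str, depth: int) -> str:
    # Pathological: A always bounces the task to B.
    return delegate(agent_b, task, depth)

def agent_b(task: str, depth: int) -> str:
    # ...and B bounces it straight back to A.
    return delegate(agent_a, task, depth)

try:
    delegate(agent_a, "loop")
except RecursionError as e:
    outcome = str(e)   # loop detected and stopped, not infinite
```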