Agentic CLIs

How Claude Code, Copilot CLI, Aider, and others implement the agentic loop in the terminal.

The Landscape (March 2026)

CLICreatorModelOpen SourceKey Innovation
Claude CodeAnthropicClaudeYesSubagents, MCP, hooks, extended thinking
Copilot CLIGitHubMulti-modelNoGitHub integration, agent skills, fleet mode
AiderPaul GauthierMulti-modelYesGit-native, repo maps (98% token reduction)
OpenAI Codex CLIOpenAIOpenAI modelsYesOS-level sandboxing (Seatbelt/Landlock)
Gemini CLIGoogleGeminiYes1M token context window
CursorCursor IncMulti-modelNoIDE-native, composer agent, automations
WindsurfCodeiumMulti-modelNoCascade (multi-file agent, parallel panes)

Claude Code

The most transparent about its architecture - it’s open source.

Architecture

User Input
    |
    v
Claude Code Process
    |-- System prompt (CLAUDE.md, context, tools list)
    |-- Tool definitions (Read, Write, Edit, Bash, Grep, Glob, Agent, etc.)
    |-- Permission system (auto-allow, ask, deny)
    |
    v
Claude API (agentic loop)
    |-- Model decides action
    |-- Execute tool
    |-- Feed result back
    |-- Repeat until done

Tool System

Claude Code exposes ~15 built-in tools:

ToolPurposePermission
ReadRead filesAuto-approved
GlobFind files by patternAuto-approved
GrepSearch file contentsAuto-approved
EditEdit files (string replacement)Needs approval
WriteCreate/overwrite filesNeeds approval
BashExecute shell commandsNeeds approval
AgentSpawn subagentsAuto-approved
LSPLanguage server queriesAuto-approved

Permission Model

Three tiers:

  1. Auto-approved - read-only operations (Read, Glob, Grep)
  2. Approval required - write operations (Edit, Write, Bash)
  3. Allowlisted - specific bash commands pre-approved in settings

Users can configure via .claude/settings.json:

{
  "permissions": {
    "allow": ["Bash(npm test)", "Bash(git status)"],
    "deny": ["Bash(rm -rf *)"]
  }
}

Subagent Architecture

Claude Code spawns subagents for complex tasks:

Main conversation (full tool access)
    |
    |-- Agent(type="Explore")   -> read-only tools, fast
    |-- Agent(type="Plan")      -> read-only tools, for design
    |-- Agent(type="general")   -> all tools, for implementation

Each subagent gets a fresh context window. Results are returned as a summary to the main conversation.

Context Management

  • 200K token context window (Claude)
  • Automatic summarization when approaching limits
  • CLAUDE.md files loaded at start for project context
  • Git status and recent commits included as context

Hooks

Custom shell commands that run on events:

{
  "hooks": {
    "on_tool_call": {
      "Bash": "echo 'Bash command: $TOOL_INPUT' >> /tmp/audit.log"
    }
  }
}

GitHub Copilot CLI

GA as of February 2026. Terminal-native coding agent.

Architecture

User Input (gh copilot or standalone copilot)
    |
    v
Copilot CLI Engine (same as Copilot SDK)
    |-- GitHub MCP Server (repo access)
    |-- File system tools
    |-- Terminal execution
    |-- Agent skills (per-repo customization)
    |
    v
GitHub Models API (multi-model)

Key Features

  • Agent skills - per-repo instructions in .github/copilot/skills/
  • Session memory - remembers context across invocations
  • GitHub-native - deep integration with issues, PRs, Actions
  • Multi-model - GPT-4.1, GPT-5 mini, Claude (configurable)
  • MCP support - connect external tools via MCP servers

Agent Skills

Per-repo customization files:

<!-- .github/copilot/skills/deploy.md -->
# Deploy Skill

When asked to deploy:
1. Run `npm run build`
2. Run tests with `npm test`
3. Deploy with `./scripts/deploy.sh`
4. Verify at https://staging.example.com

Skills are loaded automatically when relevant to the user’s prompt.

Copilot Coding Agent

The autonomous agent that works in GitHub Actions:

Assign issue to Copilot -> Spins up GitHub Actions environment
    -> Reads codebase -> Plans changes -> Implements
    -> Runs tests -> Self-corrects -> Creates PR

Triggered from: GitHub Issues, VS Code, CLI, or MCP.


Aider

Open-source, model-agnostic coding CLI. Known for git-native workflow.

Architecture

User Input
    |
    v
Aider
    |-- Repo map (AST-based file index)
    |-- Git integration (auto-commit changes)
    |-- Edit format (whole file or diff)
    |
    v
LLM API (Claude, GPT, Gemini, local models)

Key Innovation: Repo Maps

Aider builds an AST-based map of the entire repo:

repo_map:
  src/auth.py:
    classes: [AuthManager, TokenStore]
    functions: [verify_token, refresh_token]
  src/api.py:
    classes: [APIRouter]
    functions: [handle_request, validate_input]

This lets the LLM understand the codebase structure without reading every file. The model asks for specific files when needed.

Git-Native Workflow

Every change Aider makes is automatically committed:

$ aider
> Fix the bug in auth.py

# Aider edits auth.py, then:
git add auth.py
git commit -m "fix: handle expired tokens in verify_token"

You can easily undo: git diff HEAD~1 or git revert HEAD.

Edit Formats

Aider supports multiple ways of applying changes:

FormatHow it worksBest for
Whole fileLLM outputs entire fileSmall files
DiffLLM outputs unified diffLarge files
Search/ReplaceLLM outputs search/replace blocksPrecise edits

OpenAI Codex CLI

Open-source agentic CLI with OS-level sandboxing. Different philosophy from Claude Code.

Architecture

User Input
    |
    v
Codex CLI
    |-- Sandbox (Seatbelt on macOS, Landlock/seccomp on Linux)
    |-- Tool system (file ops, shell, web search)
    |-- Multi-model (o3, o4-mini, GPT-4.1)
    |
    v
OpenAI API (agentic loop)

Key Innovation: OS-Level Sandboxing

Instead of application-level permission rules (like Claude Code’s allowlists), Codex uses kernel-level sandboxing:

  • macOS: Apple’s Seatbelt framework restricts filesystem access
  • Linux: Landlock LSM + seccomp-bpf for syscall filtering

Three modes:

  • Suggest - read-only, no execution
  • Auto-edit - can write files, no network, no shell
  • Full auto - network access within sandbox, shell execution

This is a fundamentally different approach. Claude Code trusts the model to ask permission; Codex trusts the operating system to enforce boundaries.


Gemini CLI

Google’s entry into agentic CLIs. Open source, leverages Gemini’s 1M token context.

Key Differentiator: Brute Force Context

Where Claude Code manages 200K tokens carefully (summarization, subagents), Gemini CLI just loads everything into a 1M token window. Different trade-off:

ApproachUsed byProsCons
Smart retrieval (200K)Claude Code, AiderCheaper, fasterMay miss context
Brute force (1M)Gemini CLI, CodexNever misses contextExpensive, slower

Uses GEMINI.md for project instructions (equivalent to CLAUDE.md).


Emerging Standards

AGENTS.md

An emerging cross-tool standard for repo-level agent instructions. Supported by Codex, Cursor, Copilot, and Windsurf. Each tool also has its own proprietary format:

ToolProprietaryCross-compatible
Claude CodeCLAUDE.md-
Gemini CLIGEMINI.md-
Codex CLI-AGENTS.md
Cursor.cursor/rulesAGENTS.md
Copilot.github/copilot-instructions.mdAGENTS.md
Windsurf.windsurfrulesAGENTS.md

Agent Skills

A cross-cutting standard for per-repo capabilities. SKILL.md files with YAML frontmatter:

---
name: deploy
description: Deploy the application to production
---

When asked to deploy:
1. Run `npm run build`
2. Run tests with `npm test`
3. Deploy with `./scripts/deploy.sh`

Works across Copilot (VS Code, CLI, coding agent) and Claude Code. Progressive loading: metadata for discovery, full body on match, supporting files on reference.


Two Permission Philosophies

The agentic CLI space has split into two camps:

OS-Level Sandboxing (Codex)

Kernel enforces boundaries
    |-- Process can't access files outside sandbox
    |-- Process can't make network calls (in restricted mode)
    |-- No reliance on model behavior

Pros: Stronger guarantees, can’t be prompt-injected around. Cons: Coarse-grained (all-or-nothing per capability), OS-specific.

Application-Level Rules (Claude Code)

Application enforces boundaries
    |-- Pattern matching on tool calls (Bash(npm test) = allowed)
    |-- 8 lifecycle hooks for custom logic
    |-- Wildcard permissions (mcp__github__*)

Pros: Fine-grained, programmable, cross-platform. Cons: Runs in same process, theoretically bypassable.

Neither approach is “right.” Codex is safer for untrusted environments; Claude Code is more flexible for power users.


Architecture Comparison

The Agentic Loop

All CLIs implement the same core loop, but with different emphases:

AspectClaude CodeCopilot CLICodex CLIAider
Loop controlLLM-drivenLLM-drivenLLM-drivenLLM-driven
Tool granularityFine (Read, Edit, Grep separately)Coarse (actions)Medium (file ops, shell)Medium (edit, run)
Self-correctionYes (sees errors, retries)Yes (monitors terminal)Yes (in sandbox)Yes (sees lint/test errors)
ParallelismSubagentsFleet modeSingle threadSingle thread
Context strategySummarization + subagentsSession memoryBrute force (large context)Repo map + selective loading

Permission Models

CLIReadWriteExecuteApproach
Claude CodeAutoAskAskApp-level rules + hooks
Copilot CLIAutoReviewApproveApp-level + skills
Codex CLIAutoMode-dependentMode-dependentOS-level sandbox
Gemini CLIAutoAskAskApp-level rules
AiderAutoAuto-commitAskMinimal
CursorAutoReview (diff)AskIDE settings

Context Management

CLIStrategyMax Context
Claude CodeAuto-summarization + subagents200K tokens
Gemini CLIBrute force loading1M tokens
Copilot CLISession memory + skillsModel-dependent
AiderRepo map (tree-sitter AST + NetworkX graphs, 98% token reduction)Model-dependent
CursorCodebase embedding index + @ referencesModel-dependent

Choosing a CLI

If you need…Use
Transparent, hackable architectureClaude Code
GitHub ecosystem integrationCopilot CLI
Strongest sandboxing guaranteesCodex CLI
Largest context windowGemini CLI (1M) or Claude Code (200K)
Git-native workflow, model-agnosticAider
IDE integration with visual diffsCursor or Windsurf
Full autonomy on issuesCopilot Coding Agent
Team/enterprise featuresCopilot CLI