# Agentic CLI Architecture Reference
A comprehensive guide to building production-grade agentic command-line interfaces, derived from patterns in state-of-the-art implementations.
## Core Architecture

### Layered Design

### Execution Model: Single-Thread Simplicity
Prefer a single main agent loop over complex multi-agent orchestration.
**Key Principles:**
- Flat message list as single source of truth
- Subagents spawn with isolated context, return summarized results
- Maximum one level of delegation (no recursive subagent spawning)
- Results from subagents become tool responses in main thread
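The principles above can be sketched as a single flat loop. This is a minimal illustration, not an SDK API: `llm_complete` and `run_tool` are hypothetical callables injected by the caller.

```python
def agent_loop(llm_complete, run_tool, messages, max_turns=20):
    """Single-threaded agent loop over one flat message list.

    llm_complete(messages) -> assistant message dict (may contain tool_calls);
    run_tool(call) -> string result appended back as a tool message.
    """
    for _ in range(max_turns):
        reply = llm_complete(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):
            return messages  # model produced a final answer; the loop ends
        for call in reply["tool_calls"]:
            # Tool results go straight back into the same flat list
            messages.append({"role": "tool", "content": run_tool(call)})
    return messages
```

Subagent results would enter this same list as ordinary tool messages, which is what keeps the main thread a single source of truth.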
---
## Tool System Design

### Tool Hierarchy

Design tools at multiple abstraction levels:
| Level | Characteristics | Examples |
|-------|----------------|----------|
| Low-level | Direct system access, flexible but error-prone | bash, read_file, write_file |
| Mid-level | Specialized, optimized for common operations | grep, glob, edit, multi_edit |
| High-level | Orchestration, deterministic outcomes | spawn_agent, web_fetch, todo_list |
**Why multiple levels?**
- Frequent operations deserve dedicated tools (reduces LLM errors)
- Specialized tools have better prompts and validation
- High-level tools save tokens and keep agent on track
### Tool Categories

1. **Filesystem Tools**

---
## Skills

### Skill Definition

```markdown
<!-- .skills/code-review/SKILL.md -->
---
name: code-review
description: "Thorough code review with best practices"
tools: Read,Grep,Glob
---

# Code Review Skill

When performing code reviews, follow this methodology:

## 1. Understanding Phase
- Read the changed files completely
- Identify the purpose of changes
- Check for related test files

## 2. Analysis Checklist
- [ ] Error handling present
- [ ] Edge cases considered
- [ ] No hardcoded secrets
- [ ] Logging appropriate
- [ ] Tests cover new code

## 3. Output Format
Provide feedback as:
- 🔴 Critical: Must fix before merge
- 🟡 Suggestion: Consider improving
- 🟢 Praise: Well done
```
### Skill Discovery
```text
project/
└── .skills/
    ├── code-review/
    │   └── SKILL.md
    └── api-design/
        └── SKILL.md

~/.config/agent/skills/          # Global skills (separate root)
└── security-audit/
    └── SKILL.md
```
---
## Plugin Architecture
### Plugin Structure
```text
my-plugin/
├── plugin.json          # Manifest (required)
├── commands/            # Slash commands
│   └── deploy.md
├── agents/              # Custom agents
│   └── devops.md
├── skills/              # Skills
│   └── kubernetes/
│       └── SKILL.md
├── hooks.json           # Hook definitions
└── mcp.json             # External tool servers
```
### Plugin Manifest
```json
{
  "name": "devops-toolkit",
  "version": "1.0.0",
  "description": "DevOps automation tools",
  "author": "Your Name",
  "components": {
    "commands": ["commands/*.md"],
    "agents": ["agents/*.md"],
    "skills": ["skills/*/SKILL.md"],
    "hooks": "hooks.json",
    "mcp": "mcp.json"
  }
}
```
### Plugin Loading
```python
# Programmatic
agent = Agent(
    plugins=[
        {"type": "local", "path": "./my-plugin"},
        {"type": "local", "path": "~/.agent/plugins/shared"}
    ]
)

# CLI
agent --plugin-dir ./my-plugin --plugin-dir ~/.agent/plugins/shared
```
---
## External Tool Protocol (MCP)
### Overview
Model Context Protocol enables connecting external services as tools.
```mermaid
graph LR
    AC["Agent Core"] <--> MC["MCP Client"] <--> MS["MCP Server\n(External)"]
    MS --> DB["Database"]
    MS --> API["API"]
    MS --> BR["Browser"]
```

### Transport Types
```yaml
stdio:
  description: "Spawn process, communicate via stdin/stdout"
  config:
    command: "npx"
    args: ["@mcp/server-filesystem"]
    env:
      ALLOWED_PATHS: "/home/user/projects"

sse:
  description: "Server-Sent Events over HTTP"
  config:
    url: "https://api.example.com/mcp/sse"
    headers:
      Authorization: "Bearer ${API_TOKEN}"

http:
  description: "Standard HTTP requests"
  config:
    url: "https://api.example.com/mcp"
    headers:
      X-API-Key: "${API_KEY}"
```
### MCP Configuration
```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["@mcp/server-filesystem"],
      "env": {
        "ALLOWED_PATHS": "/home/user/projects"
      }
    },
    "database": {
      "type": "sse",
      "url": "http://localhost:3001/mcp",
      "headers": {
        "Authorization": "Bearer ${DB_TOKEN}"
      }
    },
    "jira": {
      "command": "python",
      "args": ["-m", "mcp_jira"],
      "env": {
        "JIRA_URL": "${JIRA_URL}",
        "JIRA_TOKEN": "${JIRA_TOKEN}"
      }
    }
  }
}
```
### Tool Naming Convention
MCP tools follow: `mcp__{server_name}__{tool_name}`
```python
allowed_tools = [
    "mcp__filesystem__read_file",
    "mcp__database__query",
    "mcp__jira__create_issue"
]
```
---
## Context Management
### The Context Window Problem
```mermaid
graph TB
    subgraph CW["CONTEXT WINDOW - 128k-200k tokens"]
        SP["System Prompt\n~2-5k"]
        TD["Tool Definitions\n~3-8k"]
        PC["Project Context\n(CLAUDE.md, etc.)\n~1-5k"]
        CH["Conversation History\nUser messages, Assistant responses,\nTool calls and results"]
        CT["Current Turn"]
    end
    SP --> TD --> PC --> CH --> CT
```

### Compaction Strategies
```python
class ContextManager:
    def __init__(self, max_tokens: int = 100000):
        self.max_tokens = max_tokens
        self.compaction_threshold = 0.8  # 80%

    def should_compact(self, current_tokens: int) -> bool:
        return current_tokens > self.max_tokens * self.compaction_threshold

    def compact(self, messages: list) -> list:
        """Reduce context size while preserving critical information."""
        strategies = [
            self.truncate_tool_outputs,   # Limit long outputs
            self.summarize_old_turns,     # Summarize distant history
            self.remove_redundant_reads,  # Remove duplicate file reads
            self.compress_to_summary      # Last resort: full summarization
        ]
        for strategy in strategies:
            messages = strategy(messages)
            if self.count_tokens(messages) < self.max_tokens * 0.6:
                break
        return messages
```
### Practical Techniques
1. **Truncate tool outputs**: Limit to 30k chars, show head + tail
2. **Deduplicate file reads**: Keep only latest version
3. **Summarize old turns**: Compress turns older than N
4. **Subagent isolation**: Subagents get fresh context, return summaries
5. **Prompt caching**: Cache static portions (system prompt, tools)
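Technique 1 can be sketched as a head-plus-tail truncation; the 50/50 split and the marker format here are illustrative choices, not fixed conventions:

```python
def truncate_output(text: str, limit: int = 30_000, head_ratio: float = 0.5) -> str:
    """Keep the head and tail of an oversized tool output, eliding the middle."""
    if len(text) <= limit:
        return text
    head = int(limit * head_ratio)
    tail = limit - head
    omitted = len(text) - head - tail
    return f"{text[:head]}\n... [{omitted} chars omitted] ...\n{text[-tail:]}"
```

Keeping both ends matters in practice: error messages tend to sit at the tail of logs, while headers and context sit at the head.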
---
## CLI Interface Design
### Command Structure
```bash
# Basic usage
agent # Interactive REPL
agent "query" # Start with prompt
agent -p "query" # Non-interactive (print mode)
agent -c # Continue last session
agent -r <session-id> # Resume specific session
# Piping
cat file.log | agent -p "analyze errors"
git diff | agent -p "review changes"
# Configuration
agent --model sonnet # Model selection
agent --permission-mode strict # Permission mode
agent --max-turns 10 # Limit iterations
agent --timeout 300 # Global timeout (seconds)
# Extensions
agent --mcp-config ./mcp.json # Load MCP servers
agent --plugin-dir ./plugins # Load plugins
agent --agents '{"name": {...}}' # Define subagents
# System prompt
agent --system-prompt "You are..." # Replace
agent --append-system-prompt "Also..." # Append
agent --system-prompt-file ./prompt.txt # From file
# Output control
agent -p --output-format json # JSON output
agent -p --output-format stream-json # Streaming JSON
agent --verbose # Detailed logging
agent --debug "api,tools" # Debug categories
```
### Interactive Commands (Slash Commands)
```text
/help           Show available commands
/clear          Reset conversation
/compact        Compress context manually
/model <name>   Switch model
/permissions    Manage tool permissions
/sessions       List saved sessions
/resume <id>    Resume session
/save           Save current session
/config         View/edit configuration
/bug            Report issue
/quit           Exit
```
### Custom Slash Commands
```markdown
<!-- .commands/deploy.md -->
Deploy the current project to $ARGUMENTS environment.
1. Run tests: `npm test`
2. Build: `npm run build`
3. Deploy: `./scripts/deploy.sh $ARGUMENTS`
4. Verify deployment health
5. Report status
If any step fails, stop and report the error.
```
Usage: `/project:deploy staging`
---
## Configuration Hierarchy
### Precedence (highest to lowest)
```text
1. CLI flags           --model opus
2. Environment vars    AGENT_MODEL=opus
3. Local config        ./.agent/settings.json
4. Project config      ./agent.config.json
5. User config         ~/.config/agent/settings.json
6. System defaults     Built-in values
```
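One way to realize this precedence is a recursive merge in which earlier (higher-precedence) layers win key-by-key. This is a sketch under the assumption that each layer has already been loaded into a dict; layer discovery and parsing are omitted:

```python
def resolve_config(*layers: dict) -> dict:
    """Merge config layers; earliest argument has highest precedence.

    Nested dicts merge recursively, scalars and lists are replaced wholesale.
    """
    result: dict = {}
    for layer in reversed(layers):  # apply lowest precedence first
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                # Recurse with the newer layer first so it keeps winning
                result[key] = resolve_config(value, result[key])
            else:
                result[key] = value
    return result
```

Replacing (rather than merging) lists is a deliberate choice here: merging `allowed` tool lists across layers can silently widen permissions.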
### Configuration File
```json
{
  "model": "sonnet",
  "fallback_model": "haiku",
  "permission_mode": "standard",
  "max_turns": 50,
  "timeout_ms": 300000,
  "tools": {
    "allowed": ["read", "glob", "grep", "bash(git:*)"],
    "disallowed": ["bash(rm -rf:*)"]
  },
  "context": {
    "max_tokens": 100000,
    "compaction_threshold": 0.8
  },
  "hooks": {
    "PreToolUse": [...]
  },
  "mcp_servers": {
    "filesystem": {...}
  },
  "output": {
    "format": "text",
    "verbose": false,
    "color": true
  }
}
```
### Project Context File
```markdown
<!-- AGENT.md or CLAUDE.md -->
# Project: My Application
## Overview
This is a Next.js application with a Python backend.
## Architecture
- Frontend: Next.js 14, TypeScript, Tailwind
- Backend: FastAPI, PostgreSQL
- Infrastructure: Docker, Kubernetes
## Conventions
- Use TypeScript strict mode
- Follow PEP 8 for Python
- All functions require docstrings
- Tests required for new features
## Commands
- `npm run dev` - Start frontend
- `uvicorn main:app --reload` - Start backend
- `pytest` - Run tests
- `npm run lint && black .` - Format code
## File Structure
- `src/` - Frontend source
- `api/` - Backend source
- `tests/` - Test files
- `docs/` - Documentation
```
---
## Session Management
### Session Persistence
```python
@dataclass
class Session:
    id: str
    created_at: datetime
    updated_at: datetime
    working_directory: str
    messages: list[Message]
    tool_states: dict  # Persistent tool state
    checkpoints: list[Checkpoint]
    metadata: dict

class SessionStore:
    def save(self, session: Session) -> None:
        """Persist session to disk."""
        path = self.sessions_dir / f"{session.id}.json"
        path.write_text(session.to_json())

    def load(self, session_id: str) -> Session:
        """Load session from disk."""
        path = self.sessions_dir / f"{session_id}.json"
        return Session.from_json(path.read_text())

    def list_recent(self, limit: int = 10) -> list[SessionSummary]:
        """List recent sessions."""
        sessions = sorted(
            self.sessions_dir.glob("*.json"),
            key=lambda p: p.stat().st_mtime,
            reverse=True
        )
        return [self._summarize(s) for s in sessions[:limit]]
```
### Checkpointing
```python
@dataclass
class Checkpoint:
    """Snapshot of agent state at a point in time."""
    id: str
    session_id: str
    timestamp: datetime
    message_index: int
    file_snapshots: dict[str, str]  # path -> content hash
    description: str

def create_checkpoint(session: Session, description: str) -> Checkpoint:
    """Create restorable checkpoint."""
    return Checkpoint(
        id=generate_id(),
        session_id=session.id,
        timestamp=datetime.now(),
        message_index=len(session.messages),
        file_snapshots=snapshot_working_files(session.working_directory),
        description=description
    )

def restore_checkpoint(checkpoint: Checkpoint) -> Session:
    """Restore session to checkpoint state."""
    session = load_session(checkpoint.session_id)
    session.messages = session.messages[:checkpoint.message_index]
    restore_files(checkpoint.file_snapshots)
    return session
```
---
## SDK Design
### Core API
```python
from agent_sdk import Agent, AgentOptions, Message

# Streaming execution
async for message in Agent.query(
    prompt="Fix the bug in auth.py",
    options=AgentOptions(
        model="sonnet",
        allowed_tools=["read", "edit", "bash(pytest:*)"],
        permission_mode="accept_edits",
        system_prompt="You are a Python expert.",
        working_directory="/path/to/project",
        mcp_servers={"db": {...}},
        hooks={"PreToolUse": [validate_hook]}
    )
):
    match message:
        case Message(type="assistant"):
            print(message.content)
        case Message(type="tool_use"):
            print(f"Using {message.tool_name}")
        case Message(type="tool_result"):
            print(f"Result: {message.result[:100]}")
        case Message(type="result", subtype="success"):
            print("Task completed!")

# Single-turn execution
result = await Agent.run(
    prompt="List all Python files",
    options=AgentOptions(allowed_tools=["glob"])
)
print(result.output)
```
### Message Types
```python
@dataclass
class Message:
    type: Literal[
        "system",       # System events (init, error)
        "user",         # User input
        "assistant",    # Agent response
        "tool_use",     # Tool invocation
        "tool_result",  # Tool output
        "result"        # Final result
    ]
    subtype: str | None  # e.g., "success", "error", "init"
    content: Any
    metadata: dict
```
### Custom Tool Definition
```python
from agent_sdk import tool, ToolResult

@tool(
    name="query_database",
    description="Execute SQL query against the database",
    parameters={
        "query": {"type": "string", "description": "SQL query to execute"},
        "database": {"type": "string", "description": "Database name"}
    }
)
async def query_database(query: str, database: str) -> ToolResult:
    try:
        result = await db.execute(query, database)
        return ToolResult.success(result.to_json())
    except DatabaseError as e:
        return ToolResult.error(f"Query failed: {e}")
```
---
## Error Handling
### Error Categories
```python
class AgentError(Exception):
    """Base agent error."""
    code: str = "unknown"  # Matched against RetryPolicy.retryable_errors

class ToolExecutionError(AgentError):
    """Tool failed to execute."""
    tool_name: str
    input: dict
    cause: Exception

class PermissionDeniedError(AgentError):
    """User denied permission."""
    tool_name: str
    input: dict

class ContextOverflowError(AgentError):
    """Context window exceeded."""
    current_tokens: int
    max_tokens: int

class ModelError(AgentError):
    """LLM provider error."""
    provider: str
    status_code: int
    message: str

class TimeoutError(AgentError):
    """Operation timed out."""
    operation: str
    timeout_ms: int
```
### Retry Strategies
```python
class RetryPolicy:
    max_retries: int = 3
    backoff_base: float = 1.0
    backoff_max: float = 30.0
    retryable_errors: set = {
        "rate_limit",
        "overloaded",
        "timeout",
        "connection_error"
    }

    def should_retry(self, error: AgentError, attempt: int) -> bool:
        if attempt >= self.max_retries:
            return False
        return error.code in self.retryable_errors

    def get_delay(self, attempt: int) -> float:
        delay = self.backoff_base * (2 ** attempt)
        return min(delay, self.backoff_max)
```
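A driver for such a policy might look like the following sketch. It duck-types the policy above; to stay self-contained it catches `Exception` where a real implementation would catch `AgentError`, and it adds full jitter on top of the policy's exponential delay:

```python
import asyncio
import random

async def with_retries(operation, policy):
    """Run an async operation, retrying retryable failures with jittered backoff."""
    for attempt in range(policy.max_retries + 1):
        try:
            return await operation()
        except Exception as error:  # real code: except AgentError as error
            if not policy.should_retry(error, attempt):
                raise  # non-retryable, or retries exhausted
            # Full jitter: sleep a random fraction of the exponential delay
            await asyncio.sleep(policy.get_delay(attempt) * random.random())
```

Jitter matters when many agent processes share one provider: synchronized retries after a rate-limit event just recreate the spike.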
---
## Observability
### Logging
```python
import structlog

logger = structlog.get_logger()

# Tool execution
logger.info("tool_execution",
    tool=tool_name,
    input=tool_input,
    duration_ms=duration,
    success=True
)

# LLM call
logger.info("llm_call",
    model=model_name,
    input_tokens=input_tokens,
    output_tokens=output_tokens,
    duration_ms=duration,
    cached_tokens=cached_tokens
)

# Session events
logger.info("session_event",
    event="created",  # or "resumed", "completed", "error"
    session_id=session_id,
    duration_total_ms=total_duration
)
```
### Metrics
```python
from prometheus_client import Counter, Histogram

tool_calls = Counter(
    "agent_tool_calls_total",
    "Total tool calls",
    ["tool_name", "status"]
)

tool_duration = Histogram(
    "agent_tool_duration_seconds",
    "Tool execution duration",
    ["tool_name"]
)

llm_tokens = Counter(
    "agent_llm_tokens_total",
    "Total LLM tokens",
    ["model", "type"]  # type: input, output, cached
)

session_duration = Histogram(
    "agent_session_duration_seconds",
    "Session duration"
)
```
### Cost Tracking
```python
@dataclass
class UsageTracker:
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0
    tool_calls: int = 0

    def add_llm_call(self, response: LLMResponse):
        self.input_tokens += response.usage.input_tokens
        self.output_tokens += response.usage.output_tokens
        self.cached_tokens += response.usage.cached_tokens

    def estimate_cost(self, pricing: dict) -> float:
        input_cost = self.input_tokens * pricing["input_per_token"]
        output_cost = self.output_tokens * pricing["output_per_token"]
        cached_cost = self.cached_tokens * pricing["cached_per_token"]
        return input_cost + output_cost + cached_cost
```
---
## Security Considerations
### Sandboxing
```yaml
# Docker-based isolation
docker_config:
  image: "agent-sandbox:latest"
  network: none  # or restricted
  read_only_root: true
  volumes:
    - source: /project
      target: /workspace
      read_only: false
    - source: /home/user/.agent
      target: /config
      read_only: true
  resource_limits:
    memory: 4g
    cpu: 2
    pids: 100
  security_opts:
    - no-new-privileges
    - seccomp=restricted.json
```
### Input Validation
```python
def validate_tool_input(tool_name: str, input: dict) -> ValidationResult:
    """Validate tool input before execution."""
    schema = get_tool_schema(tool_name)

    # JSON Schema validation (jsonschema.validate raises on failure)
    try:
        jsonschema.validate(input, schema)
    except jsonschema.ValidationError as e:
        return ValidationResult.invalid(str(e))

    # Path traversal check
    if "path" in input or "file_path" in input:
        path = input.get("path") or input.get("file_path")
        if not is_within_workspace(path):
            return ValidationResult.invalid("Path traversal detected")

    # Command injection check
    if tool_name == "bash":
        if contains_injection(input["command"]):
            return ValidationResult.invalid("Potential command injection")

    return ValidationResult.valid()
```
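The `is_within_workspace` check referenced above could be implemented by resolving the candidate path and requiring the result to stay under the workspace root. A sketch, with an illustrative `/workspace` default:

```python
from pathlib import Path

def is_within_workspace(path: str, workspace: str = "/workspace") -> bool:
    """Reject paths that escape the workspace via '..' segments or absolute jumps."""
    root = Path(workspace).resolve()
    # Joining an absolute path replaces the root, which resolve() then exposes
    resolved = (root / path).resolve()
    return resolved == root or root in resolved.parents
```

Note that `resolve()` also follows symlinks, so a symlink inside the workspace pointing outside it is caught too, provided the link exists when the check runs.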
### Secret Management
```python
# Environment variable expansion (safe)
import os
import re

def expand_env_vars(config: dict) -> dict:
    """Expand ${VAR} and ${VAR:-default} patterns in config strings."""
    def expand(value: str) -> str:
        pattern = r'\$\{([^}]+)\}'
        def replace(match):
            var_name = match.group(1)
            default = None
            if ":-" in var_name:
                var_name, default = var_name.split(":-", 1)
            return os.environ.get(var_name, default or "")
        return re.sub(pattern, replace, value)

    def deep_map(obj, fn):
        """Apply fn to every string in a nested dict/list structure."""
        if isinstance(obj, dict):
            return {k: deep_map(v, fn) for k, v in obj.items()}
        if isinstance(obj, list):
            return [deep_map(v, fn) for v in obj]
        return fn(obj) if isinstance(obj, str) else obj

    return deep_map(config, expand)
```
---
## Performance Optimization
### Prompt Caching
```python
class PromptCache:
    """Cache static prompt components."""

    def __init__(self):
        self.cache = {}

    def get_cached_prefix(self, components: list[str]) -> CachedPrefix:
        """Get or create cached prefix from components."""
        key = hash(tuple(components))
        if key not in self.cache:
            self.cache[key] = self._create_prefix(components)
        return self.cache[key]

    def build_messages(
        self,
        system_prompt: str,
        tool_definitions: list,
        project_context: str,
        conversation: list[Message]
    ) -> list[Message]:
        """Build messages with cached prefix."""
        # Static components get cached
        prefix = self.get_cached_prefix([
            system_prompt,
            json.dumps(tool_definitions),
            project_context
        ])
        # Dynamic components appended
        return prefix + conversation
```
### Parallel Tool Execution
```python
async def execute_tools_batch(tool_calls: list[ToolCall]) -> list[ToolResult]:
    """Execute independent tool calls in parallel."""
    # Group by dependency
    independent = [t for t in tool_calls if not t.depends_on]
    dependent = [t for t in tool_calls if t.depends_on]

    # Execute independent calls in parallel
    results = await asyncio.gather(*[
        execute_tool(t) for t in independent
    ])

    # Execute dependent calls sequentially
    for tool_call in dependent:
        result = await execute_tool(tool_call)
        results.append(result)

    return results
```
### Streaming Output
```python
async def stream_response(query: str, options: AgentOptions):
    """Stream agent responses for real-time display."""
    buffer = ""
    async for message in Agent.query(query, options):
        if message.type == "assistant":
            # Stream text token by token
            for token in message.content:
                yield token
                buffer += token
        elif message.type == "tool_use":
            yield f"\n[Using {message.tool_name}...]\n"
        elif message.type == "tool_result":
            if options.verbose:
                yield f"[Result: {message.result[:200]}...]\n"
```
---
## Testing Strategies
### Unit Testing Tools
```python
import pytest
from agent_sdk.testing import MockLLM, MockToolExecutor

@pytest.fixture
def mock_agent():
    return Agent(
        llm=MockLLM(responses=[
            {"tool_use": {"name": "read", "input": {"path": "test.py"}}},
            {"text": "The file contains a function definition."}
        ]),
        tool_executor=MockToolExecutor({
            "read": lambda input: "def hello(): pass"
        })
    )

async def test_file_analysis(mock_agent):
    result = await mock_agent.run("Analyze test.py")
    assert "function" in result.output
    assert mock_agent.tool_calls == [("read", {"path": "test.py"})]
```
### Integration Testing
```python
@pytest.mark.integration
async def test_edit_workflow():
    """Test complete edit workflow in isolated environment."""
    with TempWorkspace() as workspace:
        # Setup
        workspace.write("src/main.py", "def old_name(): pass")

        # Execute
        agent = Agent(working_directory=workspace.path)
        await agent.run("Rename old_name to new_name in src/main.py")

        # Verify
        content = workspace.read("src/main.py")
        assert "def new_name():" in content
        assert "old_name" not in content
```
### Snapshot Testing
```python
def test_tool_schema_stability():
    """Ensure tool schemas don't change unexpectedly."""
    tools = get_all_tool_definitions()
    for tool in tools:
        snapshot_path = f"snapshots/tools/{tool.name}.json"
        current = tool.to_json()
        if os.path.exists(snapshot_path):
            with open(snapshot_path) as f:
                expected = json.load(f)
            assert current == expected, f"Tool {tool.name} schema changed"
        else:
            with open(snapshot_path, "w") as f:
                json.dump(current, f)
```
---
## Deployment Patterns
### Docker Container
```dockerfile
FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
ripgrep \
&& rm -rf /var/lib/apt/lists/*
# Install agent
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Create non-root user
RUN useradd -m -s /bin/bash agent
USER agent
WORKDIR /home/agent
# Copy configuration
COPY --chown=agent:agent config/ /home/agent/.config/agent/
ENTRYPOINT ["agent"]
CMD ["--help"]
```
### CI/CD Integration
```yaml
# GitHub Actions example
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed files
        id: changes
        run: |
          echo "files=$(git diff --name-only origin/main...HEAD | tr '\n' ' ')" >> $GITHUB_OUTPUT

      - name: Run AI Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          agent -p "Review these changed files for bugs and improvements: ${{ steps.changes.outputs.files }}" \
            --output-format json \
            --max-turns 20 \
            > review.json

      - name: Post Review Comment
        uses: actions/github-script@v7
        with:
          script: |
            const review = require('./review.json');
            github.rest.pulls.createReview({
              ...context.repo,
              pull_number: context.issue.number,
              body: review.output,
              event: 'COMMENT'
            });
```
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      containers:
        - name: agent
          image: my-agent:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-keys
                  key: anthropic
          volumeMounts:
            - name: workspace
              mountPath: /workspace
      volumes:
        - name: workspace
          emptyDir: {}
```
---
## Appendix: Decision Checklist
### When to Use Dedicated Tools vs Bash
| Use Dedicated Tool | Use Bash |
|-------------------|----------|
| File reading (read) | Git operations |
| File editing (edit) | Package management (npm, pip) |
| Pattern search (grep) | Running tests |
| File finding (glob) | Build commands |
| Directory listing (ls) | Custom scripts |
### When to Spawn Subagents
| Spawn Subagent | Handle in Main Loop |
|----------------|---------------------|
| Complex research tasks | Simple file operations |
| Exploration of large codebases | Direct answers |
| Parallel independent tasks | Sequential dependent tasks |
| Isolated experimental changes | Standard workflows |
### Permission Mode Selection
| Mode | Use Case |
|------|----------|
| Strict | Untrusted environments, production systems |
| Standard | Normal development work |
| Permissive | Trusted projects, experienced users |
| Autonomous | Automated pipelines, sandboxed environments |
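These modes can be reduced to a per-tool auto-allow gate: anything not auto-allowed is surfaced to the user before running. The per-mode tool lists below are illustrative, not normative:

```python
from fnmatch import fnmatch

# Illustrative policy table; real deployments would load this from config
MODE_POLICIES = {
    "strict":     {"auto_allow": []},                      # everything prompts
    "standard":   {"auto_allow": ["read", "glob", "grep"]},
    "permissive": {"auto_allow": ["read", "glob", "grep", "edit", "write_file"]},
    "autonomous": {"auto_allow": ["*"]},                   # never prompts
}

def needs_confirmation(mode: str, tool_name: str) -> bool:
    """True if the tool call should be surfaced to the user before running."""
    patterns = MODE_POLICIES[mode]["auto_allow"]
    return not any(fnmatch(tool_name, p) for p in patterns)
```

Glob patterns keep the table compact and line up with the `bash(git:*)`-style selectors used in the configuration section above.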
---
*This document provides architectural patterns for building agentic CLI tools. Implementations will vary based on specific requirements, target LLM providers, and use cases.*