# Agentic CLI Architecture Reference
A comprehensive guide to building production-grade agentic command-line interfaces, derived from patterns in state-of-the-art implementations.
## Core Architecture

### Layered Design

### Execution Model: Single-Thread Simplicity
Prefer a single main agent loop over complex multi-agent orchestration.
**Key Principles:**
- Flat message list as single source of truth
- Subagents spawn with isolated context, return summarized results
- Maximum one level of delegation (no recursive subagent spawning)
- Results from subagents become tool responses in main thread
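The principles above can be sketched as a single flat loop. This is a minimal illustration, not an SDK API: `llm_complete` and `run_tool` are hypothetical callables injected by the caller.

```python
def agent_loop(llm_complete, run_tool, messages, max_turns=20):
    """Single-threaded agent loop over one flat message list.

    llm_complete(messages) -> assistant message dict (may contain tool_calls);
    run_tool(call) -> string result appended back as a tool message.
    """
    for _ in range(max_turns):
        reply = llm_complete(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):
            return messages  # model produced a final answer; the loop ends
        for call in reply["tool_calls"]:
            # Tool results go straight back into the same flat list
            messages.append({"role": "tool", "content": run_tool(call)})
    return messages
```

Subagent results would enter this same list as ordinary tool messages, which is what keeps the main thread a single source of truth.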
---
## Tool System Design

### Tool Hierarchy

Design tools at multiple abstraction levels:
| Level | Characteristics | Examples |
|-------|----------------|----------|
| Low-level | Direct system access, flexible but error-prone | bash, read_file, write_file |
| Mid-level | Specialized, optimized for common operations | grep, glob, edit, multi_edit |
| High-level | Orchestration, deterministic outcomes | spawn_agent, web_fetch, todo_list |
**Why multiple levels?**
- Frequent operations deserve dedicated tools (reduces LLM errors)
- Specialized tools have better prompts and validation
- High-level tools save tokens and keep agent on track
### Tool Categories

1. **Filesystem Tools**

---
## Skills

### Skill Definition

```markdown
<!-- .skills/code-review/SKILL.md -->
---
name: code-review
description: "Thorough code review with best practices"
tools: Read,Grep,Glob
---

# Code Review Skill

When performing code reviews, follow this methodology:

## 1. Understanding Phase
- Read the changed files completely
- Identify the purpose of changes
- Check for related test files

## 2. Analysis Checklist
- [ ] Error handling present
- [ ] Edge cases considered
- [ ] No hardcoded secrets
- [ ] Logging appropriate
- [ ] Tests cover new code

## 3. Output Format
Provide feedback as:
- 🔴 Critical: Must fix before merge
- 🟡 Suggestion: Consider improving
- 🟢 Praise: Well done
```
### Skill Discovery
```text
project/
└── .skills/
    ├── code-review/
    │   └── SKILL.md
    └── api-design/
        └── SKILL.md

~/.config/agent/skills/          # Global skills (separate root)
└── security-audit/
    └── SKILL.md
```
---
## Plugin Architecture
### Plugin Structure
```text
my-plugin/
├── plugin.json          # Manifest (required)
├── commands/            # Slash commands
│   └── deploy.md
├── agents/              # Custom agents
│   └── devops.md
├── skills/              # Skills
│   └── kubernetes/
│       └── SKILL.md
├── hooks.json           # Hook definitions
└── mcp.json             # External tool servers
```
### Plugin Manifest
```json
{
  "name": "devops-toolkit",
  "version": "1.0.0",
  "description": "DevOps automation tools",
  "author": "Your Name",
  "components": {
    "commands": ["commands/*.md"],
    "agents": ["agents/*.md"],
    "skills": ["skills/*/SKILL.md"],
    "hooks": "hooks.json",
    "mcp": "mcp.json"
  }
}
```
### Plugin Loading
```python
# Programmatic
agent = Agent(
    plugins=[
        {"type": "local", "path": "./my-plugin"},
        {"type": "local", "path": "~/.agent/plugins/shared"}
    ]
)

# CLI
agent --plugin-dir ./my-plugin --plugin-dir ~/.agent/plugins/shared
```
---
## External Tool Protocol (MCP)
### Overview
Model Context Protocol enables connecting external services as tools.
```mermaid
graph LR
    AC["Agent Core"] <--> MC["MCP Client"] <--> MS["MCP Server\n(External)"]
    MS --> DB["Database"]
    MS --> API["API"]
    MS --> BR["Browser"]
```

### Transport Types
```yaml
stdio:
  description: "Spawn process, communicate via stdin/stdout"
  config:
    command: "npx"
    args: ["@mcp/server-filesystem"]
    env:
      ALLOWED_PATHS: "/home/user/projects"

sse:
  description: "Server-Sent Events over HTTP"
  config:
    url: "https://api.example.com/mcp/sse"
    headers:
      Authorization: "Bearer ${API_TOKEN}"

http:
  description: "Standard HTTP requests"
  config:
    url: "https://api.example.com/mcp"
    headers:
      X-API-Key: "${API_KEY}"
```
### MCP Configuration
```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["@mcp/server-filesystem"],
      "env": {
        "ALLOWED_PATHS": "/home/user/projects"
      }
    },
    "database": {
      "type": "sse",
      "url": "http://localhost:3001/mcp",
      "headers": {
        "Authorization": "Bearer ${DB_TOKEN}"
      }
    },
    "jira": {
      "command": "python",
      "args": ["-m", "mcp_jira"],
      "env": {
        "JIRA_URL": "${JIRA_URL}",
        "JIRA_TOKEN": "${JIRA_TOKEN}"
      }
    }
  }
}
```
### Tool Naming Convention
MCP tools follow: `mcp__{server_name}__{tool_name}`
```python
allowed_tools = [
    "mcp__filesystem__read_file",
    "mcp__database__query",
    "mcp__jira__create_issue"
]
```
---
## Context Management
### The Context Window Problem
```mermaid
graph TB
    subgraph CW["CONTEXT WINDOW - 128k-200k tokens"]
        SP["System Prompt\n~2-5k"]
        TD["Tool Definitions\n~3-8k"]
        PC["Project Context\n(CLAUDE.md, etc.)\n~1-5k"]
        CH["Conversation History\nUser messages, Assistant responses,\nTool calls and results"]
        CT["Current Turn"]
    end
    SP --> TD --> PC --> CH --> CT
```

### Compaction Strategies
```python
class ContextManager:
    def __init__(self, max_tokens: int = 100000):
        self.max_tokens = max_tokens
        self.compaction_threshold = 0.8  # 80%

    def should_compact(self, current_tokens: int) -> bool:
        return current_tokens > self.max_tokens * self.compaction_threshold

    def compact(self, messages: list) -> list:
        """Reduce context size while preserving critical information."""
        strategies = [
            self.truncate_tool_outputs,   # Limit long outputs
            self.summarize_old_turns,     # Summarize distant history
            self.remove_redundant_reads,  # Remove duplicate file reads
            self.compress_to_summary      # Last resort: full summarization
        ]
        for strategy in strategies:
            messages = strategy(messages)
            if self.count_tokens(messages) < self.max_tokens * 0.6:
                break
        return messages
```
### Practical Techniques
1. **Truncate tool outputs**: Limit to 30k chars, show head + tail
2. **Deduplicate file reads**: Keep only latest version
3. **Summarize old turns**: Compress turns older than N
4. **Subagent isolation**: Subagents get fresh context, return summaries
5. **Prompt caching**: Cache static portions (system prompt, tools)
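Technique 1 can be sketched as a head-plus-tail truncation; the 50/50 split and the marker format here are illustrative choices, not fixed conventions:

```python
def truncate_output(text: str, limit: int = 30_000, head_ratio: float = 0.5) -> str:
    """Keep the head and tail of an oversized tool output, eliding the middle."""
    if len(text) <= limit:
        return text
    head = int(limit * head_ratio)
    tail = limit - head
    omitted = len(text) - head - tail
    return f"{text[:head]}\n... [{omitted} chars omitted] ...\n{text[-tail:]}"
```

Keeping both ends matters in practice: error messages tend to sit at the tail of logs, while headers and context sit at the head.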
---
## CLI Interface Design
### Command Structure
```bash
# Basic usage
agent # Interactive REPL
agent "query" # Start with prompt
agent -p "query" # Non-interactive (print mode)
agent -c # Continue last session
agent -r <session-id> # Resume specific session
# Piping
cat file.log | agent -p "analyze errors"
git diff | agent -p "review changes"
# Configuration
agent --model sonnet # Model selection
agent --permission-mode strict # Permission mode
agent --max-turns 10 # Limit iterations
agent --timeout 300 # Global timeout (seconds)
# Extensions
agent --mcp-config ./mcp.json # Load MCP servers
agent --plugin-dir ./plugins # Load plugins
agent --agents '{"name": {...}}' # Define subagents
# System prompt
agent --system-prompt "You are..." # Replace
agent --append-system-prompt "Also..." # Append
agent --system-prompt-file ./prompt.txt # From file
# Output control
agent -p --output-format json # JSON output
agent -p --output-format stream-json # Streaming JSON
agent --verbose # Detailed logging
agent --debug "api,tools" # Debug categories
```
### Interactive Commands (Slash Commands)
```text
/help           Show available commands
/clear          Reset conversation
/compact        Compress context manually
/model <name>   Switch model
/permissions    Manage tool permissions
/sessions       List saved sessions
/resume <id>    Resume session
/save           Save current session
/config         View/edit configuration
/bug            Report issue
/quit           Exit
```
### Custom Slash Commands
```markdown
<!-- .commands/deploy.md -->
Deploy the current project to $ARGUMENTS environment.
1. Run tests: `npm test`
2. Build: `npm run build`
3. Deploy: `./scripts/deploy.sh $ARGUMENTS`
4. Verify deployment health
5. Report status
If any step fails, stop and report the error.
```
Usage: `/project:deploy staging`
---
## Configuration Hierarchy
### Precedence (highest to lowest)
```text
1. CLI flags           --model opus
2. Environment vars    AGENT_MODEL=opus
3. Local config        ./.agent/settings.json
4. Project config      ./agent.config.json
5. User config         ~/.config/agent/settings.json
6. System defaults     Built-in values
```
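One way to realize this precedence is a recursive merge in which earlier (higher-precedence) layers win key-by-key. This is a sketch under the assumption that each layer has already been loaded into a dict; layer discovery and parsing are omitted:

```python
def resolve_config(*layers: dict) -> dict:
    """Merge config layers; earliest argument has highest precedence.

    Nested dicts merge recursively, scalars and lists are replaced wholesale.
    """
    result: dict = {}
    for layer in reversed(layers):  # apply lowest precedence first
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                # Recurse with the newer layer first so it keeps winning
                result[key] = resolve_config(value, result[key])
            else:
                result[key] = value
    return result
```

Replacing (rather than merging) lists is a deliberate choice here: merging `allowed` tool lists across layers can silently widen permissions.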
### Configuration File
```json
{
  "model": "sonnet",
  "fallback_model": "haiku",
  "permission_mode": "standard",
  "max_turns": 50,
  "timeout_ms": 300000,
  "tools": {
    "allowed": ["read", "glob", "grep", "bash(git:*)"],
    "disallowed": ["bash(rm -rf:*)"]
  },
  "context": {
    "max_tokens": 100000,
    "compaction_threshold": 0.8
  },
  "hooks": {
    "PreToolUse": [...]
  },
  "mcp_servers": {
    "filesystem": {...}
  },
  "output": {
    "format": "text",
    "verbose": false,
    "color": true
  }
}
```
### Project Context File
```markdown
<!-- AGENT.md or CLAUDE.md -->
# Project: My Application
## Overview
This is a Next.js application with a Python backend.
## Architecture
- Frontend: Next.js 14, TypeScript, Tailwind
- Backend: FastAPI, PostgreSQL
- Infrastructure: Docker, Kubernetes
## Conventions
- Use TypeScript strict mode
- Follow PEP 8 for Python
- All functions require docstrings
- Tests required for new features
## Commands
- `npm run dev` - Start frontend
- `uvicorn main:app --reload` - Start backend
- `pytest` - Run tests
- `npm run lint && black .` - Format code
## File Structure
- `src/` - Frontend source
- `api/` - Backend source
- `tests/` - Test files
- `docs/` - Documentation
```
---
## Session Management
### Session Persistence
```python
@dataclass
class Session:
    id: str
    created_at: datetime
    updated_at: datetime
    working_directory: str
    messages: list[Message]
    tool_states: dict  # Persistent tool state
    checkpoints: list[Checkpoint]
    metadata: dict

class SessionStore:
    def save(self, session: Session) -> None:
        """Persist session to disk."""
        path = self.sessions_dir / f"{session.id}.json"
        path.write_text(session.to_json())

    def load(self, session_id: str) -> Session:
        """Load session from disk."""
        path = self.sessions_dir / f"{session_id}.json"
        return Session.from_json(path.read_text())

    def list_recent(self, limit: int = 10) -> list[SessionSummary]:
        """List recent sessions."""
        sessions = sorted(
            self.sessions_dir.glob("*.json"),
            key=lambda p: p.stat().st_mtime,
            reverse=True
        )
        return [self._summarize(s) for s in sessions[:limit]]
```
### Checkpointing
```python
@dataclass
class Checkpoint:
    """Snapshot of agent state at a point in time."""
    id: str
    session_id: str
    timestamp: datetime
    message_index: int
    file_snapshots: dict[str, str]  # path -> content hash
    description: str

def create_checkpoint(session: Session, description: str) -> Checkpoint:
    """Create restorable checkpoint."""
    return Checkpoint(
        id=generate_id(),
        session_id=session.id,
        timestamp=datetime.now(),
        message_index=len(session.messages),
        file_snapshots=snapshot_working_files(session.working_directory),
        description=description
    )

def restore_checkpoint(checkpoint: Checkpoint) -> Session:
    """Restore session to checkpoint state."""
    session = load_session(checkpoint.session_id)
    session.messages = session.messages[:checkpoint.message_index]
    restore_files(checkpoint.file_snapshots)
    return session
```
---
## SDK Design
### Core API
```python
from agent_sdk import Agent, AgentOptions, Message

# Streaming execution
async for message in Agent.query(
    prompt="Fix the bug in auth.py",
    options=AgentOptions(
        model="sonnet",
        allowed_tools=["read", "edit", "bash(pytest:*)"],
        permission_mode="accept_edits",
        system_prompt="You are a Python expert.",
        working_directory="/path/to/project",
        mcp_servers={"db": {...}},
        hooks={"PreToolUse": [validate_hook]}
    )
):
    match message:
        case Message(type="assistant"):
            print(message.content)
        case Message(type="tool_use"):
            print(f"Using {message.tool_name}")
        case Message(type="tool_result"):
            print(f"Result: {message.result[:100]}")
        case Message(type="result", subtype="success"):
            print("Task completed!")

# Single-turn execution
result = await Agent.run(
    prompt="List all Python files",
    options=AgentOptions(allowed_tools=["glob"])
)
print(result.output)
```
### Message Types
```python
@dataclass
class Message:
    type: Literal[
        "system",       # System events (init, error)
        "user",         # User input
        "assistant",    # Agent response
        "tool_use",     # Tool invocation
        "tool_result",  # Tool output
        "result"        # Final result
    ]
    subtype: str | None  # e.g., "success", "error", "init"
    content: Any
    metadata: dict
```
### Custom Tool Definition
```python
from agent_sdk import tool, ToolResult

@tool(
    name="query_database",
    description="Execute SQL query against the database",
    parameters={
        "query": {"type": "string", "description": "SQL query to execute"},
        "database": {"type": "string", "description": "Database name"}
    }
)
async def query_database(query: str, database: str) -> ToolResult:
    try:
        result = await db.execute(query, database)
        return ToolResult.success(result.to_json())
    except DatabaseError as e:
        return ToolResult.error(f"Query failed: {e}")
```
---
## Error Handling
### Error Categories
```python
class AgentError(Exception):
    """Base agent error."""
    code: str = "unknown"  # Matched against RetryPolicy.retryable_errors

class ToolExecutionError(AgentError):
    """Tool failed to execute."""
    tool_name: str
    input: dict
    cause: Exception

class PermissionDeniedError(AgentError):
    """User denied permission."""
    tool_name: str
    input: dict

class ContextOverflowError(AgentError):
    """Context window exceeded."""
    current_tokens: int
    max_tokens: int

class ModelError(AgentError):
    """LLM provider error."""
    provider: str
    status_code: int
    message: str

class TimeoutError(AgentError):
    """Operation timed out."""
    operation: str
    timeout_ms: int
```
### Retry Strategies
```python
class RetryPolicy:
    max_retries: int = 3
    backoff_base: float = 1.0
    backoff_max: float = 30.0
    retryable_errors: set = {
        "rate_limit",
        "overloaded",
        "timeout",
        "connection_error"
    }

    def should_retry(self, error: AgentError, attempt: int) -> bool:
        if attempt >= self.max_retries:
            return False
        return error.code in self.retryable_errors

    def get_delay(self, attempt: int) -> float:
        delay = self.backoff_base * (2 ** attempt)
        return min(delay, self.backoff_max)
```
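A driver for such a policy might look like the following sketch. It duck-types the policy above; to stay self-contained it catches `Exception` where a real implementation would catch `AgentError`, and it adds full jitter on top of the policy's exponential delay:

```python
import asyncio
import random

async def with_retries(operation, policy):
    """Run an async operation, retrying retryable failures with jittered backoff."""
    for attempt in range(policy.max_retries + 1):
        try:
            return await operation()
        except Exception as error:  # real code: except AgentError as error
            if not policy.should_retry(error, attempt):
                raise  # non-retryable, or retries exhausted
            # Full jitter: sleep a random fraction of the exponential delay
            await asyncio.sleep(policy.get_delay(attempt) * random.random())
```

Jitter matters when many agent processes share one provider: synchronized retries after a rate-limit event just recreate the spike.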
---
## Observability
### Logging
```python
import structlog

logger = structlog.get_logger()

# Tool execution
logger.info("tool_execution",
    tool=tool_name,
    input=tool_input,
    duration_ms=duration,
    success=True
)

# LLM call
logger.info("llm_call",
    model=model_name,
    input_tokens=input_tokens,
    output_tokens=output_tokens,
    duration_ms=duration,
    cached_tokens=cached_tokens
)

# Session events
logger.info("session_event",
    event="created",  # or "resumed", "completed", "error"
    session_id=session_id,
    duration_total_ms=total_duration
)
```
### Metrics
```python
from prometheus_client import Counter, Histogram

tool_calls = Counter(
    "agent_tool_calls_total",
    "Total tool calls",
    ["tool_name", "status"]
)

tool_duration = Histogram(
    "agent_tool_duration_seconds",
    "Tool execution duration",
    ["tool_name"]
)

llm_tokens = Counter(
    "agent_llm_tokens_total",
    "Total LLM tokens",
    ["model", "type"]  # type: input, output, cached
)

session_duration = Histogram(
    "agent_session_duration_seconds",
    "Session duration"
)
```
### Cost Tracking
```python
@dataclass
class UsageTracker:
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0
    tool_calls: int = 0

    def add_llm_call(self, response: LLMResponse):
        self.input_tokens += response.usage.input_tokens
        self.output_tokens += response.usage.output_tokens
        self.cached_tokens += response.usage.cached_tokens

    def estimate_cost(self, pricing: dict) -> float:
        input_cost = self.input_tokens * pricing["input_per_token"]
        output_cost = self.output_tokens * pricing["output_per_token"]
        cached_cost = self.cached_tokens * pricing["cached_per_token"]
        return input_cost + output_cost + cached_cost
```
---
## Security Considerations
### Sandboxing
```yaml
# Docker-based isolation
docker_config:
  image: "agent-sandbox:latest"
  network: none  # or restricted
  read_only_root: true
  volumes:
    - source: /project
      target: /workspace
      read_only: false
    - source: /home/user/.agent
      target: /config
      read_only: true
  resource_limits:
    memory: 4g
    cpu: 2
    pids: 100
  security_opts:
    - no-new-privileges
    - seccomp=restricted.json
```
### Input Validation
```python
def validate_tool_input(tool_name: str, input: dict) -> ValidationResult:
    """Validate tool input before execution."""
    schema = get_tool_schema(tool_name)

    # JSON Schema validation (jsonschema.validate raises on failure)
    try:
        jsonschema.validate(input, schema)
    except jsonschema.ValidationError as e:
        return ValidationResult.invalid(str(e))

    # Path traversal check
    if "path" in input or "file_path" in input:
        path = input.get("path") or input.get("file_path")
        if not is_within_workspace(path):
            return ValidationResult.invalid("Path traversal detected")

    # Command injection check
    if tool_name == "bash":
        if contains_injection(input["command"]):
            return ValidationResult.invalid("Potential command injection")

    return ValidationResult.valid()
```
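The `is_within_workspace` check referenced above could be implemented by resolving the candidate path and requiring the result to stay under the workspace root. A sketch, with an illustrative `/workspace` default:

```python
from pathlib import Path

def is_within_workspace(path: str, workspace: str = "/workspace") -> bool:
    """Reject paths that escape the workspace via '..' segments or absolute jumps."""
    root = Path(workspace).resolve()
    # Joining an absolute path replaces the root, which resolve() then exposes
    resolved = (root / path).resolve()
    return resolved == root or root in resolved.parents
```

Note that `resolve()` also follows symlinks, so a symlink inside the workspace pointing outside it is caught too, provided the link exists when the check runs.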
### Secret Management
```python
# Environment variable expansion (safe)
import os
import re

def expand_env_vars(config: dict) -> dict:
    """Expand ${VAR} and ${VAR:-default} patterns in config strings."""
    def expand(value: str) -> str:
        pattern = r'\$\{([^}]+)\}'
        def replace(match):
            var_name = match.group(1)
            default = None
            if ":-" in var_name:
                var_name, default = var_name.split(":-", 1)
            return os.environ.get(var_name, default or "")
        return re.sub(pattern, replace, value)

    def deep_map(obj, fn):
        """Apply fn to every string in a nested dict/list structure."""
        if isinstance(obj, dict):
            return {k: deep_map(v, fn) for k, v in obj.items()}
        if isinstance(obj, list):
            return [deep_map(v, fn) for v in obj]
        return fn(obj) if isinstance(obj, str) else obj

    return deep_map(config, expand)
```
---
## Performance Optimization
### Prompt Caching
```python
class PromptCache:
    """Cache static prompt components."""

    def __init__(self):
        self.cache = {}

    def get_cached_prefix(self, components: list[str]) -> CachedPrefix:
        """Get or create cached prefix from components."""
        key = hash(tuple(components))
        if key not in self.cache:
            self.cache[key] = self._create_prefix(components)
        return self.cache[key]

    def build_messages(
        self,
        system_prompt: str,
        tool_definitions: list,
        project_context: str,
        conversation: list[Message]
    ) -> list[Message]:
        """Build messages with cached prefix."""
        # Static components get cached
        prefix = self.get_cached_prefix([
            system_prompt,
            json.dumps(tool_definitions),
            project_context
        ])
        # Dynamic components appended
        return prefix + conversation
```
### Parallel Tool Execution
```python
async def execute_tools_batch(tool_calls: list[ToolCall]) -> list[ToolResult]:
    """Execute independent tool calls in parallel."""
    # Group by dependency
    independent = [t for t in tool_calls if not t.depends_on]
    dependent = [t for t in tool_calls if t.depends_on]

    # Execute independent calls in parallel
    results = await asyncio.gather(*[
        execute_tool(t) for t in independent
    ])

    # Execute dependent calls sequentially
    for tool_call in dependent:
        result = await execute_tool(tool_call)
        results.append(result)

    return results
```
### Streaming Output
```python
async def stream_response(query: str, options: AgentOptions):
    """Stream agent responses for real-time display."""
    buffer = ""
    async for message in Agent.query(query, options):
        if message.type == "assistant":
            # Stream text token by token
            for token in message.content:
                yield token
                buffer += token
        elif message.type == "tool_use":
            yield f"\n[Using {message.tool_name}...]\n"
        elif message.type == "tool_result":
            if options.verbose:
                yield f"[Result: {message.result[:200]}...]\n"
```
---
## Testing Strategies
### Unit Testing Tools
```python
import pytest
from agent_sdk.testing import MockLLM, MockToolExecutor

@pytest.fixture
def mock_agent():
    return Agent(
        llm=MockLLM(responses=[
            {"tool_use": {"name": "read", "input": {"path": "test.py"}}},
            {"text": "The file contains a function definition."}
        ]),
        tool_executor=MockToolExecutor({
            "read": lambda input: "def hello(): pass"
        })
    )

async def test_file_analysis(mock_agent):
    result = await mock_agent.run("Analyze test.py")
    assert "function" in result.output
    assert mock_agent.tool_calls == [("read", {"path": "test.py"})]
```
### Integration Testing
```python
@pytest.mark.integration
async def test_edit_workflow():
    """Test complete edit workflow in isolated environment."""
    with TempWorkspace() as workspace:
        # Setup
        workspace.write("src/main.py", "def old_name(): pass")

        # Execute
        agent = Agent(working_directory=workspace.path)
        await agent.run("Rename old_name to new_name in src/main.py")

        # Verify
        content = workspace.read("src/main.py")
        assert "def new_name():" in content
        assert "old_name" not in content
```
### Snapshot Testing
```python
def test_tool_schema_stability():
    """Ensure tool schemas don't change unexpectedly."""
    tools = get_all_tool_definitions()
    for tool in tools:
        snapshot_path = f"snapshots/tools/{tool.name}.json"
        current = tool.to_json()
        if os.path.exists(snapshot_path):
            with open(snapshot_path) as f:
                expected = json.load(f)
            assert current == expected, f"Tool {tool.name} schema changed"
        else:
            with open(snapshot_path, "w") as f:
                json.dump(current, f)
```
---
## Deployment Patterns
### Docker Container
```dockerfile
FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
ripgrep \
&& rm -rf /var/lib/apt/lists/*
# Install agent
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Create non-root user
RUN useradd -m -s /bin/bash agent
USER agent
WORKDIR /home/agent
# Copy configuration
COPY --chown=agent:agent config/ /home/agent/.config/agent/
ENTRYPOINT ["agent"]
CMD ["--help"]
```
### CI/CD Integration
```yaml
# GitHub Actions example
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed files
        id: changes
        run: |
          echo "files=$(git diff --name-only origin/main...HEAD | tr '\n' ' ')" >> $GITHUB_OUTPUT

      - name: Run AI Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          agent -p "Review these changed files for bugs and improvements: ${{ steps.changes.outputs.files }}" \
            --output-format json \
            --max-turns 20 \
            > review.json

      - name: Post Review Comment
        uses: actions/github-script@v7
        with:
          script: |
            const review = require('./review.json');
            github.rest.pulls.createReview({
              ...context.repo,
              pull_number: context.issue.number,
              body: review.output,
              event: 'COMMENT'
            });
```
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      containers:
        - name: agent
          image: my-agent:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-keys
                  key: anthropic
          volumeMounts:
            - name: workspace
              mountPath: /workspace
      volumes:
        - name: workspace
          emptyDir: {}
```
---
## Appendix: Decision Checklist
### When to Use Dedicated Tools vs Bash
| Use Dedicated Tool | Use Bash |
|-------------------|----------|
| File reading (read) | Git operations |
| File editing (edit) | Package management (npm, pip) |
| Pattern search (grep) | Running tests |
| File finding (glob) | Build commands |
| Directory listing (ls) | Custom scripts |
### When to Spawn Subagents
| Spawn Subagent | Handle in Main Loop |
|----------------|---------------------|
| Complex research tasks | Simple file operations |
| Exploration of large codebases | Direct answers |
| Parallel independent tasks | Sequential dependent tasks |
| Isolated experimental changes | Standard workflows |
### Permission Mode Selection
| Mode | Use Case |
|------|----------|
| Strict | Untrusted environments, production systems |
| Standard | Normal development work |
| Permissive | Trusted projects, experienced users |
| Autonomous | Automated pipelines, sandboxed environments |
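These modes can be reduced to a per-tool auto-allow gate: anything not auto-allowed is surfaced to the user before running. The per-mode tool lists below are illustrative, not normative:

```python
from fnmatch import fnmatch

# Illustrative policy table; real deployments would load this from config
MODE_POLICIES = {
    "strict":     {"auto_allow": []},                      # everything prompts
    "standard":   {"auto_allow": ["read", "glob", "grep"]},
    "permissive": {"auto_allow": ["read", "glob", "grep", "edit", "write_file"]},
    "autonomous": {"auto_allow": ["*"]},                   # never prompts
}

def needs_confirmation(mode: str, tool_name: str) -> bool:
    """True if the tool call should be surfaced to the user before running."""
    patterns = MODE_POLICIES[mode]["auto_allow"]
    return not any(fnmatch(tool_name, p) for p in patterns)
```

Glob patterns keep the table compact and line up with the `bash(git:*)`-style selectors used in the configuration section above.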
---
*This document provides architectural patterns for building agentic CLI tools. Implementations will vary based on specific requirements, target LLM providers, and use cases.*