A comprehensive guide to building production-grade agentic command-line interfaces, derived from patterns in state-of-the-art implementations.
┌─────────────────────────────────────────────────────────────┐
│ EXTENSION LAYER │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ External │ │ Hooks │ │ Skills │ │ Plugins │ │
│ │ Tools │ │ (Events) │ │ (Prompts)│ │(Packages)│ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
├─────────────────────────────────────────────────────────────┤
│ DELEGATION LAYER │
│ ┌─────────────────────────────────────────┐ │
│ │ Subagents (Parallel Execution) │ │
│ │ Explorer | Planner | Specialist │ │
│ └─────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ CORE LAYER │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Main Agent Loop │ │
│ │ Context Window | Tool Executor | Message History │ │
│ └─────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ FOUNDATION LAYER │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ LLM │ │ Session │ │Permission│ │ Config │ │
│ │ Provider │ │ Store │ │ System │ │ Loader │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
Prefer a single main agent loop over complex multi-agent orchestration.
┌─────────────────────────────────────────────────────────┐
│ MAIN AGENT LOOP │
│ │
│ User Input → LLM → Tool Call → Result → LLM → ... │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ Subagent │ (max 1 level deep) │
│ │ Execution │ │
│ └──────┬──────┘ │
│ │ │
│ Result added to │
│ main message history │
└─────────────────────────────────────────────────────────┘
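A minimal sketch of this loop, assuming hypothetical `llm` and `execute_tool` interfaces (a real implementation adds permission checks, context compaction, and error handling):

async def agent_loop(llm, execute_tool, messages: list, max_turns: int = 50):
    """Run the main agent loop until the model stops requesting tools."""
    for _ in range(max_turns):
        response = await llm.complete(messages)
        messages.append({"role": "assistant", "content": response.content})
        if not response.tool_calls:       # no tool calls -> task is done
            return response
        for call in response.tool_calls:  # execute tools, feed results back
            result = await execute_tool(call.name, call.input)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )
    raise RuntimeError("max_turns exceeded")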
A key principle of tool design is to provide tools at multiple abstraction levels:
| Level | Characteristics | Examples |
|---|---|---|
| Low-level | Direct system access, flexible but error-prone | bash, read_file, write_file |
| Mid-level | Specialized, optimized for common operations | grep, glob, edit, multi_edit |
| High-level | Orchestration, deterministic outcomes | spawn_agent, web_fetch, todo_list |
Why multiple levels? Low-level tools guarantee a fallback for edge cases the specialized tools don't cover, while mid- and high-level tools make common operations cheaper, more reliable, and simpler to permission-gate. The reference tool definitions below illustrate each level:
read:
description: "Read file contents with line numbers"
parameters:
- file_path: string (required)
- offset: integer (optional, start line)
- limit: integer (optional, max lines)
risk_level: low
write:
description: "Create or overwrite file"
parameters:
- file_path: string (required)
- content: string (required)
risk_level: high
edit:
description: "Precise search-and-replace modification"
parameters:
- file_path: string (required)
- old_text: string (required, must be unique)
- new_text: string (required)
risk_level: medium
multi_edit:
description: "Batch edits in single operation"
parameters:
- file_path: string (required)
- edits: array of {old_text, new_text}
risk_level: medium
glob:
description: "Find files by pattern"
parameters:
- pattern: string (required, e.g., "**/*.py")
- path: string (optional, search root)
risk_level: low
list_directory:
description: "List directory contents"
parameters:
- path: string (required)
- depth: integer (optional, default 1)
risk_level: low
grep:
description: "Content search using ripgrep"
parameters:
- pattern: string (required)
- path: string (optional)
- output_mode: enum [content, files_only, count]
- file_type: string (optional, e.g., "py", "js")
- context_lines: integer (optional)
risk_level: low
notes: "Always prefer dedicated grep over bash grep"
bash:
description: "Execute shell commands"
parameters:
- command: string (required)
- timeout: integer (optional, ms, max 600000)
- working_directory: string (optional)
- run_in_background: boolean (optional)
risk_level: high
restrictions:
- "Prefer dedicated tools over bash equivalents"
- "Never use: cat, head, tail, grep, find, sed, awk"
- "Quote paths with spaces"
- "Use && or ; for command chaining, not newlines"
web_search:
description: "Search the internet"
parameters:
- query: string (required)
- max_results: integer (optional)
risk_level: low
web_fetch:
description: "Retrieve web page content"
parameters:
- url: string (required)
- extract_text: boolean (optional)
risk_level: low
spawn_agent:
description: "Delegate task to subagent"
parameters:
- task: string (required)
- agent_type: enum [explorer, planner, general]
- tools: array of strings (optional, tool subset)
risk_level: low
todo_list:
description: "Track task progress"
parameters:
- action: enum [read, write, update]
- todos: array of {content, status}
risk_level: low
modes:
strict:
description: "Confirm every action"
auto_approve: []
standard:
description: "Approve reads, confirm writes"
auto_approve: [read, glob, grep, list_directory, web_search]
permissive:
description: "Auto-approve safe operations"
auto_approve: [read, glob, grep, list_directory, edit, web_*]
autonomous:
description: "No confirmations (dangerous)"
auto_approve: ["*"]
requires_flag: "--dangerously-skip-permissions"
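A sketch of a mode-aware auto-approval check (names are illustrative; `fnmatch` handles the `web_*` and `*` wildcards):

from fnmatch import fnmatch

AUTO_APPROVE = {
    "strict":     [],
    "standard":   ["read", "glob", "grep", "list_directory", "web_search"],
    "permissive": ["read", "glob", "grep", "list_directory", "edit", "web_*"],
    "autonomous": ["*"],
}

def is_auto_approved(mode: str, tool_name: str) -> bool:
    """True if the tool may run without an interactive confirmation."""
    return any(fnmatch(tool_name, pattern) for pattern in AUTO_APPROVE[mode])

Anything that is not auto-approved falls through to a user confirmation prompt.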
{
"allowedTools": [
"read",
"glob",
"grep",
"bash(git:*)",
"bash(npm:*)",
"bash(pytest:*)"
],
"disallowedTools": [
"bash(rm -rf:*)",
"bash(sudo:*)"
]
}
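A sketch of evaluating these allow/deny patterns against a proposed tool call (hypothetical helper; the pattern syntax is summarized below, and only bash-style `command` inputs are prefix-checked):

def matches(pattern: str, tool_name: str, tool_input: dict) -> bool:
    """Check one allow/deny pattern against a tool call."""
    if pattern == "*":
        return True
    if "(" in pattern:  # e.g. "bash(git:*)"
        name, _, rest = pattern.partition("(")
        prefix = rest.rstrip(")").removesuffix(":*")
        return tool_name == name and tool_input.get("command", "").startswith(prefix)
    return pattern == tool_name  # exact match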
Pattern syntax:
- `tool_name` - exact match
- `tool_name(prefix:*)` - match commands starting with prefix
- `*` - match all tools

explorer:
model: fast (e.g., Haiku, GPT-4o-mini)
mode: read-only
tools: [glob, grep, read, list_directory]
parameters:
thoroughness: [quick, medium, thorough]
use_cases:
- Codebase exploration
- File discovery
- Pattern searching
planner:
model: capable (e.g., Sonnet, GPT-4o)
mode: read-only
tools: [read, glob, grep, bash(safe)]
use_cases:
- Implementation planning
- Architecture decisions
- Breaking down complex tasks
general:
model: capable
mode: full
tools: all
use_cases:
- Complex research + modification
- Multi-step workflows
- Delegated implementations
# .agents/security-reviewer.yaml
name: security-reviewer
description: "Security-focused code reviewer"
model: opus # or specific model string
permission_mode: plan # read-only
tools:
- read
- grep
- glob
- bash(git log:*)
- bash(git diff:*)
prompt: |
You are a senior security engineer reviewing code.
Focus on:
- OWASP Top 10 vulnerabilities
- Secrets and hardcoded credentials
- SQL injection, XSS, CSRF
- Authentication/authorization flaws
Report findings with severity levels and remediation steps.
# Programmatic
result = await spawn_agent(
task="Find all authentication-related files",
agent_type="explorer",
thoroughness="thorough"
)
# Natural language (agent decides)
"Use the explorer agent to find all API endpoints"
"Have a subagent analyze the database schema"
lifecycle:
- SessionStart # Agent session begins
- SessionEnd # Agent session ends
- Stop # Agent completes task
tool_events:
- PreToolUse # Before tool execution
- PostToolUse # After successful execution
- PostToolUseFailure # After failed execution
- PermissionRequest # User permission needed
user_events:
- UserPromptSubmit # Before processing user input
- Notification # System notifications
{
"hooks": {
"PreToolUse": [
{
"matcher": "bash",
"hooks": [
{
"type": "command",
"command": "/path/to/validate-bash.sh"
}
]
},
{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "/path/to/log-tool-use.sh"
}
]
}
],
"PostToolUse": [
{
"matcher": "edit",
"hooks": [
{
"type": "command",
"command": "npm run lint --fix"
}
]
}
]
}
}
# Hook receives JSON on stdin
{
"tool_name": "bash",
"tool_input": {
"command": "rm -rf /tmp/cache"
},
"session_id": "abc123",
"conversation_id": "xyz789"
}
# Hook returns JSON on stdout
{
"decision": "deny", # or "approve", "passthrough"
"reason": "Dangerous command blocked",
"modified_input": null, # optional: modify tool input
"system_message": null # optional: inject context
}
# Security validation
async def validate_bash(input_data):
command = input_data["tool_input"].get("command", "")
dangerous = ["rm -rf /", "sudo", "> /dev/sda"]
if any(d in command for d in dangerous):
return {"decision": "deny", "reason": "Dangerous command"}
return {"decision": "passthrough"}
# Audit logging
async def log_all_tools(input_data):
log.info(f"Tool: {input_data['tool_name']}, Input: {input_data['tool_input']}")
return {}
# Auto-formatting after edits
async def post_edit_format(input_data):
if input_data["tool_name"] == "edit":
file_path = input_data["tool_input"]["file_path"]
if file_path.endswith(".py"):
subprocess.run(["black", file_path])
return {}
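A sketch of how the agent side might invoke a command-type hook, writing the event JSON to stdin and parsing the decision from stdout (the I/O protocol is as shown above; the timeout value is illustrative):

import asyncio
import json

async def run_command_hook(command: str, event: dict, timeout: float = 30.0) -> dict:
    """Spawn the hook command, pass event JSON on stdin, parse JSON reply."""
    proc = await asyncio.create_subprocess_shell(
        command,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await asyncio.wait_for(
        proc.communicate(json.dumps(event).encode()), timeout
    )
    # An empty reply is treated as "no opinion"
    return json.loads(stdout) if stdout.strip() else {"decision": "passthrough"}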
Skills are reusable prompt modules that extend agent capabilities through specialized instructions injected into context, not through code execution.
┌─────────────────────────────────────────────────────────┐
│ SKILL ACTIVATION │
│ │
│ User Request → Skill Matcher → Inject SKILL.md → │
│ │
│ → Agent now has specialized instructions/context │
└─────────────────────────────────────────────────────────┘
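A sketch of skill discovery and injection, assuming project-level `.skills/` and global skills directories as shown below (the name-based matcher is deliberately naive; a real matcher would use the frontmatter `description` and semantic matching):

from pathlib import Path

def load_skills(roots: list[Path]) -> dict[str, str]:
    """Discover SKILL.md files one level below each skills root."""
    skills = {}
    for root in roots:
        for skill_md in root.glob("*/SKILL.md"):
            skills[skill_md.parent.name] = skill_md.read_text()
    return skills

def inject_skills(system_prompt: str, skills: dict[str, str], request: str) -> str:
    """Append any skill whose name appears in the user request."""
    for name, body in skills.items():
        if name.replace("-", " ") in request.lower():
            system_prompt += "\n\n" + body
    return system_prompt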
<!-- .skills/code-review/SKILL.md -->
---
name: code-review
description: "Thorough code review with best practices"
tools: read, grep, glob
---
# Code Review Skill
When performing code reviews, follow this methodology:
## 1. Understanding Phase
- Read the changed files completely
- Identify the purpose of changes
- Check for related test files
## 2. Analysis Checklist
- [ ] Error handling present
- [ ] Edge cases considered
- [ ] No hardcoded secrets
- [ ] Logging appropriate
- [ ] Tests cover new code
## 3. Output Format
Provide feedback as:
- 🔴 Critical: Must fix before merge
- 🟡 Suggestion: Consider improving
- 🟢 Praise: Well done
project/
├── .skills/                  # Project skills
│   ├── code-review/
│   │   └── SKILL.md
│   └── api-design/
│       └── SKILL.md

~/.config/agent/skills/       # Global skills
└── security-audit/
    └── SKILL.md
my-plugin/
├── plugin.json # Manifest (required)
├── commands/ # Slash commands
│ └── deploy.md
├── agents/ # Custom agents
│ └── devops.md
├── skills/ # Skills
│ └── kubernetes/
│ └── SKILL.md
├── hooks.json # Hook definitions
└── mcp.json # External tool servers
{
"name": "devops-toolkit",
"version": "1.0.0",
"description": "DevOps automation tools",
"author": "Your Name",
"components": {
"commands": ["commands/*.md"],
"agents": ["agents/*.md"],
"skills": ["skills/*/SKILL.md"],
"hooks": "hooks.json",
"mcp": "mcp.json"
}
}
# Programmatic
agent = Agent(
plugins=[
{"type": "local", "path": "./my-plugin"},
{"type": "local", "path": "~/.agent/plugins/shared"}
]
)
# CLI
agent --plugin-dir ./my-plugin --plugin-dir ~/.agent/plugins/shared
Model Context Protocol enables connecting external services as tools.
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Agent │ ←──→ │ MCP Client │ ←──→ │ MCP Server │
│ Core │ │ │ │ (External) │
└─────────────┘ └─────────────┘ └─────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Database │ │ API │ │ Browser │
└──────────┘ └──────────┘ └──────────┘
stdio:
description: "Spawn process, communicate via stdin/stdout"
config:
command: "npx"
args: ["@mcp/server-filesystem"]
env:
ALLOWED_PATHS: "/home/user/projects"
sse:
description: "Server-Sent Events over HTTP"
config:
url: "https://api.example.com/mcp/sse"
headers:
Authorization: "Bearer ${API_TOKEN}"
http:
description: "Standard HTTP requests"
config:
url: "https://api.example.com/mcp"
headers:
X-API-Key: "${API_KEY}"
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["@mcp/server-filesystem"],
"env": {
"ALLOWED_PATHS": "/home/user/projects"
}
},
"database": {
"type": "sse",
"url": "http://localhost:3001/mcp",
"headers": {
"Authorization": "Bearer ${DB_TOKEN}"
}
},
"jira": {
"command": "python",
"args": ["-m", "mcp_jira"],
"env": {
"JIRA_URL": "${JIRA_URL}",
"JIRA_TOKEN": "${JIRA_TOKEN}"
}
}
}
}
MCP tool names follow the pattern mcp__{server_name}__{tool_name}:
allowed_tools = [
"mcp__filesystem__read_file",
"mcp__database__query",
"mcp__jira__create_issue"
]
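A sketch of registering a connected server's tools under this convention (hypothetical `client.list_tools()`; the point is the namespacing, which prevents collisions between servers):

async def register_mcp_tools(registry: dict, server_name: str, client) -> None:
    """Expose each MCP server tool as mcp__{server}__{tool}."""
    for tool in await client.list_tools():
        qualified = f"mcp__{server_name}__{tool.name}"
        registry[qualified] = tool  # dispatched back to this client on use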
┌─────────────────────────────────────────────────────────┐
│ CONTEXT WINDOW │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ System Prompt ~2-5k │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Tool Definitions ~3-8k │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Project Context (CLAUDE.md, etc.) ~1-5k │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Conversation History VARIABLE │ │
│ │ - User messages │ │
│ │ - Assistant responses │ │
│ │ - Tool calls and results │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Current Turn VARIABLE │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ Total limit: 128k-200k tokens (model dependent) │
└─────────────────────────────────────────────────────────┘
class ContextManager:
def __init__(self, max_tokens: int = 100000):
self.max_tokens = max_tokens
self.compaction_threshold = 0.8 # 80%
def should_compact(self, current_tokens: int) -> bool:
return current_tokens > self.max_tokens * self.compaction_threshold
def compact(self, messages: list) -> list:
"""Reduce context size while preserving critical information."""
strategies = [
self.truncate_tool_outputs, # Limit long outputs
self.summarize_old_turns, # Summarize distant history
self.remove_redundant_reads, # Remove duplicate file reads
self.compress_to_summary # Last resort: full summarization
]
for strategy in strategies:
messages = strategy(messages)
if self.count_tokens(messages) < self.max_tokens * 0.6:
break
return messages
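A sketch of the first compaction strategy as a ContextManager method, trimming oversized tool results while leaving a marker (the character limit is illustrative; messages are assumed to be role/content dicts):

TOOL_OUTPUT_LIMIT = 2_000  # characters; illustrative threshold

def truncate_tool_outputs(self, messages: list) -> list:
    """Trim oversized tool results, preserving a truncation marker."""
    for msg in messages:
        content = msg.get("content", "")
        if msg.get("role") == "tool" and len(content) > TOOL_OUTPUT_LIMIT:
            msg["content"] = content[:TOOL_OUTPUT_LIMIT] + "\n... [truncated]"
    return messages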
# Basic usage
agent # Interactive REPL
agent "query" # Start with prompt
agent -p "query" # Non-interactive (print mode)
agent -c # Continue last session
agent -r <session-id> # Resume specific session
# Piping
cat file.log | agent -p "analyze errors"
git diff | agent -p "review changes"
# Configuration
agent --model sonnet # Model selection
agent --permission-mode strict # Permission mode
agent --max-turns 10 # Limit iterations
agent --timeout 300 # Global timeout (seconds)
# Extensions
agent --mcp-config ./mcp.json # Load MCP servers
agent --plugin-dir ./plugins # Load plugins
agent --agents '{"name": {...}}' # Define subagents
# System prompt
agent --system-prompt "You are..." # Replace
agent --append-system-prompt "Also..." # Append
agent --system-prompt-file ./prompt.txt # From file
# Output control
agent -p --output-format json # JSON output
agent -p --output-format stream-json # Streaming JSON
agent --verbose # Detailed logging
agent --debug "api,tools" # Debug categories
/help Show available commands
/clear Reset conversation
/compact Compress context manually
/model <name> Switch model
/permissions Manage tool permissions
/sessions List saved sessions
/resume <id> Resume session
/save Save current session
/config View/edit configuration
/bug Report issue
/quit Exit
<!-- .commands/deploy.md -->
Deploy the current project to the $ARGUMENTS environment.
1. Run tests: `npm test`
2. Build: `npm run build`
3. Deploy: `./scripts/deploy.sh $ARGUMENTS`
4. Verify deployment health
5. Report status
If any step fails, stop and report the error.
Usage: /project:deploy staging
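A sketch of expanding such a command template before it is sent as the prompt (assuming `$ARGUMENTS` is the only placeholder):

from pathlib import Path

def expand_command(commands_dir: Path, name: str, args: str) -> str:
    """Load .commands/<name>.md and substitute $ARGUMENTS."""
    template = (commands_dir / f"{name}.md").read_text()
    return template.replace("$ARGUMENTS", args)

# /project:deploy staging -> expand_command(Path(".commands"), "deploy", "staging")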
Configuration sources in precedence order (1 overrides 6):
| Priority | Source | Example |
|---|---|---|
| 1 | CLI flags | `--model opus` |
| 2 | Environment vars | `AGENT_MODEL=opus` |
| 3 | Local config | `./.agent/settings.json` |
| 4 | Project config | `./agent.config.json` |
| 5 | User config | `~/.config/agent/settings.json` |
| 6 | System defaults | Built-in values |
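A sketch of merging these layers, lowest precedence first so later layers win (nested dicts merge recursively):

def merge_config(layers: list[dict]) -> dict:
    """Deep-merge config dicts; later layers override earlier ones."""
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_config([merged[key], value])
            else:
                merged[key] = value
    return merged

# merge_config([defaults, user_cfg, project_cfg, local_cfg, env_cfg, cli_cfg])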
{
"model": "sonnet",
"fallback_model": "haiku",
"permission_mode": "standard",
"max_turns": 50,
"timeout_ms": 300000,
"tools": {
"allowed": ["read", "glob", "grep", "bash(git:*)"],
"disallowed": ["bash(rm -rf:*)"]
},
"context": {
"max_tokens": 100000,
"compaction_threshold": 0.8
},
"hooks": {
"PreToolUse": [...]
},
"mcp_servers": {
"filesystem": {...}
},
"output": {
"format": "text",
"verbose": false,
"color": true
}
}
<!-- AGENT.md or CLAUDE.md -->
# Project: My Application
## Overview
This is a Next.js application with a Python backend.
## Architecture
- Frontend: Next.js 14, TypeScript, Tailwind
- Backend: FastAPI, PostgreSQL
- Infrastructure: Docker, Kubernetes
## Conventions
- Use TypeScript strict mode
- Follow PEP 8 for Python
- All functions require docstrings
- Tests required for new features
## Commands
- `npm run dev` - Start frontend
- `uvicorn main:app --reload` - Start backend
- `pytest` - Run tests
- `npm run lint && black .` - Format code
## File Structure
- `src/` - Frontend source
- `api/` - Backend source
- `tests/` - Test files
- `docs/` - Documentation
@dataclass
class Session:
id: str
created_at: datetime
updated_at: datetime
working_directory: str
messages: list[Message]
tool_states: dict # Persistent tool state
checkpoints: list[Checkpoint]
metadata: dict
class SessionStore:
def save(self, session: Session) -> None:
"""Persist session to disk."""
path = self.sessions_dir / f"{session.id}.json"
path.write_text(session.to_json())
def load(self, session_id: str) -> Session:
"""Load session from disk."""
path = self.sessions_dir / f"{session_id}.json"
return Session.from_json(path.read_text())
def list_recent(self, limit: int = 10) -> list[SessionSummary]:
"""List recent sessions."""
sessions = sorted(
self.sessions_dir.glob("*.json"),
key=lambda p: p.stat().st_mtime,
reverse=True
)
return [self._summarize(s) for s in sessions[:limit]]
@dataclass
class Checkpoint:
"""Snapshot of agent state at a point in time."""
id: str
session_id: str
timestamp: datetime
message_index: int
file_snapshots: dict[str, str] # path -> content hash
description: str
def create_checkpoint(session: Session, description: str) -> Checkpoint:
"""Create restorable checkpoint."""
return Checkpoint(
id=generate_id(),
session_id=session.id,
timestamp=datetime.now(),
message_index=len(session.messages),
file_snapshots=snapshot_working_files(session.working_directory),
description=description
)
def restore_checkpoint(checkpoint: Checkpoint) -> Session:
"""Restore session to checkpoint state."""
session = load_session(checkpoint.session_id)
session.messages = session.messages[:checkpoint.message_index]
restore_files(checkpoint.file_snapshots)
return session
from agent_sdk import Agent, AgentOptions, Message
# Streaming execution
async for message in Agent.query(
prompt="Fix the bug in auth.py",
options=AgentOptions(
model="sonnet",
allowed_tools=["read", "edit", "bash(pytest:*)"],
permission_mode="accept_edits",
system_prompt="You are a Python expert.",
working_directory="/path/to/project",
mcp_servers={"db": {...}},
hooks={"PreToolUse": [validate_hook]}
)
):
match message:
case Message(type="assistant"):
print(message.content)
case Message(type="tool_use"):
print(f"Using {message.tool_name}")
case Message(type="tool_result"):
print(f"Result: {message.result[:100]}")
case Message(type="result", subtype="success"):
print("Task completed!")
# Single-turn execution
result = await Agent.run(
prompt="List all Python files",
options=AgentOptions(allowed_tools=["glob"])
)
print(result.output)
@dataclass
class Message:
type: Literal[
"system", # System events (init, error)
"user", # User input
"assistant", # Agent response
"tool_use", # Tool invocation
"tool_result", # Tool output
"result" # Final result
]
subtype: str | None # e.g., "success", "error", "init"
content: Any
metadata: dict
from agent_sdk import tool, ToolResult
@tool(
name="query_database",
description="Execute SQL query against the database",
parameters={
"query": {"type": "string", "description": "SQL query to execute"},
"database": {"type": "string", "description": "Database name"}
}
)
async def query_database(query: str, database: str) -> ToolResult:
try:
result = await db.execute(query, database)
return ToolResult.success(result.to_json())
except DatabaseError as e:
return ToolResult.error(f"Query failed: {e}")
class AgentError(Exception):
    """Base agent error."""
    code: str = "unknown"  # machine-readable code, consumed by RetryPolicy
class ToolExecutionError(AgentError):
"""Tool failed to execute."""
tool_name: str
input: dict
cause: Exception
class PermissionDeniedError(AgentError):
"""User denied permission."""
tool_name: str
input: dict
class ContextOverflowError(AgentError):
"""Context window exceeded."""
current_tokens: int
max_tokens: int
class ModelError(AgentError):
"""LLM provider error."""
provider: str
status_code: int
message: str
class TimeoutError(AgentError):
"""Operation timed out."""
operation: str
timeout_ms: int
class RetryPolicy:
max_retries: int = 3
backoff_base: float = 1.0
backoff_max: float = 30.0
retryable_errors: set = {
"rate_limit",
"overloaded",
"timeout",
"connection_error"
}
def should_retry(self, error: AgentError, attempt: int) -> bool:
if attempt >= self.max_retries:
return False
return error.code in self.retryable_errors
def get_delay(self, attempt: int) -> float:
delay = self.backoff_base * (2 ** attempt)
return min(delay, self.backoff_max)
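Applied in an execution loop (hypothetical `call_llm`; `error.code` is assumed to be populated by the provider adapter):

import asyncio

async def call_with_retries(call_llm, request, policy: RetryPolicy):
    """Retry transient LLM errors with exponential backoff."""
    for attempt in range(policy.max_retries + 1):
        try:
            return await call_llm(request)
        except AgentError as error:
            if not policy.should_retry(error, attempt):
                raise
            await asyncio.sleep(policy.get_delay(attempt))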
import structlog
logger = structlog.get_logger()
# Tool execution
logger.info("tool_execution",
tool=tool_name,
input=tool_input,
duration_ms=duration,
success=True
)
# LLM call
logger.info("llm_call",
model=model_name,
input_tokens=input_tokens,
output_tokens=output_tokens,
duration_ms=duration,
cached_tokens=cached_tokens
)
# Session events
logger.info("session_event",
event="created" | "resumed" | "completed" | "error",
session_id=session_id,
duration_total_ms=total_duration
)
from prometheus_client import Counter, Histogram
tool_calls = Counter(
"agent_tool_calls_total",
"Total tool calls",
["tool_name", "status"]
)
tool_duration = Histogram(
"agent_tool_duration_seconds",
"Tool execution duration",
["tool_name"]
)
llm_tokens = Counter(
"agent_llm_tokens_total",
"Total LLM tokens",
["model", "type"] # type: input, output, cached
)
session_duration = Histogram(
"agent_session_duration_seconds",
"Session duration"
)
@dataclass
class UsageTracker:
input_tokens: int = 0
output_tokens: int = 0
cached_tokens: int = 0
tool_calls: int = 0
def add_llm_call(self, response: LLMResponse):
self.input_tokens += response.usage.input_tokens
self.output_tokens += response.usage.output_tokens
self.cached_tokens += response.usage.cached_tokens
def estimate_cost(self, pricing: dict) -> float:
input_cost = self.input_tokens * pricing["input_per_token"]
output_cost = self.output_tokens * pricing["output_per_token"]
cached_cost = self.cached_tokens * pricing["cached_per_token"]
return input_cost + output_cost + cached_cost
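For example, with purely illustrative pricing (not real provider rates):

pricing = {
    "input_per_token": 3e-6,     # $3 per 1M input tokens (illustrative)
    "output_per_token": 15e-6,   # $15 per 1M output tokens (illustrative)
    "cached_per_token": 0.3e-6,  # cached reads billed at a discount
}
tracker = UsageTracker(input_tokens=120_000, output_tokens=8_000,
                       cached_tokens=60_000)
print(f"${tracker.estimate_cost(pricing):.3f}")  # 0.36 + 0.12 + 0.018 = $0.498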
# Docker-based isolation
docker_config:
image: "agent-sandbox:latest"
network: none # or restricted
read_only_root: true
volumes:
- source: /project
target: /workspace
read_only: false
- source: /home/user/.agent
target: /config
read_only: true
resource_limits:
memory: 4g
cpu: 2
pids: 100
security_opts:
- no-new-privileges
- seccomp=restricted.json
def validate_tool_input(tool_name: str, tool_input: dict) -> ValidationResult:
    """Validate tool input before execution."""
    schema = get_tool_schema(tool_name)
    # JSON Schema validation (jsonschema.validate raises on failure)
    try:
        jsonschema.validate(tool_input, schema)
    except jsonschema.ValidationError as e:
        return ValidationResult.invalid(str(e))
    # Path traversal check
    if "path" in tool_input or "file_path" in tool_input:
        path = tool_input.get("path") or tool_input.get("file_path")
        if not is_within_workspace(path):
            return ValidationResult.invalid("Path traversal detected")
    # Command injection check
    if tool_name == "bash":
        if contains_injection(tool_input["command"]):
            return ValidationResult.invalid("Potential command injection")
    return ValidationResult.valid()
# Environment variable expansion (safe)
import os
import re

def expand_env_vars(config: dict) -> dict:
"""Expand ${VAR} patterns in config."""
def expand(value: str) -> str:
pattern = r'\$\{([^}]+)\}'
def replace(match):
var_name = match.group(1)
default = None
if ":-" in var_name:
var_name, default = var_name.split(":-", 1)
return os.environ.get(var_name, default or "")
return re.sub(pattern, replace, value)
return deep_map(config, expand)
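`deep_map` is assumed above; a minimal version might look like:

def deep_map(value, fn):
    """Apply fn to every string in a nested dict/list structure."""
    if isinstance(value, dict):
        return {k: deep_map(v, fn) for k, v in value.items()}
    if isinstance(value, list):
        return [deep_map(v, fn) for v in value]
    return fn(value) if isinstance(value, str) else value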
class PromptCache:
"""Cache static prompt components."""
def __init__(self):
self.cache = {}
def get_cached_prefix(self, components: list[str]) -> CachedPrefix:
"""Get or create cached prefix from components."""
key = hash(tuple(components))
if key not in self.cache:
self.cache[key] = self._create_prefix(components)
return self.cache[key]
def build_messages(
self,
system_prompt: str,
tool_definitions: list,
project_context: str,
conversation: list[Message]
) -> list[Message]:
"""Build messages with cached prefix."""
# Static components get cached
prefix = self.get_cached_prefix([
system_prompt,
json.dumps(tool_definitions),
project_context
])
# Dynamic components appended
return prefix + conversation
async def execute_tools_batch(tool_calls: list[ToolCall]) -> list[ToolResult]:
"""Execute independent tool calls in parallel."""
# Group by dependency
independent = [t for t in tool_calls if not t.depends_on]
dependent = [t for t in tool_calls if t.depends_on]
# Execute independent calls in parallel
results = await asyncio.gather(*[
execute_tool(t) for t in independent
])
# Execute dependent calls sequentially
for tool_call in dependent:
result = await execute_tool(tool_call)
results.append(result)
return results
async def stream_response(query: str, options: AgentOptions):
"""Stream agent responses for real-time display."""
buffer = ""
async for message in Agent.query(query, options):
if message.type == "assistant":
# Stream content incrementally (character-level in this sketch)
for token in message.content:
yield token
buffer += token
elif message.type == "tool_use":
yield f"\n[Using {message.tool_name}...]\n"
elif message.type == "tool_result":
if options.verbose:
yield f"[Result: {message.result[:200]}...]\n"
import pytest
from agent_sdk.testing import MockLLM, MockToolExecutor
@pytest.fixture
def mock_agent():
return Agent(
llm=MockLLM(responses=[
{"tool_use": {"name": "read", "input": {"path": "test.py"}}},
{"text": "The file contains a function definition."}
]),
tool_executor=MockToolExecutor({
"read": lambda input: "def hello(): pass"
})
)
async def test_file_analysis(mock_agent):
result = await mock_agent.run("Analyze test.py")
assert "function" in result.output
assert mock_agent.tool_calls == [("read", {"path": "test.py"})]
@pytest.mark.integration
async def test_edit_workflow():
"""Test complete edit workflow in isolated environment."""
with TempWorkspace() as workspace:
# Setup
workspace.write("src/main.py", "def old_name(): pass")
# Execute
agent = Agent(working_directory=workspace.path)
await agent.run("Rename old_name to new_name in src/main.py")
# Verify
content = workspace.read("src/main.py")
assert "def new_name():" in content
assert "old_name" not in content
def test_tool_schema_stability():
    """Ensure tool schemas don't change unexpectedly."""
    for tool in get_all_tool_definitions():
        snapshot_path = f"snapshots/tools/{tool.name}.json"
        current = tool.to_json()
        if os.path.exists(snapshot_path):
            with open(snapshot_path) as f:
                expected = json.load(f)
            assert current == expected, f"Tool {tool.name} schema changed"
        else:
            with open(snapshot_path, "w") as f:  # first run: record snapshot
                json.dump(current, f)
FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
ripgrep \
&& rm -rf /var/lib/apt/lists/*
# Install agent
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Create non-root user
RUN useradd -m -s /bin/bash agent
USER agent
WORKDIR /home/agent
# Copy configuration
COPY --chown=agent:agent config/ /home/agent/.config/agent/
ENTRYPOINT ["agent"]
CMD ["--help"]
# GitHub Actions example
name: AI Code Review
on:
pull_request:
types: [opened, synchronize]
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Get changed files
id: changes
run: |
echo "files=$(git diff --name-only origin/main...HEAD | tr '\n' ' ')" >> $GITHUB_OUTPUT
- name: Run AI Review
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
agent -p "Review these changed files for bugs and improvements: ${{ steps.changes.outputs.files }}" \
--output-format json \
--max-turns 20 \
> review.json
- name: Post Review Comment
uses: actions/github-script@v7
with:
script: |
const review = require('./review.json');
github.rest.pulls.createReview({
...context.repo,
pull_number: context.issue.number,
body: review.output,
event: 'COMMENT'
});
apiVersion: apps/v1
kind: Deployment
metadata:
name: agent-service
spec:
replicas: 3
selector:
matchLabels:
app: agent
template:
metadata:
labels:
app: agent
spec:
containers:
- name: agent
image: my-agent:latest
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "2000m"
env:
- name: ANTHROPIC_API_KEY
valueFrom:
secretKeyRef:
name: api-keys
key: anthropic
volumeMounts:
- name: workspace
mountPath: /workspace
volumes:
- name: workspace
emptyDir: {}
| Use Dedicated Tool | Use Bash |
|---|---|
| File reading (read) | Git operations |
| File editing (edit) | Package management (npm, pip) |
| Pattern search (grep) | Running tests |
| File finding (glob) | Build commands |
| Directory listing (list_directory) | Custom scripts |
| Spawn Subagent | Handle in Main Loop |
|---|---|
| Complex research tasks | Simple file operations |
| Exploration of large codebases | Direct answers |
| Parallel independent tasks | Sequential dependent tasks |
| Isolated experimental changes | Standard workflows |
| Mode | Use Case |
|---|---|
| Strict | Untrusted environments, production systems |
| Standard | Normal development work |
| Permissive | Trusted projects, experienced users |
| Autonomous | Automated pipelines, sandboxed environments |
This document provides architectural patterns for building agentic CLI tools. Implementations will vary based on specific requirements, target LLM providers, and use cases.