Agent Skill Management: Tools, Skills, and Capabilities in the 2026 AI Ecosystem
AIAn agent without tools is just a chat model with extra steps. The ability to call APIs, execute code, query databases, and manipulate files is what transforms a language model from a static oracle into an autonomous worker. But as the agent ecosystem exploded in 2025-2026, so did the number of ways to define, register, discover, and govern those capabilities.
This guide maps the 2026 landscape of agent skill management — how different frameworks approach tools and skills, what protocols are standardizing, and what patterns production teams are adopting.
Part 1: The Core Abstraction — What Is a Skill?
Before comparing frameworks, it helps to define what we're actually talking about. Every agentic system needs a way to give the model access to external capabilities. The abstraction varies by name — tools, skills, functions, capabilities — but the core contract is the same:
- A name: A unique identifier the model uses to request the capability
- A description: Plain-text documentation of what the capability does (this is the most important factor for model performance)
- An input schema: Structured parameters the model must provide (typically JSON Schema)
- An execution handler: The actual code that runs when the tool is invoked
- An output: Structured data returned to the model
name + description + input_schema → model proposes call → handler executes → output returned → model continues
The nuance lies in how these five elements are defined, when they are discovered, and who controls which tools are available in which context.
Tools vs Skills
The ecosystem has settled on a loose distinction:
| Concept | Scope | Example |
|---|---|---|
| Tool | A single, atomic function | get_weather(city), search_docs(query) |
| Skill | A bundled capability with context, instructions, and possibly multiple tools | "Code review skill" that includes git diff, run SAST, analyze results |
A skill is a higher-level abstraction — it doesn't just define what to call, but when, why, and how to use the tools together. This becomes critical when agents need to handle complex, multi-step tasks.
Part 2: MCP — The Protocol Layer for Tools
The Model Context Protocol (MCP), now at spec version 2026-07-28, has become the closest thing to a universal standard for tool connectivity. It defines three primitives that servers expose to clients:
┌─────────────┐ tools/list, resources/list, prompts/list ┌──────────────┐
│ Client │ ◄──────────────────────────────────────────────► │ MCP Server │
│ (Agent) │ tools/call, resources/read, prompts/get │ (Tools/Data) │
└─────────────┘ └──────────────┘
The Three Primitives
| Primitive | Purpose | Lifecycle |
|---|---|---|
| Tools | Executable functions the model invokes | Stateless: call → result |
| Resources | Read-only contextual data (schemas, files, configs) | Stateful URI scheme |
| Prompts | Reusable interaction templates | Stateless: get → template |
Dynamic Discovery
MCP's key innovation is dynamic discovery at connection time. A client never needs to pre-know a server's tools:
// Client connects and queries capabilities
interface ServerCapabilities {
tools?: ListToolsResult | "dynamic";
resources?: ListResourcesResult | "dynamic";
prompts?: ListPromptsResult | "dynamic";
}
The server responds with a structured capability catalog that becomes part of the model's context:
{
"tools": [
{
"name": "query_database",
"description": "Execute read-only SQL queries",
"inputSchema": {
"type": "object",
"properties": {
"sql": { "type": "string", "description": "SELECT statement" }
},
"required": ["sql"]
}
}
]
}
Discovery Extensions
Beyond the handshake, MCP servers can expose capabilities via several mechanisms:
.well-known/mcp/server-card.json(SEP-1649) — Pre-connection capability probing for HTTP servers.well-known/mcp(SEP-1960) — Manifest endpoint with endpoints, capabilities, and auth requirements- DNS TXT
_mcp.{host}(IETF draft) — Fast-mode discovery for crawlers - MCP Registry — Community registry for public server discovery
Server Implementation
The Python FastMCP library offers the cleanest developer experience:
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("analytics-server")
@mcp.tool()
def query_analytics(dataset: str, metric: str) -> str:
"""Query analytics data.
Args:
dataset: Dataset name (users, revenue, sessions)
metric: Metric to retrieve
"""
return f"{metric} for {dataset}: 42,000"
@mcp.resource("schema://{dataset}")
def dataset_schema(dataset: str) -> str:
"""Return schema for a dataset."""
return f"Schema for {dataset}: timestamp, value, segment"
MCP is transport-agnostic: stdio for local processes, Streamable HTTP for remote servers (now recommended over SSE), and WebSocket for persistent connections.
Part 3: Framework-Specific Patterns
Each major framework wraps the tool abstraction with its own ergonomics, lifecycle, and extension points.
OpenAI Agents SDK
The SDK uses a decorator-based approach with three tool tiers:
from agents import Agent, function_tool
# Tier 1: Simple function tool
@function_tool
def get_weather(city: str) -> str:
"""Return weather for a city."""
return f"The weather in {city} is sunny."
# Tier 2: Tool with governance
@function_tool(needs_approval=True, timeoutMs=5000)
def send_email(to: str, subject: str, body: str) -> str:
"""Send an email. Requires human approval."""
return email_client.send(to, subject, body)
# Tier 3: Deferred loading for large tool ecosystems
@function_tool(defer_loading=True, namespace="database")
def query_database(sql: str) -> str:
"""Run read-only SQL query. Deferred: only loaded when model opts in."""
return db.execute(sql)
agent = Agent(
name="DataAgent",
instructions="You retrieve data.",
tools=[get_weather, send_email, ToolSearchTool()],
tool_use_behavior="run_llm_again"
)
Key features in the SDK:
needs_approval— Boolean or callable for human-in-the-loop gatesdefer_loading+ToolSearchTool()— Only loads tools when the model needs them (critical for 100+ tool ecosystems)timeoutMs/timeoutBehavior— Per-call timeout handling ("error_as_result"or"raise_exception")- Guardrails —
input_guardrails/output_guardrailsas validation middleware - Agents as tools — Expose an entire agent as a callable tool without full handoff
LangChain & Deep Agents
LangChain's approach is middleware-centric. Tools are defined with the @tool decorator and composed via a middleware stack:
from langchain.tools import tool
from langchain.agents import create_agent
@tool
def search_docs(query: str, max_results: int = 5) -> str:
"""Search internal documentation. Args: query — search terms."""
return doc_search(query, max_results)
agent = create_agent(
model="anthropic:claude-sonnet-4-20250514",
tools=[search_docs],
middleware=[
FilesystemMiddleware(),
SummarizationMiddleware(),
MemoryMiddleware(sources=["./AGENTS.md"]),
SkillsMiddleware(sources=["./skills/"]),
HumanInTheLoopMiddleware(interrupt_on=["send_email", "delete"]),
]
)
Skills in Deep Agents use progressive disclosure — only YAML frontmatter loads at startup; the full SKILL.md loads only when invoked:
---
name: code-review
description: Perform security-focused code review for pull requests
tags: [security, code-review, devops]
version: "2.1.0"
---
# Code Review Skill
## When to Use
Activate when asked to review code for security vulnerabilities...
## Workflow
1. Fetch the diff using `git diff`
2. Run SAST scanner
3. Flag critical issues
Three-level loading:
- Discovery: Parse SKILL.md frontmatter at startup → inject into system prompt
- Read: Agent invokes skill → read full SKILL.md via
read_file - Execute: Agent follows instructions, reads supporting files only when needed
This saves significant context. SkillsHub.wtf reports 250x token savings compared to reading full SKILL.md files for all available skills.
CrewAI
CrewAI aggregates five capability types into a unified tool list:
from crewai import Agent, Task, Crew
from crewai.tools import SerperDevTool
agent = Agent(
role="Research Analyst",
goal="Analyze market trends",
backstory="Senior analyst with 10 years experience",
tools=[SerperDevTool()], # Callable tools
allow_delegation=True, # Can delegate to other agents
function_calling_llm=None # Optional: separate LLM for tool calling
)
# Tasks can override the agent's default tools
task = Task(
description="Research AI agent frameworks",
tools=[SerperDevTool(), WebScraperTool()], # Task-level override
agent=agent
)
| Capability | What It Provides | Resolution |
|---|---|---|
| Tools | Callable functions (web search, file ops) | BaseTool instance |
| MCPs | Remote tool servers via stdio/SSE/HTTP | BaseTool instance |
| Apps | Platform integrations (Gmail, Sheets) | BaseTool instance |
| Skills | Domain expertise in SKILL.md format | BaseTool instance |
| Knowledge | Retrieved facts from PDFs, CSVs | RAG-retrieved context |
All five resolve to internal BaseTool instances in a unified tool list — the agent doesn't distinguish between a local function, a remote MCP server, or a bundled skill.
AutoGen / AG2
AG2 separates the agent that proposes a tool call from the agent that executes it:
from autogen import ConversableAgent, UserProxyAgent, register_function
def get_weather(city: str) -> str:
return f"Weather in {city}: sunny"
date_agent = ConversableAgent(name="planner", llm_config=llm_config)
executor = UserProxyAgent(name="executor", human_input_mode="NEVER")
register_function(
get_weather,
caller=date_agent, # Proposes the call
executor=executor, # Executes the call
name="get_weather",
description="Get weather for a given city"
)
Or with the decorator API:
@date_agent.register_for_llm(description="Get the day of the week")
@executor.register_for_execution()
def get_weekday(date: str) -> str:
from datetime import datetime
return datetime.strptime(date, '%Y-%m-%d').strftime('%A')
AG2 supports static registration (at agent init), dynamic registration (at runtime via tools=[]), and hybrid — core tools are static, situational tools are injected at runtime.
Anthropic / Claude SDK
Claude's Tool Runner abstraction handles the entire agentic loop:
from anthropic import Anthropic
from anthropic._models import BetaRunnableTool
client = Anthropic()
tools = [
BetaRunnableTool(
name="get_weather",
description="Get weather for a city. Provide city name.",
input_schema={
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"}
},
"required": ["city"]
},
run=lambda params, ctx: f"Weather in {params['city']}: sunny"
)
]
with client.beta.messages.tool_runner(tools=tools) as runner:
for message in runner.stream(text="What's the weather in Paris?"):
print(message)
Claude also ships trained tool signatures — tool definitions that the model is specifically optimized to call correctly:
bash_code_execution— Sandboxed shell commandstext_editor_code_execution— View/create/edit filesweb_search_20260209— Web searchweb_fetch_20260209— Fetch URLs
Part 4: ACP — Agent-to-Agent Skill Advertising
While MCP connects models to tools, the Agent Communication Protocol (ACP) connects agents to other agents. Agents publish Agent Cards — structured JSON documents describing their capabilities:
{
"name": "code-review-agent",
"description": "Performs security-focused code review",
"url": "https://api.company.com/acp/code-review",
"capabilities": [
{ "name": "review_pr", "description": "Review a PR for security issues" }
],
"skills": [
{ "name": "sast", "input_modes": ["text"], "output_modes": ["json"] }
],
"limitations": [
{ "kind": "modality", "type": "audio", "description": "Does not process audio" }
]
}
Discovery Methods
| Method | Description |
|---|---|
| Basic | Query running ACP servers directly |
| Open | Public manifest at /.well-known/acp.json |
| Registry | Centralized registry (online or offline cache) |
| Embedded | Metadata in container images for air-gapped deployments |
| LAN | TCP probe scan of /24 subnet in 1-3 seconds (v2.1-alpha) |
ACP's skills API includes GET /skills for listing, POST /skills/query for filtered search, and PATCH /skills/{id}/limitations for runtime capability updates.
The relationship between the protocols is complementary:
┌─────────────────┐
│ Orchestrator │
│ Agent │
└────┬─────────┬──┘
│ │
MCP │ │ ACP
│ │
┌────────▼──┐ ┌──▼─────────┐
│ Tools │ │ Sub-agent │
│ (API/DB) │ │ (Specialist)│
└───────────┘ └────────────┘
MCP for connecting to external systems, ACP for connecting to other agents.
Part 5: Skill Versioning & Lifecycle
As skills become production artifacts, versioning and lifecycle management have emerged as critical concerns.
Semantic Versioning for Skills
The adapted SemVer for AI skills maps breaking changes to clearly defined patterns:
MAJOR — Breaking changes (schema rename, removed params, output shape change)
MINOR — Backward-compatible additions (new optional params, new tools)
PATCH — Bug fixes (typos, docstring corrections, schema validation fixes)
Deprecation Lifecycle
A recommended three-phase deprecation policy:
Phase 1: Deprecation Warning
→ Mark skill with @deprecated in description + provide replacement name
Phase 2: Soft Error
→ Tool still works but appends warning to output:
{"result": "...", "warning": "Will be removed 2026-08-01. Migrate to v2."}
Phase 3: Hard Error
→ Remove from routing registry, return descriptive error with migration path
Runtime Manifest
Proposed standard for runtime version metadata (geodocs.dev):
{
"version": "2.0.0",
"version_scheme": "semver",
"lifecycle_state": "deprecated",
"deprecated_at": "2026-05-01T00:00:00Z",
"sunset_at": "2026-08-01T00:00:00Z",
"replacement_uri": "https://registry.example.com/skills/query-v3",
"breaking_changes": [
{ "type": "parameter_renamed", "field": "query", "replacement": "sql" }
]
}
Runtime responses should include Deprecation and Sunset HTTP headers (RFC 8594).
Part 6: Security & Governance
Tool access is the primary attack surface for agentic systems. The OWASP LLM06 category — Excessive Agency — specifically addresses agents performing actions beyond what's intended.
The Least-Privilege Tool Model
Default: No tools
Explicit allowlist → scoped capabilities → human-in-the-loop for destructive actions
Controls by Layer
| Layer | Technology | What It Controls |
|---|---|---|
| Execution | Firecracker, Kata, gVisor, Docker + seccomp | Where tools execute |
| Tool Policy | Allowlist/denylist, RBAC | Which tools exist at all |
| Approval | Human-in-the-loop, CIBA push | Whether high-risk tools proceed |
Microsoft Agent Governance Toolkit
from agent_governance import safe_tool, PolicyEngine
policy = """
tools:
send_email: allow # Requires approval from policy
query_database: allow
exec_shell: deny
"""
@safe_tool(policy=policy)
def query_database(sql: str) -> str:
return db.execute(sql)
The toolkit provides 10/10 OWASP Agentic Top 10 coverage across four privilege rings, with MCP security gateway (tool poisoning detection) and Merkle audit trails.
AgentWard Lifecycle Architecture
A multi-layer security architecture for production agents:
Input → Layer 1: Prompt injection filter
→ Layer 2: Memory update monitoring
→ Layer 3: Decision alignment (proposed action vs user intent)
→ Layer 4: Execution control (tool permissions + sandbox + approval)
Accumulated risk context flows between layers for progressive escalation — an agent that passes all filters at layer 1 may still be blocked at layer 4 if its proposed action chain looks suspicious.
Risk Tier Framework
| Tier | Actions | Default Control |
|---|---|---|
| Read-only | Search docs, read files | Allow with logging |
| Draft | Create proposed content | Allow, do not apply |
| Internal write | Update test records | Allow in sandbox only |
| Destructive | Send email, delete data, charge cards | Require explicit human approval |
Part 7: The Skill Ecosystem — Registries & Marketplaces
The open-source community has built multiple skill registries, each with a different philosophy:
| Registry | Scale | Key Innovation |
|---|---|---|
| SkillHub | 100K+ skills | CLI install, security scanning, self-hostable |
| SkillsHub.wtf | 10K+ skills | Natural language skill resolver, 250x token savings |
| AgentSkillExchange | 1.3K verified skills | Trust tiers (security reviewed, top starred) |
| AgentVerse | Universal marketplace | Rust core, pgvector semantic search, MCP-native |
| AgentHub | Framework-agnostic | "Hugging Face for AI Agents", AgentSpec standard |
| MCP Catalog | Enterprise focus | Health monitoring, schema aggregation |
| OpenFihris | Cross-framework | Claude Code, LangChain, CrewAI, Google ADK compatible |
Most registries follow the Agent Skills specification (agentskills.io): SKILL.md files with YAML frontmatter, installable via npx skills add org/repo --skill <slug>, supported across Claude Code, Cursor, Windsurf, GitHub Copilot, Cline, Codex, and 30+ agent tools.
Part 8: Production Patterns
Pattern 1: Deferred Loading
For agents with 50+ tools, defer loading to prevent context overflow:
# Only tool names + descriptions loaded at startup
# Full schemas and handlers loaded when model opts in
tool_registry = ToolRegistry()
tool_registry.register(create_account_tool, namespace="user", defer=True)
tool_registry.register(delete_account_tool, namespace="user", defer=True)
tool_registry.register(search_analytics_tool, namespace="analytics", defer=True)
# Model calls ToolSearchTool() → registry activates relevant namespace
Pattern 2: Multi-Server Tool Coordination
When tools from different sources share names, use composite keys:
registry.get("github::createPullRequest")
registry.get("gitlab::createPullRequest")
registry.get_unified_tools() # Deduplicates with server prefixes
Pattern 3: Tool Fallback & Error Recovery
@function_tool(timeoutMs=5000, timeoutBehavior="error_as_result")
def unreliable_api(param: str) -> str:
"""External API with fallback. On timeout: returns model-readable error."""
try:
return call_external_api(param)
except TimeoutError:
return "Service temporarily unavailable. Try using cached data."
Pattern 4: Context Window Management via Middleware Stack
LangChain's middleware ordering reveals the skill management hierarchy:
1. Custom system prompt
2. Base agent prompt
3. To-do list prompt
4. Memory prompt
5. Skills prompt (skill locations + frontmatter only)
6. Virtual filesystem prompt
7. Subagent prompt
8. Human-in-the-loop prompt
Skills are injected after memory but before filesystem — they describe what the agent can do, not the data it operates on.
The Road Ahead
Several trends are converging:
-
Protocol standardization — MCP for tool connectivity, ACP for inter-agent communication, with A2A (Google) as a complementary alternative. The ecosystem is settling on specialization rather than competition.
-
Registry interoperability — The Agent Skills spec (
agentskills.io) is gaining traction as a universal format, with most major tools supporting theSKILL.mdconvention. -
Runtime governance — Tools like Microsoft's Agent Governance Toolkit and AgentWard are moving security from bolted-on policies to instrumentation built into the skill execution pipeline.
-
Deferred everything — Progressive disclosure (loading only what the model needs when it needs it) is becoming the default pattern for large-scale agent deployments.
-
Versioning as infrastructure — Skill versioning with semantic deprecation lifecycles is moving from best practice to operational requirement as skills become long-lived production artifacts.
The fundamental insight across all of these patterns is the same: skills are the agent's interface to the world. How you define, discover, govern, and version them determines what your agents can actually accomplish — and what they can break.
Further Reading
- Agentic AI in Practice — Comprehensive overview of agent architecture, protocols, and the 2026 framework ecosystem
- MCP Servers: The Future of AI Integration — Deep dive into the Model Context Protocol
- Multi-Agent Distributed Systems — Coordination patterns for multi-agent architectures
- MCP Specification — Official MCP spec
- ACP Protocol — Agent Communication Protocol documentation
- Agent Skills Specification — Universal skill format standard