Agent Skill Management: Tools, Skills, and Capabilities in the 2026 AI Ecosystem | AI

An agent without tools is just a chat model with extra steps. The ability to call APIs, execute code, query databases, and manipulate files is what transforms a language model from a static oracle into an autonomous worker. But as the agent ecosystem exploded in 2025-2026, so did the number of ways to define, register, discover, and govern those capabilities.

This guide maps the 2026 landscape of agent skill management — how different frameworks approach tools and skills, what protocols are standardizing, and what patterns production teams are adopting.

Part 1: The Core Abstraction — What Is a Skill?

Before comparing frameworks, it helps to define what we're actually talking about. Every agentic system needs a way to give the model access to external capabilities. The abstraction varies by name — tools, skills, functions, capabilities — but the core contract is the same:

A name: A unique identifier the model uses to request the capability
A description: Plain-text documentation of what the capability does (this is the most important factor for model performance)
An input schema: Structured parameters the model must provide (typically JSON Schema)
An execution handler: The actual code that runs when the tool is invoked
An output: Structured data returned to the model

name + description + input_schema → model proposes call → handler executes → output returned → model continues

The nuance lies in how these five elements are defined, when they are discovered, and who controls which tools are available in which context.

Tools vs Skills

The ecosystem has settled on a loose distinction:

Concept	Scope	Example
Tool	A single, atomic function	`get_weather(city)`, `search_docs(query)`
Skill	A bundled capability with context, instructions, and possibly multiple tools	"Code review skill" that includes `git diff`, `run SAST`, `analyze results`

A skill is a higher-level abstraction — it doesn't just define what to call, but when, why, and how to use the tools together. This becomes critical when agents need to handle complex, multi-step tasks.

Part 2: MCP — The Protocol Layer for Tools

The Model Context Protocol (MCP), now at spec version 2026-07-28, has become the closest thing to a universal standard for tool connectivity. It defines three primitives that servers expose to clients:

┌─────────────┐     tools/list, resources/list, prompts/list     ┌──────────────┐
│   Client    │ ◄──────────────────────────────────────────────► │ MCP Server   │
│  (Agent)    │     tools/call, resources/read, prompts/get      │ (Tools/Data) │
└─────────────┘                                                   └──────────────┘

The Three Primitives

Primitive	Purpose	Lifecycle
Tools	Executable functions the model invokes	Stateless: call → result
Resources	Read-only contextual data (schemas, files, configs)	Stateful URI scheme
Prompts	Reusable interaction templates	Stateless: get → template

Dynamic Discovery

MCP's key innovation is dynamic discovery at connection time. A client never needs to pre-know a server's tools:

// Client connects and queries capabilities
interface ServerCapabilities {
  tools?: ListToolsResult | "dynamic";
  resources?: ListResourcesResult | "dynamic";
  prompts?: ListPromptsResult | "dynamic";
}

The server responds with a structured capability catalog that becomes part of the model's context:

{
  "tools": [
    {
      "name": "query_database",
      "description": "Execute read-only SQL queries",
      "inputSchema": {
        "type": "object",
        "properties": {
          "sql": { "type": "string", "description": "SELECT statement" }
        },
        "required": ["sql"]
      }
    }
  ]
}

Discovery Extensions

Beyond the handshake, MCP servers can expose capabilities via several mechanisms:

.well-known/mcp/server-card.json (SEP-1649) — Pre-connection capability probing for HTTP servers
.well-known/mcp (SEP-1960) — Manifest endpoint with endpoints, capabilities, and auth requirements
DNS TXT _mcp.{host} (IETF draft) — Fast-mode discovery for crawlers
MCP Registry — Community registry for public server discovery

Server Implementation

The Python FastMCP library offers the cleanest developer experience:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("analytics-server")

@mcp.tool()
def query_analytics(dataset: str, metric: str) -> str:
    """Query analytics data.
    Args:
        dataset: Dataset name (users, revenue, sessions)
        metric: Metric to retrieve
    """
    return f"{metric} for {dataset}: 42,000"

@mcp.resource("schema://{dataset}")
def dataset_schema(dataset: str) -> str:
    """Return schema for a dataset."""
    return f"Schema for {dataset}: timestamp, value, segment"

MCP is transport-agnostic: stdio for local processes, Streamable HTTP for remote servers (now recommended over SSE), and WebSocket for persistent connections.

Part 3: Framework-Specific Patterns

Each major framework wraps the tool abstraction with its own ergonomics, lifecycle, and extension points.

OpenAI Agents SDK

The SDK uses a decorator-based approach with three tool tiers:

from agents import Agent, function_tool

# Tier 1: Simple function tool
@function_tool
def get_weather(city: str) -> str:
    """Return weather for a city."""
    return f"The weather in {city} is sunny."

# Tier 2: Tool with governance
@function_tool(needs_approval=True, timeoutMs=5000)
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email. Requires human approval."""
    return email_client.send(to, subject, body)

# Tier 3: Deferred loading for large tool ecosystems
@function_tool(defer_loading=True, namespace="database")
def query_database(sql: str) -> str:
    """Run read-only SQL query. Deferred: only loaded when model opts in."""
    return db.execute(sql)

agent = Agent(
    name="DataAgent",
    instructions="You retrieve data.",
    tools=[get_weather, send_email, ToolSearchTool()],
    tool_use_behavior="run_llm_again"
)

Key features in the SDK:

needs_approval — Boolean or callable for human-in-the-loop gates
defer_loading + ToolSearchTool() — Only loads tools when the model needs them (critical for 100+ tool ecosystems)
timeoutMs / timeoutBehavior — Per-call timeout handling ("error_as_result" or "raise_exception")
Guardrails — input_guardrails / output_guardrails as validation middleware
Agents as tools — Expose an entire agent as a callable tool without full handoff

LangChain & Deep Agents

LangChain's approach is middleware-centric. Tools are defined with the @tool decorator and composed via a middleware stack:

from langchain.tools import tool
from langchain.agents import create_agent

@tool
def search_docs(query: str, max_results: int = 5) -> str:
    """Search internal documentation. Args: query — search terms."""
    return doc_search(query, max_results)

agent = create_agent(
    model="anthropic:claude-sonnet-4-20250514",
    tools=[search_docs],
    middleware=[
        FilesystemMiddleware(),
        SummarizationMiddleware(),
        MemoryMiddleware(sources=["./AGENTS.md"]),
        SkillsMiddleware(sources=["./skills/"]),
        HumanInTheLoopMiddleware(interrupt_on=["send_email", "delete"]),
    ]
)

Skills in Deep Agents use progressive disclosure — only YAML frontmatter loads at startup; the full SKILL.md loads only when invoked:

---
name: code-review
description: Perform security-focused code review for pull requests
tags: [security, code-review, devops]
version: "2.1.0"
---

# Code Review Skill

## When to Use
Activate when asked to review code for security vulnerabilities...

## Workflow
1. Fetch the diff using `git diff`
2. Run SAST scanner
3. Flag critical issues

Three-level loading:

Discovery: Parse SKILL.md frontmatter at startup → inject into system prompt
Read: Agent invokes skill → read full SKILL.md via read_file
Execute: Agent follows instructions, reads supporting files only when needed

This saves significant context. SkillsHub.wtf reports 250x token savings compared to reading full SKILL.md files for all available skills.

CrewAI

CrewAI aggregates five capability types into a unified tool list:

from crewai import Agent, Task, Crew
from crewai.tools import SerperDevTool

agent = Agent(
    role="Research Analyst",
    goal="Analyze market trends",
    backstory="Senior analyst with 10 years experience",
    tools=[SerperDevTool()],        # Callable tools
    allow_delegation=True,           # Can delegate to other agents
    function_calling_llm=None        # Optional: separate LLM for tool calling
)

# Tasks can override the agent's default tools
task = Task(
    description="Research AI agent frameworks",
    tools=[SerperDevTool(), WebScraperTool()],  # Task-level override
    agent=agent
)

Capability	What It Provides	Resolution
Tools	Callable functions (web search, file ops)	`BaseTool` instance
MCPs	Remote tool servers via stdio/SSE/HTTP	`BaseTool` instance
Apps	Platform integrations (Gmail, Sheets)	`BaseTool` instance
Skills	Domain expertise in SKILL.md format	`BaseTool` instance
Knowledge	Retrieved facts from PDFs, CSVs	RAG-retrieved context

All five resolve to internal BaseTool instances in a unified tool list — the agent doesn't distinguish between a local function, a remote MCP server, or a bundled skill.

AutoGen / AG2

AG2 separates the agent that proposes a tool call from the agent that executes it:

from autogen import ConversableAgent, UserProxyAgent, register_function

def get_weather(city: str) -> str:
    return f"Weather in {city}: sunny"

date_agent = ConversableAgent(name="planner", llm_config=llm_config)
executor = UserProxyAgent(name="executor", human_input_mode="NEVER")

register_function(
    get_weather,
    caller=date_agent,         # Proposes the call
    executor=executor,         # Executes the call
    name="get_weather",
    description="Get weather for a given city"
)

Or with the decorator API:

@date_agent.register_for_llm(description="Get the day of the week")
@executor.register_for_execution()
def get_weekday(date: str) -> str:
    from datetime import datetime
    return datetime.strptime(date, '%Y-%m-%d').strftime('%A')

AG2 supports static registration (at agent init), dynamic registration (at runtime via tools=[]), and hybrid — core tools are static, situational tools are injected at runtime.

Anthropic / Claude SDK

Claude's Tool Runner abstraction handles the entire agentic loop:

from anthropic import Anthropic
from anthropic._models import BetaRunnableTool

client = Anthropic()

tools = [
    BetaRunnableTool(
        name="get_weather",
        description="Get weather for a city. Provide city name.",
        input_schema={
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        },
        run=lambda params, ctx: f"Weather in {params['city']}: sunny"
    )
]

with client.beta.messages.tool_runner(tools=tools) as runner:
    for message in runner.stream(text="What's the weather in Paris?"):
        print(message)

Claude also ships trained tool signatures — tool definitions that the model is specifically optimized to call correctly:

bash_code_execution — Sandboxed shell commands
text_editor_code_execution — View/create/edit files
web_search_20260209 — Web search
web_fetch_20260209 — Fetch URLs

Part 4: ACP — Agent-to-Agent Skill Advertising

While MCP connects models to tools, the Agent Communication Protocol (ACP) connects agents to other agents. Agents publish Agent Cards — structured JSON documents describing their capabilities:

{
  "name": "code-review-agent",
  "description": "Performs security-focused code review",
  "url": "https://api.company.com/acp/code-review",
  "capabilities": [
    { "name": "review_pr", "description": "Review a PR for security issues" }
  ],
  "skills": [
    { "name": "sast", "input_modes": ["text"], "output_modes": ["json"] }
  ],
  "limitations": [
    { "kind": "modality", "type": "audio", "description": "Does not process audio" }
  ]
}

Discovery Methods

Method	Description
Basic	Query running ACP servers directly
Open	Public manifest at `/.well-known/acp.json`
Registry	Centralized registry (online or offline cache)
Embedded	Metadata in container images for air-gapped deployments
LAN	TCP probe scan of /24 subnet in 1-3 seconds (v2.1-alpha)

ACP's skills API includes GET /skills for listing, POST /skills/query for filtered search, and PATCH /skills/{id}/limitations for runtime capability updates.

The relationship between the protocols is complementary:

         ┌─────────────────┐
         │  Orchestrator   │
         │     Agent       │
         └────┬─────────┬──┘
              │         │
         MCP  │         │  ACP
              │         │
     ┌────────▼──┐  ┌──▼─────────┐
     │   Tools   │  │  Sub-agent │
     │ (API/DB)  │  │ (Specialist)│
     └───────────┘  └────────────┘

MCP for connecting to external systems, ACP for connecting to other agents.

Part 5: Skill Versioning & Lifecycle

As skills become production artifacts, versioning and lifecycle management have emerged as critical concerns.

Semantic Versioning for Skills

The adapted SemVer for AI skills maps breaking changes to clearly defined patterns:

MAJOR — Breaking changes (schema rename, removed params, output shape change)
MINOR — Backward-compatible additions (new optional params, new tools)
PATCH — Bug fixes (typos, docstring corrections, schema validation fixes)

Deprecation Lifecycle

A recommended three-phase deprecation policy:

Phase 1: Deprecation Warning
  → Mark skill with @deprecated in description + provide replacement name

Phase 2: Soft Error
  → Tool still works but appends warning to output:
    {"result": "...", "warning": "Will be removed 2026-08-01. Migrate to v2."}

Phase 3: Hard Error
  → Remove from routing registry, return descriptive error with migration path

Runtime Manifest

Proposed standard for runtime version metadata (geodocs.dev):

{
  "version": "2.0.0",
  "version_scheme": "semver",
  "lifecycle_state": "deprecated",
  "deprecated_at": "2026-05-01T00:00:00Z",
  "sunset_at": "2026-08-01T00:00:00Z",
  "replacement_uri": "https://registry.example.com/skills/query-v3",
  "breaking_changes": [
    { "type": "parameter_renamed", "field": "query", "replacement": "sql" }
  ]
}

Runtime responses should include Deprecation and Sunset HTTP headers (RFC 8594).

Part 6: Security & Governance

Tool access is the primary attack surface for agentic systems. The OWASP LLM06 category — Excessive Agency — specifically addresses agents performing actions beyond what's intended.

The Least-Privilege Tool Model

Default: No tools
Explicit allowlist → scoped capabilities → human-in-the-loop for destructive actions

Controls by Layer

Layer	Technology	What It Controls
Execution	Firecracker, Kata, gVisor, Docker + seccomp	Where tools execute
Tool Policy	Allowlist/denylist, RBAC	Which tools exist at all
Approval	Human-in-the-loop, CIBA push	Whether high-risk tools proceed

Microsoft Agent Governance Toolkit

from agent_governance import safe_tool, PolicyEngine

policy = """
tools:
  send_email: allow  # Requires approval from policy
  query_database: allow
  exec_shell: deny
"""

@safe_tool(policy=policy)
def query_database(sql: str) -> str:
    return db.execute(sql)

The toolkit provides 10/10 OWASP Agentic Top 10 coverage across four privilege rings, with MCP security gateway (tool poisoning detection) and Merkle audit trails.

AgentWard Lifecycle Architecture

A multi-layer security architecture for production agents:

Input → Layer 1: Prompt injection filter
  → Layer 2: Memory update monitoring
    → Layer 3: Decision alignment (proposed action vs user intent)
      → Layer 4: Execution control (tool permissions + sandbox + approval)

Accumulated risk context flows between layers for progressive escalation — an agent that passes all filters at layer 1 may still be blocked at layer 4 if its proposed action chain looks suspicious.

Risk Tier Framework

Tier	Actions	Default Control
Read-only	Search docs, read files	Allow with logging
Draft	Create proposed content	Allow, do not apply
Internal write	Update test records	Allow in sandbox only
Destructive	Send email, delete data, charge cards	Require explicit human approval

Part 7: The Skill Ecosystem — Registries & Marketplaces

The open-source community has built multiple skill registries, each with a different philosophy:

Registry	Scale	Key Innovation
SkillHub	100K+ skills	CLI install, security scanning, self-hostable
SkillsHub.wtf	10K+ skills	Natural language skill resolver, 250x token savings
AgentSkillExchange	1.3K verified skills	Trust tiers (security reviewed, top starred)
AgentVerse	Universal marketplace	Rust core, pgvector semantic search, MCP-native
AgentHub	Framework-agnostic	"Hugging Face for AI Agents", AgentSpec standard
MCP Catalog	Enterprise focus	Health monitoring, schema aggregation
OpenFihris	Cross-framework	Claude Code, LangChain, CrewAI, Google ADK compatible

Most registries follow the Agent Skills specification (agentskills.io): SKILL.md files with YAML frontmatter, installable via npx skills add org/repo --skill <slug>, supported across Claude Code, Cursor, Windsurf, GitHub Copilot, Cline, Codex, and 30+ agent tools.

Part 8: Production Patterns

Pattern 1: Deferred Loading

For agents with 50+ tools, defer loading to prevent context overflow:

# Only tool names + descriptions loaded at startup
# Full schemas and handlers loaded when model opts in
tool_registry = ToolRegistry()
tool_registry.register(create_account_tool, namespace="user", defer=True)
tool_registry.register(delete_account_tool, namespace="user", defer=True)
tool_registry.register(search_analytics_tool, namespace="analytics", defer=True)

# Model calls ToolSearchTool() → registry activates relevant namespace

Pattern 2: Multi-Server Tool Coordination

When tools from different sources share names, use composite keys:

registry.get("github::createPullRequest")
registry.get("gitlab::createPullRequest")
registry.get_unified_tools()  # Deduplicates with server prefixes

Pattern 3: Tool Fallback & Error Recovery

@function_tool(timeoutMs=5000, timeoutBehavior="error_as_result")
def unreliable_api(param: str) -> str:
    """External API with fallback. On timeout: returns model-readable error."""
    try:
        return call_external_api(param)
    except TimeoutError:
        return "Service temporarily unavailable. Try using cached data."

Pattern 4: Context Window Management via Middleware Stack

LangChain's middleware ordering reveals the skill management hierarchy:

1. Custom system prompt
2. Base agent prompt
3. To-do list prompt
4. Memory prompt
5. Skills prompt (skill locations + frontmatter only)
6. Virtual filesystem prompt
7. Subagent prompt
8. Human-in-the-loop prompt

Skills are injected after memory but before filesystem — they describe what the agent can do, not the data it operates on.

The Road Ahead

Several trends are converging:

Protocol standardization — MCP for tool connectivity, ACP for inter-agent communication, with A2A (Google) as a complementary alternative. The ecosystem is settling on specialization rather than competition.
Registry interoperability — The Agent Skills spec (agentskills.io) is gaining traction as a universal format, with most major tools supporting the SKILL.md convention.
Runtime governance — Tools like Microsoft's Agent Governance Toolkit and AgentWard are moving security from bolted-on policies to instrumentation built into the skill execution pipeline.
Deferred everything — Progressive disclosure (loading only what the model needs when it needs it) is becoming the default pattern for large-scale agent deployments.
Versioning as infrastructure — Skill versioning with semantic deprecation lifecycles is moving from best practice to operational requirement as skills become long-lived production artifacts.

The fundamental insight across all of these patterns is the same: skills are the agent's interface to the world. How you define, discover, govern, and version them determines what your agents can actually accomplish — and what they can break.

Part 1: The Core Abstraction — What Is a Skill?

A name: A unique identifier the model uses to request the capability
A description: Plain-text documentation of what the capability does (this is the most important factor for model performance)
An input schema: Structured parameters the model must provide (typically JSON Schema)
An execution handler: The actual code that runs when the tool is invoked
An output: Structured data returned to the model

name + description + input_schema → model proposes call → handler executes → output returned → model continues

The nuance lies in how these five elements are defined, when they are discovered, and who controls which tools are available in which context.

Tools vs Skills

The ecosystem has settled on a loose distinction:

Concept	Scope	Example
Tool	A single, atomic function	`get_weather(city)`, `search_docs(query)`
Skill	A bundled capability with context, instructions, and possibly multiple tools	"Code review skill" that includes `git diff`, `run SAST`, `analyze results`

Part 2: MCP — The Protocol Layer for Tools

┌─────────────┐     tools/list, resources/list, prompts/list     ┌──────────────┐
│   Client    │ ◄──────────────────────────────────────────────► │ MCP Server   │
│  (Agent)    │     tools/call, resources/read, prompts/get      │ (Tools/Data) │
└─────────────┘                                                   └──────────────┘

The Three Primitives

Primitive	Purpose	Lifecycle
Tools	Executable functions the model invokes	Stateless: call → result
Resources	Read-only contextual data (schemas, files, configs)	Stateful URI scheme
Prompts	Reusable interaction templates	Stateless: get → template

Dynamic Discovery

MCP's key innovation is dynamic discovery at connection time. A client never needs to pre-know a server's tools:

// Client connects and queries capabilities
interface ServerCapabilities {
  tools?: ListToolsResult | "dynamic";
  resources?: ListResourcesResult | "dynamic";
  prompts?: ListPromptsResult | "dynamic";
}

The server responds with a structured capability catalog that becomes part of the model's context:

{
  "tools": [
    {
      "name": "query_database",
      "description": "Execute read-only SQL queries",
      "inputSchema": {
        "type": "object",
        "properties": {
          "sql": { "type": "string", "description": "SELECT statement" }
        },
        "required": ["sql"]
      }
    }
  ]
}

Discovery Extensions

Beyond the handshake, MCP servers can expose capabilities via several mechanisms:

.well-known/mcp/server-card.json (SEP-1649) — Pre-connection capability probing for HTTP servers
.well-known/mcp (SEP-1960) — Manifest endpoint with endpoints, capabilities, and auth requirements
DNS TXT _mcp.{host} (IETF draft) — Fast-mode discovery for crawlers
MCP Registry — Community registry for public server discovery

Server Implementation

The Python FastMCP library offers the cleanest developer experience:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("analytics-server")

@mcp.tool()
def query_analytics(dataset: str, metric: str) -> str:
    """Query analytics data.
    Args:
        dataset: Dataset name (users, revenue, sessions)
        metric: Metric to retrieve
    """
    return f"{metric} for {dataset}: 42,000"

@mcp.resource("schema://{dataset}")
def dataset_schema(dataset: str) -> str:
    """Return schema for a dataset."""
    return f"Schema for {dataset}: timestamp, value, segment"

MCP is transport-agnostic: stdio for local processes, Streamable HTTP for remote servers (now recommended over SSE), and WebSocket for persistent connections.

Part 3: Framework-Specific Patterns

Each major framework wraps the tool abstraction with its own ergonomics, lifecycle, and extension points.

OpenAI Agents SDK

The SDK uses a decorator-based approach with three tool tiers:

from agents import Agent, function_tool

# Tier 1: Simple function tool
@function_tool
def get_weather(city: str) -> str:
    """Return weather for a city."""
    return f"The weather in {city} is sunny."

# Tier 2: Tool with governance
@function_tool(needs_approval=True, timeoutMs=5000)
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email. Requires human approval."""
    return email_client.send(to, subject, body)

# Tier 3: Deferred loading for large tool ecosystems
@function_tool(defer_loading=True, namespace="database")
def query_database(sql: str) -> str:
    """Run read-only SQL query. Deferred: only loaded when model opts in."""
    return db.execute(sql)

agent = Agent(
    name="DataAgent",
    instructions="You retrieve data.",
    tools=[get_weather, send_email, ToolSearchTool()],
    tool_use_behavior="run_llm_again"
)

Key features in the SDK:

needs_approval — Boolean or callable for human-in-the-loop gates
defer_loading + ToolSearchTool() — Only loads tools when the model needs them (critical for 100+ tool ecosystems)
timeoutMs / timeoutBehavior — Per-call timeout handling ("error_as_result" or "raise_exception")
Guardrails — input_guardrails / output_guardrails as validation middleware
Agents as tools — Expose an entire agent as a callable tool without full handoff

LangChain & Deep Agents

LangChain's approach is middleware-centric. Tools are defined with the @tool decorator and composed via a middleware stack:

from langchain.tools import tool
from langchain.agents import create_agent

@tool
def search_docs(query: str, max_results: int = 5) -> str:
    """Search internal documentation. Args: query — search terms."""
    return doc_search(query, max_results)

agent = create_agent(
    model="anthropic:claude-sonnet-4-20250514",
    tools=[search_docs],
    middleware=[
        FilesystemMiddleware(),
        SummarizationMiddleware(),
        MemoryMiddleware(sources=["./AGENTS.md"]),
        SkillsMiddleware(sources=["./skills/"]),
        HumanInTheLoopMiddleware(interrupt_on=["send_email", "delete"]),
    ]
)

Skills in Deep Agents use progressive disclosure — only YAML frontmatter loads at startup; the full SKILL.md loads only when invoked:

---
name: code-review
description: Perform security-focused code review for pull requests
tags: [security, code-review, devops]
version: "2.1.0"
---

# Code Review Skill

## When to Use
Activate when asked to review code for security vulnerabilities...

## Workflow
1. Fetch the diff using `git diff`
2. Run SAST scanner
3. Flag critical issues

Three-level loading:

Discovery: Parse SKILL.md frontmatter at startup → inject into system prompt
Read: Agent invokes skill → read full SKILL.md via read_file
Execute: Agent follows instructions, reads supporting files only when needed

This saves significant context. SkillsHub.wtf reports 250x token savings compared to reading full SKILL.md files for all available skills.

CrewAI

CrewAI aggregates five capability types into a unified tool list:

from crewai import Agent, Task, Crew
from crewai.tools import SerperDevTool

agent = Agent(
    role="Research Analyst",
    goal="Analyze market trends",
    backstory="Senior analyst with 10 years experience",
    tools=[SerperDevTool()],        # Callable tools
    allow_delegation=True,           # Can delegate to other agents
    function_calling_llm=None        # Optional: separate LLM for tool calling
)

# Tasks can override the agent's default tools
task = Task(
    description="Research AI agent frameworks",
    tools=[SerperDevTool(), WebScraperTool()],  # Task-level override
    agent=agent
)

Capability	What It Provides	Resolution
Tools	Callable functions (web search, file ops)	`BaseTool` instance
MCPs	Remote tool servers via stdio/SSE/HTTP	`BaseTool` instance
Apps	Platform integrations (Gmail, Sheets)	`BaseTool` instance
Skills	Domain expertise in SKILL.md format	`BaseTool` instance
Knowledge	Retrieved facts from PDFs, CSVs	RAG-retrieved context

All five resolve to internal BaseTool instances in a unified tool list — the agent doesn't distinguish between a local function, a remote MCP server, or a bundled skill.

AutoGen / AG2

AG2 separates the agent that proposes a tool call from the agent that executes it:

from autogen import ConversableAgent, UserProxyAgent, register_function

def get_weather(city: str) -> str:
    return f"Weather in {city}: sunny"

date_agent = ConversableAgent(name="planner", llm_config=llm_config)
executor = UserProxyAgent(name="executor", human_input_mode="NEVER")

register_function(
    get_weather,
    caller=date_agent,         # Proposes the call
    executor=executor,         # Executes the call
    name="get_weather",
    description="Get weather for a given city"
)

Or with the decorator API:

@date_agent.register_for_llm(description="Get the day of the week")
@executor.register_for_execution()
def get_weekday(date: str) -> str:
    from datetime import datetime
    return datetime.strptime(date, '%Y-%m-%d').strftime('%A')

AG2 supports static registration (at agent init), dynamic registration (at runtime via tools=[]), and hybrid — core tools are static, situational tools are injected at runtime.

Anthropic / Claude SDK

Claude's Tool Runner abstraction handles the entire agentic loop:

from anthropic import Anthropic
from anthropic._models import BetaRunnableTool

client = Anthropic()

tools = [
    BetaRunnableTool(
        name="get_weather",
        description="Get weather for a city. Provide city name.",
        input_schema={
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        },
        run=lambda params, ctx: f"Weather in {params['city']}: sunny"
    )
]

with client.beta.messages.tool_runner(tools=tools) as runner:
    for message in runner.stream(text="What's the weather in Paris?"):
        print(message)

Claude also ships trained tool signatures — tool definitions that the model is specifically optimized to call correctly:

bash_code_execution — Sandboxed shell commands
text_editor_code_execution — View/create/edit files
web_search_20260209 — Web search
web_fetch_20260209 — Fetch URLs

Part 4: ACP — Agent-to-Agent Skill Advertising

While MCP connects models to tools, the Agent Communication Protocol (ACP) connects agents to other agents. Agents publish Agent Cards — structured JSON documents describing their capabilities:

{
  "name": "code-review-agent",
  "description": "Performs security-focused code review",
  "url": "https://api.company.com/acp/code-review",
  "capabilities": [
    { "name": "review_pr", "description": "Review a PR for security issues" }
  ],
  "skills": [
    { "name": "sast", "input_modes": ["text"], "output_modes": ["json"] }
  ],
  "limitations": [
    { "kind": "modality", "type": "audio", "description": "Does not process audio" }
  ]
}

Discovery Methods

Method	Description
Basic	Query running ACP servers directly
Open	Public manifest at `/.well-known/acp.json`
Registry	Centralized registry (online or offline cache)
Embedded	Metadata in container images for air-gapped deployments
LAN	TCP probe scan of /24 subnet in 1-3 seconds (v2.1-alpha)

ACP's skills API includes GET /skills for listing, POST /skills/query for filtered search, and PATCH /skills/{id}/limitations for runtime capability updates.

The relationship between the protocols is complementary:

         ┌─────────────────┐
         │  Orchestrator   │
         │     Agent       │
         └────┬─────────┬──┘
              │         │
         MCP  │         │  ACP
              │         │
     ┌────────▼──┐  ┌──▼─────────┐
     │   Tools   │  │  Sub-agent │
     │ (API/DB)  │  │ (Specialist)│
     └───────────┘  └────────────┘

MCP for connecting to external systems, ACP for connecting to other agents.

Part 5: Skill Versioning & Lifecycle

As skills become production artifacts, versioning and lifecycle management have emerged as critical concerns.

Semantic Versioning for Skills

The adapted SemVer for AI skills maps breaking changes to clearly defined patterns:

MAJOR — Breaking changes (schema rename, removed params, output shape change)
MINOR — Backward-compatible additions (new optional params, new tools)
PATCH — Bug fixes (typos, docstring corrections, schema validation fixes)

Deprecation Lifecycle

A recommended three-phase deprecation policy:

Phase 1: Deprecation Warning
  → Mark skill with @deprecated in description + provide replacement name

Phase 2: Soft Error
  → Tool still works but appends warning to output:
    {"result": "...", "warning": "Will be removed 2026-08-01. Migrate to v2."}

Phase 3: Hard Error
  → Remove from routing registry, return descriptive error with migration path

Runtime Manifest

Proposed standard for runtime version metadata (geodocs.dev):

{
  "version": "2.0.0",
  "version_scheme": "semver",
  "lifecycle_state": "deprecated",
  "deprecated_at": "2026-05-01T00:00:00Z",
  "sunset_at": "2026-08-01T00:00:00Z",
  "replacement_uri": "https://registry.example.com/skills/query-v3",
  "breaking_changes": [
    { "type": "parameter_renamed", "field": "query", "replacement": "sql" }
  ]
}

Runtime responses should include Deprecation and Sunset HTTP headers (RFC 8594).

Part 6: Security & Governance

Tool access is the primary attack surface for agentic systems. The OWASP LLM06 category — Excessive Agency — specifically addresses agents performing actions beyond what's intended.

The Least-Privilege Tool Model

Default: No tools
Explicit allowlist → scoped capabilities → human-in-the-loop for destructive actions

Controls by Layer

Layer	Technology	What It Controls
Execution	Firecracker, Kata, gVisor, Docker + seccomp	Where tools execute
Tool Policy	Allowlist/denylist, RBAC	Which tools exist at all
Approval	Human-in-the-loop, CIBA push	Whether high-risk tools proceed

Microsoft Agent Governance Toolkit

from agent_governance import safe_tool, PolicyEngine

policy = """
tools:
  send_email: allow  # Requires approval from policy
  query_database: allow
  exec_shell: deny
"""

@safe_tool(policy=policy)
def query_database(sql: str) -> str:
    return db.execute(sql)

The toolkit provides 10/10 OWASP Agentic Top 10 coverage across four privilege rings, with MCP security gateway (tool poisoning detection) and Merkle audit trails.

AgentWard Lifecycle Architecture

A multi-layer security architecture for production agents:

Input → Layer 1: Prompt injection filter
  → Layer 2: Memory update monitoring
    → Layer 3: Decision alignment (proposed action vs user intent)
      → Layer 4: Execution control (tool permissions + sandbox + approval)

Accumulated risk context flows between layers for progressive escalation — an agent that passes all filters at layer 1 may still be blocked at layer 4 if its proposed action chain looks suspicious.

Risk Tier Framework

Tier	Actions	Default Control
Read-only	Search docs, read files	Allow with logging
Draft	Create proposed content	Allow, do not apply
Internal write	Update test records	Allow in sandbox only
Destructive	Send email, delete data, charge cards	Require explicit human approval

Part 7: The Skill Ecosystem — Registries & Marketplaces

The open-source community has built multiple skill registries, each with a different philosophy:

Registry	Scale	Key Innovation
SkillHub	100K+ skills	CLI install, security scanning, self-hostable
SkillsHub.wtf	10K+ skills	Natural language skill resolver, 250x token savings
AgentSkillExchange	1.3K verified skills	Trust tiers (security reviewed, top starred)
AgentVerse	Universal marketplace	Rust core, pgvector semantic search, MCP-native
AgentHub	Framework-agnostic	"Hugging Face for AI Agents", AgentSpec standard
MCP Catalog	Enterprise focus	Health monitoring, schema aggregation
OpenFihris	Cross-framework	Claude Code, LangChain, CrewAI, Google ADK compatible

Part 8: Production Patterns

Pattern 1: Deferred Loading

For agents with 50+ tools, defer loading to prevent context overflow:

# Only tool names + descriptions loaded at startup
# Full schemas and handlers loaded when model opts in
tool_registry = ToolRegistry()
tool_registry.register(create_account_tool, namespace="user", defer=True)
tool_registry.register(delete_account_tool, namespace="user", defer=True)
tool_registry.register(search_analytics_tool, namespace="analytics", defer=True)

# Model calls ToolSearchTool() → registry activates relevant namespace

Pattern 2: Multi-Server Tool Coordination

When tools from different sources share names, use composite keys:

registry.get("github::createPullRequest")
registry.get("gitlab::createPullRequest")
registry.get_unified_tools()  # Deduplicates with server prefixes

Pattern 3: Tool Fallback & Error Recovery

@function_tool(timeoutMs=5000, timeoutBehavior="error_as_result")
def unreliable_api(param: str) -> str:
    """External API with fallback. On timeout: returns model-readable error."""
    try:
        return call_external_api(param)
    except TimeoutError:
        return "Service temporarily unavailable. Try using cached data."

Pattern 4: Context Window Management via Middleware Stack

LangChain's middleware ordering reveals the skill management hierarchy:

1. Custom system prompt
2. Base agent prompt
3. To-do list prompt
4. Memory prompt
5. Skills prompt (skill locations + frontmatter only)
6. Virtual filesystem prompt
7. Subagent prompt
8. Human-in-the-loop prompt

Skills are injected after memory but before filesystem — they describe what the agent can do, not the data it operates on.

The Road Ahead

Several trends are converging:

Protocol standardization — MCP for tool connectivity, ACP for inter-agent communication, with A2A (Google) as a complementary alternative. The ecosystem is settling on specialization rather than competition.
Registry interoperability — The Agent Skills spec (agentskills.io) is gaining traction as a universal format, with most major tools supporting the SKILL.md convention.
Runtime governance — Tools like Microsoft's Agent Governance Toolkit and AgentWard are moving security from bolted-on policies to instrumentation built into the skill execution pipeline.
Deferred everything — Progressive disclosure (loading only what the model needs when it needs it) is becoming the default pattern for large-scale agent deployments.
Versioning as infrastructure — Skill versioning with semantic deprecation lifecycles is moving from best practice to operational requirement as skills become long-lived production artifacts.

Part 1: The Core Abstraction — What Is a Skill?

Tools vs Skills

Part 2: MCP — The Protocol Layer for Tools

The Three Primitives

Dynamic Discovery

Discovery Extensions

Server Implementation

Part 3: Framework-Specific Patterns

OpenAI Agents SDK

LangChain & Deep Agents

CrewAI

AutoGen / AG2

Anthropic / Claude SDK

Part 4: ACP — Agent-to-Agent Skill Advertising

Discovery Methods

Part 5: Skill Versioning & Lifecycle

Semantic Versioning for Skills

Deprecation Lifecycle

Runtime Manifest

Part 6: Security & Governance

The Least-Privilege Tool Model

Controls by Layer

Microsoft Agent Governance Toolkit

AgentWard Lifecycle Architecture

Risk Tier Framework

Part 7: The Skill Ecosystem — Registries & Marketplaces

Part 8: Production Patterns

Pattern 1: Deferred Loading

Pattern 2: Multi-Server Tool Coordination

Pattern 3: Tool Fallback & Error Recovery

Pattern 4: Context Window Management via Middleware Stack

The Road Ahead

Further Reading

Never miss a deep-dive

Part 1: The Core Abstraction — What Is a Skill?

Tools vs Skills

Part 2: MCP — The Protocol Layer for Tools

The Three Primitives

Dynamic Discovery

Discovery Extensions

Server Implementation

Part 3: Framework-Specific Patterns

OpenAI Agents SDK

LangChain & Deep Agents

CrewAI

AutoGen / AG2

Anthropic / Claude SDK

Part 4: ACP — Agent-to-Agent Skill Advertising

Discovery Methods

Part 5: Skill Versioning & Lifecycle

Semantic Versioning for Skills

Deprecation Lifecycle

Runtime Manifest

Part 6: Security & Governance

The Least-Privilege Tool Model

Controls by Layer

Microsoft Agent Governance Toolkit

AgentWard Lifecycle Architecture

Risk Tier Framework

Part 7: The Skill Ecosystem — Registries & Marketplaces

Part 8: Production Patterns

Pattern 1: Deferred Loading

Pattern 2: Multi-Server Tool Coordination

Pattern 3: Tool Fallback & Error Recovery

Pattern 4: Context Window Management via Middleware Stack

The Road Ahead

Further Reading

Never miss a deep-dive