Microsoft Agent Governance Toolkit: Runtime Security for AI Agents | Thoughts & Talks

On 2 April 2026, Microsoft published the Agent Governance Toolkit (AGT) on GitHub under the MIT licence. In a landscape where AI agents routinely hold API keys, shell access, and database credentials, AGT treats agent governance as a systems problem, not a prompt engineering one. The toolkit enforces security policies at runtime with the same rigour an operating system applies to process isolation: ring-based execution, kernel-level policy evaluation, and zero-trust identity between agents. Led by Imran Siddique, Principal Group Engineering Manager at Microsoft, the project landed at version 3.0.1 with 60 contributors and roughly 1,040 stars within its first two weeks.

## The problem with prompt-based safety

Prompt-based guardrails have a fundamental flaw: they ask a language model to police itself. Microsoft's own benchmarks quantify the gap. When safety rules are embedded in system prompts, the violation rate sits at 26.67%. The model can be socially engineered, distracted by complex instructions, or simply fail to follow rules under adversarial pressure. Every major prompt injection demonstration, from Simon Willison's early experiments to the "Comment and Control" attacks on Claude Code and GitHub Copilot, exploits this weakness.

AGT sidesteps the problem entirely. Rather than asking the model to behave, it enforces boundaries in the execution layer. Policies are evaluated by a dedicated policy kernel before any tool call reaches the target system. The result: a 0.00% violation rate across the same benchmark suite.
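The difference between asking a model to behave and enforcing a boundary in code can be illustrated with a minimal sketch. Everything here is hypothetical (the `governed_call` wrapper, the `DENIED_ACTIONS` set, and the two toy tools are not AGT's API); the point is only that the check runs outside the model.

```python
from typing import Callable

# Hypothetical deny-list; in AGT the decision would come from the policy kernel.
DENIED_ACTIONS = {"shell.exec", "db.drop_table"}


class PolicyViolation(Exception):
    """Raised when a tool call is blocked before it reaches the target system."""


def governed_call(action: str, tool: Callable[..., str], **kwargs) -> str:
    # The check is ordinary code in the execution path, so nothing the
    # model reads or writes in a prompt can change its outcome.
    if action in DENIED_ACTIONS:
        raise PolicyViolation(f"{action} denied by policy")
    return tool(**kwargs)


def read_file(path: str) -> str:
    """Toy tool standing in for a real file reader."""
    return f"contents of {path}"


def run_shell(command: str) -> str:
    """Toy tool standing in for a real shell executor."""
    return f"ran {command}"
```

Calling `governed_call("fs.read", read_file, path="a.txt")` returns the tool's result, while `governed_call("shell.exec", run_shell, command="ls")` raises `PolicyViolation` without ever invoking the tool.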

## Architecture

The toolkit ships as seven composable packages, each addressing a distinct layer of the agent governance stack. The design mirrors operating system architecture: a policy kernel at the bottom, execution isolation in the middle, and higher-order services on top.

### Agent OS

The policy kernel at the bottom of the stack. Supports three policy formats: YAML for simple allow/deny rules, OPA/Rego for complex conditional logic, and Cedar (AWS's policy language) for attribute-based access control. Policies are loaded at startup and evaluated on every tool invocation, with zero dependence on the LLM for enforcement. The kernel is stateless and deterministic, making its behaviour fully predictable regardless of model behaviour.
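A stateless, deterministic evaluator of simple allow/deny rules might look like the sketch below. It is an illustration under assumptions, not AGT's kernel: the `Rule` type, the glob-style action patterns, and the "deny wins, default deny" precedence are all choices made for this example, and the rules are written inline rather than loaded from YAML.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple


class Rule(NamedTuple):
    effect: str  # "allow" or "deny"
    action: str  # glob pattern such as "bash:*"


def evaluate(rules: List[Rule], action: str) -> bool:
    """Return True if the action is allowed.

    Pure function: the result depends only on the arguments, so the same
    rules and action always produce the same decision.
    """
    matching = [r for r in rules if fnmatchcase(action, r.action)]
    # Assumed precedence: any matching deny wins, and an action with no
    # matching allow rule is denied by default.
    if any(r.effect == "deny" for r in matching):
        return False
    return any(r.effect == "allow" for r in matching)


# Hypothetical rule set, as it might look once parsed from a policy file.
RULES = [
    Rule("allow", "sql:read"),
    Rule("allow", "http:get"),
    Rule("deny", "bash:*"),
]
```

With these rules, `evaluate(RULES, "sql:read")` is allowed, any `bash:` action is denied by the explicit rule, and an unlisted action such as `sql:write` is denied by default.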

### Agent Runtime

Implements a four-ring execution model borrowed from CPU privilege architecture. Agents start in Ring 3 with minimal access and earn higher privileges through demonstrated trustworthiness:

| Ring | Trust Threshold | Capabilities |
| --- | --- | --- |
| Ring 0 (Kernel) | Score 900+ | Full system access, policy modification |
| Ring 1 (Supervisor) | Score 700+ | Cross-agent coordination, elevated tool access |
| Ring 2 (User) | Score 400+ | Standard tool calls, bounded data access |
| Ring 3 (Untrusted) | Score below 400 | Sandbox only, output formatting |

Each ring enforces per-agent resource limits: maximum execution time, memory caps, CPU throttling, and request rate limits. An agent in Ring 1 cannot escalate to Ring 0 without an explicit policy grant, and the kill switch terminates a violating agent within 50 milliseconds.
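The ring thresholds translate into a small lookup. The sketch below uses the thresholds from the ring table; the function names are hypothetical, and `effective_ring` encodes one assumed reading of the escalation rule (a Ring 0 score alone is not enough without an explicit grant).

```python
# Trust thresholds from the ring table, highest privilege first.
RING_THRESHOLDS = [(900, 0), (700, 1), (400, 2)]


def ring_for_score(score: int) -> int:
    """Map a trust score (0 to 1000) to an execution ring."""
    if not 0 <= score <= 1000:
        raise ValueError(f"trust score out of range: {score}")
    for threshold, ring in RING_THRESHOLDS:
        if score >= threshold:
            return ring
    return 3  # below 400: untrusted sandbox


def effective_ring(score: int, ring0_grant: bool = False) -> int:
    """Assumed reading: Ring 0 needs the score AND an explicit policy grant."""
    ring = ring_for_score(score)
    if ring == 0 and not ring0_grant:
        return 1  # stay at Supervisor until the grant exists
    return ring
```

Under this reading, an agent scoring 950 still runs in Ring 1 until a policy grants it Ring 0.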

### AgentMesh

Provides zero-trust identity between agents using decentralised identifiers (DIDs) signed with Ed25519, alongside ML-DSA-65, NIST's post-quantum signature scheme. Agents verify each other's identity and permissions through the Inter-Agent Trust Protocol (IATP) before any inter-agent communication. A dynamic trust score on a 0-to-1000 scale across five behavioural tiers replaces the binary trusted/untrusted model, so agents that deviate from expected patterns lose privileges automatically.
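How a dynamic score can remove privileges without human intervention is easiest to see in a toy update rule. The sketch below is an assumption throughout: the event names and point values are invented for illustration, since the source does not describe AGT's scoring weights or tier boundaries. Only the 0-to-1000 range comes from the text.

```python
MIN_SCORE, MAX_SCORE = 0, 1000

# Hypothetical adjustments per observed event; not AGT's actual weights.
ADJUSTMENTS = {
    "ok": 1,                   # a clean action slowly rebuilds trust
    "unexpected_tool": -50,    # deviation from the expected pattern
    "policy_violation": -200,  # blocked action
}


def update_score(score: int, event: str) -> int:
    """Apply one event to a trust score, clamped to the 0-1000 scale."""
    if event not in ADJUSTMENTS:
        raise ValueError(f"unknown event: {event}")
    return max(MIN_SCORE, min(MAX_SCORE, score + ADJUSTMENTS[event]))
```

For example, an agent at 720 that triggers one `policy_violation` drops to 520, which falls below the 700 threshold for Ring 1 in the ring table.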

### Agent SRE

Applies production reliability engineering patterns to agent workloads: circuit breakers, SLOs, error budgets, and Prometheus-compatible metrics. When an agent enters a failure loop, whether through repeated tool calls, escalating permission requests, or excessive output generation, the circuit breaker trips and the agent is isolated from the mesh.
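The circuit breaker pattern itself is simple. The following is a generic sketch of the pattern rather than Agent SRE's implementation; the class name, the consecutive-failure counting, and the default threshold of three are assumptions.

```python
class CircuitBreaker:
    """Trips after a run of consecutive failures and then stays open."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.consecutive_failures = 0
        self.open = False

    def record(self, success: bool) -> None:
        """Record the outcome of one agent action."""
        if self.open:
            return  # already tripped; a reset would be a separate operation
        if success:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                self.open = True

    def allow(self) -> bool:
        """Return True while the agent may keep acting."""
        return not self.open
```

A caller would check `allow()` before dispatching each action and call `record()` with the outcome afterwards; once the breaker is open, the agent's actions are refused until something outside this sketch resets it.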

### Agent Compliance, Marketplace, and Lightning

The three upper-layer packages extend the governance surface. Agent Compliance automates regulatory mapping to the EU AI Act, HIPAA, and SOC 2, producing signed attestations. Agent Marketplace manages plugin lifecycles with Ed25519 signing and SLSA provenance verification. Agent Lightning handles reinforcement learning governance by wrapping training runners in policy-enforced boundaries, ensuring that RL-trained agents cannot exceed their capability envelope during training.

## OWASP Agentic Top 10 compliance

OWASP published the Top 10 for Agentic Applications in December 2025, the first formal taxonomy of risks specific to autonomous AI agents. AGT maps directly to all ten categories:

| ID | OWASP Risk | AGT Control |
| --- | --- | --- |
| ASI-01 | Agent Goal Hijacking | Semantic intent classifier in the policy engine |
| ASI-02 | Excessive Capabilities | Capability model enforces least-privilege per agent |
| ASI-03 | Identity and Privilege Abuse | Zero-trust DID identity with Ed25519 and ML-DSA-65 |
| ASI-04 | Supply Chain Vulnerabilities | Ed25519 plugin signing, AI-BOM v2.0, SLSA provenance |
| ASI-05 | Uncontrolled Code Execution | Execution rings with per-ring resource limits |
| ASI-06 | Memory and Context Poisoning | Cross-Model Verification Kernel with majority voting |
| ASI-07 | Unsafe Inter-Agent Communication | IATP encrypted channels with trust gates |
| ASI-08 | Cascading Failures | Circuit breakers, SLO enforcement, error budgets |
| ASI-09 | Human-Agent Trust Deficit | Full audit trails, flight recorder, approval workflows |
| ASI-10 | Rogue Agents | Kill switch, ring isolation, trust decay, anomaly detection |

An automated certification CLI produces a signed attestation mapping each control on every deployment, giving teams auditable evidence of compliance without manual documentation effort.
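What a signed control mapping could look like is sketched below. This is not the certification CLI's output format, which the source does not describe. The example also swaps in HMAC-SHA256 from the Python standard library as a stand-in for a real signature scheme such as Ed25519, so it demonstrates tamper detection with a shared key rather than public-key attestation. The function names and the attestation layout are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(controls: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte encoding to sign.
    return json.dumps(controls, sort_keys=True, separators=(",", ":")).encode()


def sign_attestation(controls: dict, key: bytes) -> dict:
    """Bundle a risk-to-control mapping with an HMAC-SHA256 tag (stand-in)."""
    tag = hmac.new(key, _canonical(controls), hashlib.sha256).hexdigest()
    return {"controls": controls, "signature": tag}


def verify_attestation(attestation: dict, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(
        key, _canonical(attestation["controls"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, attestation["signature"])
```

Any change to the mapping after signing, such as replacing a listed control, makes `verify_attestation` return `False`.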

## Performance

The policy kernel is designed for the hot path. Benchmark results from the published evaluation suite:

- 0.011 ms per single policy evaluation (p50)
- 0.103 ms for a full kernel evaluation across all active policies
- 47,000 operations per second at 1,000 concurrent agents
- 10,000x faster than inserting a policy check as an LLM call

The sub-millisecond overhead means governance adds less latency than a typical network round-trip. One early adopter reported 473 unauthorised actions blocked over 11 days with a total enforcement overhead of 0.43 seconds. The three-gate defence model (GovernanceGate, TrustGate, ReliabilityGate) processes each action through pattern matching, trust scoring, and circuit breaker evaluation before any execution proceeds.
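The three-gate sequence can be pictured as a short pipeline that stops at the first refusal. Only the gate names and their order come from the description above; the function bodies, the action fields (`name`, `trust_score`, `breaker_open`), the blocked pattern, and the score threshold are assumptions made for this sketch.

```python
from fnmatch import fnmatchcase
from typing import Callable, List, Optional, Tuple

BLOCKED_PATTERNS = ["bash:*"]  # hypothetical pattern list
MIN_TRUST_SCORE = 400          # hypothetical threshold


def governance_gate(action: dict) -> Optional[str]:
    """Pattern matching: refuse actions that match a blocked pattern."""
    if any(fnmatchcase(action["name"], p) for p in BLOCKED_PATTERNS):
        return "action matches a blocked pattern"
    return None


def trust_gate(action: dict) -> Optional[str]:
    """Trust scoring: refuse agents whose score is too low."""
    if action["trust_score"] < MIN_TRUST_SCORE:
        return "trust score too low"
    return None


def reliability_gate(action: dict) -> Optional[str]:
    """Circuit breaker evaluation: refuse agents whose breaker is open."""
    if action["breaker_open"]:
        return "circuit breaker open"
    return None


GATES: List[Tuple[str, Callable[[dict], Optional[str]]]] = [
    ("GovernanceGate", governance_gate),
    ("TrustGate", trust_gate),
    ("ReliabilityGate", reliability_gate),
]


def check(action: dict) -> Tuple[bool, str]:
    """Run the gates in order and stop at the first refusal."""
    for name, gate in GATES:
        reason = gate(action)
        if reason is not None:
            return False, f"{name}: {reason}"
    return True, "allowed"
```

Because each gate is a cheap in-process check, the ordering mainly affects which reason is reported when an action would fail more than one gate.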

## Getting started

```shell
pip install "agent-governance-toolkit[full]"
```

```python
from agt import PolicyEvaluator, PolicyDocument, PolicyRule, Effect

# Two deny rules: no shell access for agents outside Rings 0 and 1, and
# no writes to any resource whose name starts with "production_".
policy = PolicyDocument(rules=[
    PolicyRule(
        id="no-shell-access",
        effect=Effect.DENY,
        action="bash:*",
        condition="agent.ring > 1"
    ),
    PolicyRule(
        id="read-only-database",
        effect=Effect.DENY,
        action="sql:write",
        resource="production_*"
    )
])

evaluator = PolicyEvaluator(policy)
result = evaluator.evaluate(
    agent_id="code-reviewer",
    action="bash:rm -rf /",
    resource="/tmp/cache",
    context={"ring": 3}  # Ring 3 satisfies the rule's condition (ring > 1)
)
# result.allowed == False
# result.matched_rule == "no-shell-access"
```

Five SDKs are available: Python (full stack), TypeScript, Go, .NET, and Rust (in development). All five implement core governance covering policy evaluation, identity verification, trust scoring, and audit logging. The toolkit integrates with over 20 agent frameworks, including LangChain, AutoGen, CrewAI, Semantic Kernel, and LlamaIndex, hooking into native extension points so governance sits transparently between the framework and the actions agents take.

## Competitive landscape

AGT enters a space that has been fragmented. NeMo Guardrails (NVIDIA) and Llama Guard (Meta) take a prompt-centric approach, relying on model-based classification to catch policy violations. Guardrails AI and Lakera focus on input/output filtering. Bedrock Guardrails provides AWS-native policy enforcement. None of these address the full agent governance surface: identity, inter-agent communication, execution isolation, supply chain security, and reliability engineering in a single stack.

What distinguishes AGT is the operating system analogy. Governance runs as a kernel concern, not an application concern. The deterministic policy engine means violations are caught before execution, not detected after the fact. Microsoft has signalled its intention to move the project to a foundation governance model, engaging with the OWASP Agent Security Initiative, the LF AI and Data Foundation, and CoSAI working groups, a necessary step if governance tooling is to earn industry-wide trust.

The toolkit is available at [github.com/microsoft/agent-governance-toolkit](https://github.com/microsoft/agent-governance-toolkit).