Microsoft Agent Governance Toolkit: Runtime Security for AI Agents
On 2 April 2026, Microsoft published the Agent Governance Toolkit (AGT) on GitHub under the MIT licence. In a landscape where AI agents routinely hold API keys, shell access, and database credentials, AGT treats agent governance as a systems problem, not a prompt engineering one. The toolkit enforces security policies at runtime with the same rigour an operating system applies to process isolation: ring-based execution, kernel-level policy evaluation, and zero-trust identity between agents. Led by Imran Siddique, Principal Group Engineering Manager at Microsoft, the project landed at version 3.0.1 with 60 contributors and roughly 1,040 stars within its first two weeks.
## The problem with prompt-based safety
Prompt-based guardrails have a fundamental flaw: they ask a language model to police itself. Microsoft's own benchmarks quantify the gap. When safety rules are embedded in system prompts, the violation rate sits at 26.67%. The model can be socially engineered, distracted by complex instructions, or simply fail to follow rules under adversarial pressure. Every major prompt injection demonstration, from Simon Willison's early experiments to the "Comment and Control" attacks on Claude Code and GitHub Copilot, exploits this weakness.
AGT sidesteps the problem entirely. Rather than asking the model to behave, it enforces boundaries in the execution layer. Policies are evaluated by a dedicated policy kernel before any tool call reaches the target system. The result: a 0.00% violation rate across the same benchmark suite.
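To make the distinction concrete, here is a minimal sketch of enforcement in the execution layer rather than in the prompt. It is not AGT's API: `governed`, `PolicyViolation`, and `allow_unless_shell` are hypothetical names chosen for illustration, and the single rule is deliberately trivial.

```python
from functools import wraps
from typing import Callable


class PolicyViolation(Exception):
    """Raised when a tool call is blocked before execution (hypothetical)."""


def governed(action: str, is_allowed: Callable[[str, dict], bool]):
    """Wrap a tool so the policy check runs before the tool body.

    `is_allowed` is ordinary code, not a model, so nothing in the
    prompt or the conversation can change its answer.
    """
    def decorator(tool: Callable):
        @wraps(tool)
        def wrapper(*args, **kwargs):
            if not is_allowed(action, kwargs):
                raise PolicyViolation(f"blocked: {action}")
            return tool(*args, **kwargs)
        return wrapper
    return decorator


def allow_unless_shell(action: str, call_args: dict) -> bool:
    # Illustrative rule: nothing in the "bash:" namespace may run.
    return not action.startswith("bash:")


@governed("bash:exec", allow_unless_shell)
def run_shell(command: str) -> str:
    return f"ran {command}"  # never reached while the rule denies bash:*


@governed("fs:read", allow_unless_shell)
def read_file(path: str) -> str:
    return f"contents of {path}"
```

Calling `read_file(path="/tmp/a")` returns normally, while `run_shell(command="ls")` raises `PolicyViolation` before the function body executes, whatever the model was persuaded to request.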
## Architecture
The toolkit ships as seven composable packages, each addressing a distinct layer of the agent governance stack. The design mirrors operating system architecture: a policy kernel at the bottom, execution isolation in the middle, and higher-order services on top.
### Agent OS
The policy kernel at the bottom of the stack. Supports three policy formats: YAML for simple allow/deny rules, OPA/Rego for complex conditional logic, and Cedar (AWS's policy language) for attribute-based access control. Policies are loaded at startup and evaluated on every tool invocation, with zero dependence on the LLM for enforcement. The kernel is stateless and deterministic, making its behaviour fully predictable regardless of model behaviour.
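A stateless, deterministic evaluator can be sketched as a pure function over a fixed rule set. The semantics below (glob-style action patterns, deny overrides allow, deny by default) are assumptions made for illustration, not confirmed AGT behaviour, and `Rule` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    rule_id: str
    effect: str   # "allow" or "deny"
    action: str   # glob pattern, e.g. "sql:*"


def evaluate(rules: Tuple[Rule, ...], action: str) -> Tuple[bool, Optional[str]]:
    """Return (allowed, matched_rule_id). Pure function: no state, no model.

    Assumed semantics: any matching deny wins, then the first matching
    allow, and anything unmatched is denied by default.
    """
    matches = [r for r in rules if fnmatchcase(action, r.action)]
    for rule in matches:
        if rule.effect == "deny":
            return False, rule.rule_id
    for rule in matches:
        if rule.effect == "allow":
            return True, rule.rule_id
    return False, None


RULES = (
    Rule("allow-sql", "allow", "sql:*"),
    Rule("deny-sql-write", "deny", "sql:write"),
)
```

Because the function depends only on its arguments, the same action against the same rules always yields the same decision, which is what makes the behaviour auditable and easy to benchmark.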
### Agent Runtime
Implements a four-ring execution model borrowed from CPU privilege architecture. Agents start in Ring 3 with minimal access and earn higher privileges through demonstrated trustworthiness:
| Ring | Trust Threshold | Capabilities |
|---|---|---|
| Ring 0 (Kernel) | Score 900+ | Full system access, policy modification |
| Ring 1 (Supervisor) | Score 700+ | Cross-agent coordination, elevated tool access |
| Ring 2 (User) | Score 400+ | Standard tool calls, bounded data access |
| Ring 3 (Untrusted) | Below 400 | Sandbox only, output formatting |
Each ring enforces per-agent resource limits: maximum execution time, memory caps, CPU throttling, and request rate limits. An agent in Ring 1 cannot escalate to Ring 0 without an explicit policy grant, and the kill switch terminates a violating agent within 50 milliseconds.
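The thresholds in the table translate directly into a lookup. The sketch below uses the published score boundaries. Modelling the explicit Ring 0 grant as a boolean flag is an assumption, as are the names `Ring`, `ring_for_score`, and `effective_ring`.

```python
from enum import IntEnum


class Ring(IntEnum):
    KERNEL = 0
    SUPERVISOR = 1
    USER = 2
    UNTRUSTED = 3


# (minimum score, ring), highest threshold first; values from the table above.
THRESHOLDS = ((900, Ring.KERNEL), (700, Ring.SUPERVISOR), (400, Ring.USER))


def ring_for_score(score: int) -> Ring:
    """Map a trust score to the most privileged ring it qualifies for."""
    for minimum, ring in THRESHOLDS:
        if score >= minimum:
            return ring
    return Ring.UNTRUSTED


def effective_ring(score: int, ring0_grant: bool = False) -> Ring:
    """Apply the extra Ring 0 condition: score alone is not enough.

    Assumption: without an explicit policy grant, a qualifying agent
    is held at Ring 1.
    """
    ring = ring_for_score(score)
    if ring is Ring.KERNEL and not ring0_grant:
        return Ring.SUPERVISOR
    return ring
```

Under these assumptions an agent scoring 950 still runs in Ring 1 until a policy grants Ring 0, while an agent scoring 399 stays in the sandbox.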
### AgentMesh
Provides zero-trust identity between agents using decentralised identifiers (DIDs) signed with Ed25519, alongside ML-DSA-65, the NIST-standardised post-quantum signature scheme. Agents verify each other's identity and permissions through the Inter-Agent Trust Protocol (IATP) before any inter-agent communication. A dynamic trust score on a 0-to-1000 scale across five behavioural tiers replaces the binary trusted/untrusted model, so agents that deviate from expected patterns lose privileges automatically.
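The dynamic score can be pictured as a clamped counter that is consulted before each inter-agent message. This is a sketch only: `TrustLedger`, the starting score of 500, and the penalty sizes are illustrative assumptions, and the signature verification that IATP performs is omitted.

```python
from dataclasses import dataclass, field

MIN_SCORE, MAX_SCORE = 0, 1000  # scale stated in the text


@dataclass
class TrustLedger:
    """Tracks a 0-1000 trust score per agent DID (hypothetical helper)."""
    initial: int = 500  # assumed starting score for unknown agents
    scores: dict = field(default_factory=dict)

    def score(self, did: str) -> int:
        return self.scores.get(did, self.initial)

    def adjust(self, did: str, delta: int) -> int:
        """Apply a reward or penalty, clamped to the valid range."""
        updated = max(MIN_SCORE, min(MAX_SCORE, self.score(did) + delta))
        self.scores[did] = updated
        return updated

    def may_send(self, did: str, required: int) -> bool:
        """Trust gate: the sender must meet the receiver's minimum score."""
        return self.score(did) >= required
```

For example, an agent identified as `did:example:reviewer` that starts at 500 and takes a 150-point penalty drops to 350, after which `may_send("did:example:reviewer", required=400)` returns `False` without any operator intervention.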
### Agent SRE
Applies production reliability engineering patterns to agent workloads: circuit breakers, SLOs, error budgets, and Prometheus-compatible metrics. When an agent enters a failure loop, whether through repeated tool calls, escalating permission requests, or excessive output generation, the circuit breaker trips and the agent is isolated from the mesh.
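The circuit breaker pattern itself is simple to sketch. The version below opens after a set number of failures inside a sliding time window. The class name, the defaults, and the absence of a half-open recovery state are simplifications for illustration, not a description of Agent SRE's implementation.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` failures within `window` seconds."""

    def __init__(self, max_failures=3, window=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self.failures = []
        self.open = False

    def record_failure(self) -> None:
        now = self.clock()
        # Keep only failures that are still inside the window.
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if len(self.failures) >= self.max_failures:
            self.open = True  # the agent would be isolated at this point

    def allow(self) -> bool:
        """Return True while the agent may keep acting."""
        return not self.open
```

A governance layer would call `record_failure()` for each repeated tool call or rejected permission request and check `allow()` before dispatching the next action.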
### Agent Compliance, Marketplace, and Lightning
The three upper-layer packages extend the governance surface. Agent Compliance automates regulatory mapping to the EU AI Act, HIPAA, and SOC2, producing signed attestations. Agent Marketplace manages plugin lifecycles with Ed25519 signing and SLSA provenance verification. Agent Lightning handles reinforcement learning governance by wrapping training runners in policy-enforced boundaries, ensuring that RL-trained agents cannot exceed their capability envelope during training.
## OWASP Agentic Top 10 compliance
OWASP published the Top 10 for Agentic Applications in December 2025, the first formal taxonomy of risks specific to autonomous AI agents. AGT maps directly to all ten categories:
| # | OWASP Risk | AGT Control |
|---|---|---|
| ASI-01 | Agent Goal Hijacking | Semantic intent classifier in the policy engine |
| ASI-02 | Excessive Capabilities | Capability model enforces least-privilege per agent |
| ASI-03 | Identity and Privilege Abuse | Zero-trust DID identity with Ed25519 and ML-DSA-65 |
| ASI-04 | Supply Chain Vulnerabilities | Ed25519 plugin signing, AI-BOM v2.0, SLSA provenance |
| ASI-05 | Uncontrolled Code Execution | Execution rings with per-ring resource limits |
| ASI-06 | Memory and Context Poisoning | Cross-Model Verification Kernel with majority voting |
| ASI-07 | Unsafe Inter-Agent Communication | IATP encrypted channels with trust gates |
| ASI-08 | Cascading Failures | Circuit breakers, SLO enforcement, error budgets |
| ASI-09 | Human-Agent Trust Deficit | Full audit trails, flight recorder, approval workflows |
| ASI-10 | Rogue Agents | Kill switch, ring isolation, trust decay, anomaly detection |
An automated certification CLI produces a signed attestation on every deployment, mapping each control to its OWASP category and giving teams auditable evidence of compliance without manual documentation effort.
## Performance
The policy kernel is designed for the hot path. Benchmark results from the published evaluation suite:
- 0.011ms per single policy evaluation (p50)
- 0.103ms for a full kernel evaluation across all active policies
- 47,000 operations per second at 1,000 concurrent agents
- 10,000x faster than inserting a policy check as an LLM call
The sub-millisecond overhead means governance adds less latency than a typical network round-trip. One early adopter reported 473 unauthorised actions blocked over 11 days with a total enforcement overhead of 0.43 seconds. The three-gate defence model (GovernanceGate, TrustGate, ReliabilityGate) processes each action through pattern matching, trust scoring, and circuit breaker evaluation before any execution proceeds.
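The three-gate flow can be sketched as a short-circuiting pipeline. Only the gate names come from the toolkit; the `Action` fields, the function signatures, and the specific checks inside each gate are hypothetical stand-ins.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Action(NamedTuple):
    agent_id: str
    name: str
    trust_score: int
    breaker_open: bool


# A gate returns a denial reason, or None to let the action through.
Gate = Callable[[Action], Optional[str]]


def governance_gate(action: Action) -> Optional[str]:
    # Pattern-matching stand-in: block anything in the bash: namespace.
    return "pattern: bash:* denied" if action.name.startswith("bash:") else None


def trust_gate(action: Action) -> Optional[str]:
    # 400 is the Ring 2 threshold; using it as the cut-off is an assumption.
    return "trust: score below 400" if action.trust_score < 400 else None


def reliability_gate(action: Action) -> Optional[str]:
    return "reliability: circuit open" if action.breaker_open else None


GATES: Sequence[Gate] = (governance_gate, trust_gate, reliability_gate)


def authorise(action: Action) -> Optional[str]:
    """Run the gates in order; the first denial stops the action."""
    for gate in GATES:
        reason = gate(action)
        if reason is not None:
            return reason
    return None
```

Each gate is a cheap, local check, which is consistent with the sub-millisecond figures above: no step in the chain waits on a model call.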
## Getting started
```bash
pip install "agent-governance-toolkit[full]"
```

```python
from agt import PolicyEvaluator, PolicyDocument, PolicyRule, Effect

policy = PolicyDocument(rules=[
    # Deny every shell command to agents outside Rings 0 and 1.
    PolicyRule(
        id="no-shell-access",
        effect=Effect.DENY,
        action="bash:*",
        condition="agent.ring > 1"
    ),
    # Deny SQL writes against any resource named production_*.
    PolicyRule(
        id="read-only-database",
        effect=Effect.DENY,
        action="sql:write",
        resource="production_*"
    )
])

evaluator = PolicyEvaluator(policy)
result = evaluator.evaluate(
    agent_id="code-reviewer",
    action="bash:rm -rf /",
    resource="/tmp/cache",
    # Ring 2 (User) satisfies "agent.ring > 1", so the deny rule applies.
    context={"ring": 2}
)
# result.allowed == False
# result.matched_rule == "no-shell-access"
```
Five SDKs are available: Python (full stack), TypeScript, Go, .NET, and Rust (in development). All five implement core governance covering policy evaluation, identity verification, trust scoring, and audit logging. The toolkit integrates with over 20 agent frameworks, including LangChain, AutoGen, CrewAI, Semantic Kernel, and LlamaIndex, hooking into native extension points so governance sits transparently between the framework and the actions agents take.
## Competitive landscape
AGT enters a space that has been fragmented. NeMo Guardrails (NVIDIA) and Llama Guard (Meta) take a prompt-centric approach, relying on model-based classification to catch policy violations. Guardrails AI and Lakera focus on input/output filtering. Bedrock Guardrails provides AWS-native policy enforcement. None of these address the full agent governance surface: identity, inter-agent communication, execution isolation, supply chain security, and reliability engineering in a single stack.
What distinguishes AGT is the operating system analogy. Governance runs as a kernel concern, not an application concern. The deterministic policy engine means violations are caught before execution, not detected after the fact. Microsoft has signalled its intention to move the project to a foundation governance model, engaging with the OWASP Agent Security Initiative, the LF AI and Data Foundation, and CoSAI working groups, a necessary step if governance tooling is to earn industry-wide trust.
The toolkit is available at [github.com/microsoft/agent-governance-toolkit](https://github.com/microsoft/agent-governance-toolkit).