OpenSpec vs. the Specification Framework Landscape: Why DevOps Needs AI-Native Specs
Every DevOps team maintains specifications: API contracts, architecture decision records, design documents, runbooks, change proposals. The question is not whether you spec, but whether your specs survive contact with reality.
Most specification frameworks were designed for human-to-human communication. They assume a reader who can infer intent from prose, fill in gaps from context, and navigate cross-references intuitively. That assumption breaks down when the primary consumer of your specs is no longer human.
We are entering an era where AI agents write, review, and deploy code alongside humans. The specification frameworks we choose determine whether those agents work with us or against us.
This article compares six approaches to specification and change management — including OpenSpec, the AI-native framework used in this project — across practical DevOps dimensions: agent-friendliness, CI/CD integration, documentation drift, and operational overhead.
The Specification Spectrum
Before comparing individual frameworks, it helps to understand what a specification framework actually provides:
┌─────────────────────────────────────────────────────────────────┐
│ WHAT SPECS DO │
├─────────────────────────────────────────────────────────────────┤
│ │
│ INTENT CAPTURE ──► DESIGN DECISIONS ──► IMPLEMENTATION │
│ (what & why) (how) (ticked boxes) │
│ │
│ Documentation │
│ Drift grows here ←───────────────────────────────────────────► │
│ (spec frozen) (code evolves) │
│ │
└─────────────────────────────────────────────────────────────────┘
Every framework addresses a subset of this pipeline. The difference is where they place the automation boundary — what they encode in machine-readable formats versus what they leave to human interpretation.
The Contenders
| Framework | Primary Format | Consumer | Scope | AI-Native? |
|---|---|---|---|---|
| OpenAPI/Swagger | YAML/JSON | Tools + Humans | API contracts | Partial |
| AsyncAPI | YAML/JSON | Tools + Humans | Event contracts | Partial |
| ADRs | Markdown | Humans | Architecture decisions | No |
| RFC Process | Markdown | Humans (team) | Design proposals | No |
| BDD/Gherkin | Plain text (Gherkin) | Tests + Humans | Behavior specs | Yes (structured) |
| OpenSpec | Markdown + CLI | AI Agents + Humans | Full change lifecycle | Yes (native) |
| GitHub Issues/Projects | UI + Markdown | Humans | Task tracking | No |
| JIRA + Confluence | UI + Rich text | Humans (org) | Project management | No |
OpenAPI / Swagger: The Gold Standard That Solved One Problem
OpenAPI is the most successful specification framework in DevOps. It defined a machine-readable contract format for REST APIs that generates documentation, client SDKs, server stubs, and test harnesses from a single source of truth.
Where it shines:
- Contract-first development with code generation
- Extensive tooling ecosystem (Swagger UI, Editor, Codegen, Validator)
- Clear, versioned interface boundaries between services
- CI pipeline validation (request/response conformance)
Where it falls short for modern DevOps:
- Narrow scope — API surface only, not deployment, infrastructure, or operational specs
- No change lifecycle management — versioning the spec is manual, and there is no artifact trail from proposal to implementation
- Static contract format — OpenAPI 3.1 improved things with JSON Schema, but the spec describes an endpoint, not the engineering process behind it
- Agents cannot follow it — an AI agent can read an OpenAPI spec to understand an API surface, but it cannot use it to plan, propose, design, implement, and verify a change
# OpenAPI tells you WHAT the API looks like
# It does NOT tell you HOW to change it safely
paths:
  /deployments:
    get:
      summary: List deployments
      responses:
        '200':
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/Deployment'
OpenAPI is essential infrastructure. But it is a contract format, not a change management framework. It solves the interface problem and leaves the engineering process untouched.
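The request/response conformance check listed under OpenAPI's strengths can be illustrated without any OpenAPI tooling. This is a minimal sketch, not a real validator: the `Deployment` fields (`id`, `status`) are hypothetical because the spec above only references that schema, and a real pipeline would more likely rely on a dedicated OpenAPI validator.

```python
# Minimal sketch of a response-conformance check for GET /deployments.
# Assumption: a Deployment has a string "id" and a string "status";
# the article's spec references the schema but does not define it.

DEPLOYMENT_FIELDS = {"id": str, "status": str}  # hypothetical schema


def conformance_errors(body):
    """Return a list of human-readable violations; empty means conformant."""
    if not isinstance(body, list):
        return ["response body must be an array"]
    errors = []
    for index, item in enumerate(body):
        if not isinstance(item, dict):
            errors.append(f"item {index}: expected an object")
            continue
        for field, expected_type in DEPLOYMENT_FIELDS.items():
            if field not in item:
                errors.append(f"item {index}: missing field '{field}'")
            elif not isinstance(item[field], expected_type):
                errors.append(f"item {index}: field '{field}' has wrong type")
    return errors


good = [{"id": "d-1", "status": "running"}]
bad = [{"id": 7}]
print(conformance_errors(good))
print(conformance_errors(bad))
```

The check can say whether a response matches the contract. It cannot say why the endpoint exists or how to change it safely, which is the gap this section describes.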
AsyncAPI: OpenAPI for Events
AsyncAPI extends the contract-first model to event-driven architectures — Kafka topics, RabbitMQ queues, WebSocket channels, MQTT brokers.
What it does well:
- Machine-readable channel definitions with publish/subscribe semantics
- Message schema validation across event boundaries
- Code generation for producers and consumers
- Growing tooling ecosystem (generator, modelina, CLI)
Same fundamental limitation as OpenAPI: AsyncAPI describes the event surface — not the process that produced it. You know what Kafka topics exist, but you have no artifact trail of why they were added, what alternatives were considered, or whether the implementation matches the intended design.
Both OpenAPI and AsyncAPI suffer from the same DevOps gap: the spec lives separately from the change process that created it, and the two inevitably diverge.
ADRs: Lightweight, Human-First, Agent-Hostile
Architecture Decision Records (ADRs) are a beautifully simple pattern: a short Markdown file per architecture decision, stored in the repository alongside the code. Michael Nygard proposed the pattern in 2011.
adr/
├── 001-use-postgresql-for-primary-storage.md
├── 002-adopt-kubernetes-for-orchestration.md
├── 003-use-redis-for-session-caching.md
└── 004-migrate-to-opensearch-for-logging.md
Each ADR follows a template: Context → Decision → Consequences. The format is intentionally minimal — prose-based, designed for humans to write and read.
Where ADRs excel:
- Low friction to create (one file, no tooling required)
- Lives in the repo alongside code (no external system drift)
- Perfect for recording why decisions were made
- Immutable, numbered record of architectural evolution
Where they break down:
- No structure an agent can follow — ADRs are free-form prose. An AI agent would need to parse natural language to extract decision criteria, alternatives, and scope. This is possible (LLMs are good at this) but not reliable for automated enforcement.
- No connection to implementation — an ADR says "we decided to use PostgreSQL." It does not track whether the implementation matches that decision, what tasks remain, or whether any code contradicts it.
- No enforced lifecycle: ADRs are append-only. The common template carries a status field (proposed, accepted, superseded), but updating it is a manual convention, and there is no workflow that drives revisiting, superseding, or archiving decisions.
- No task generation — "We decided X" does not produce a checklist of implementation steps. A human (or agent) must interpret the decision and figure out what to build.
ADRs are valuable documentation. They are not a framework for driving change.
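To make the "agent-hostile" point concrete, here is a rough sketch of what a tool can and cannot recover from an ADR. The ADR text and its `## ` heading style are assumptions for illustration; teams format ADRs in many different ways.

```python
import re

# Hypothetical ADR following the Context / Decision / Consequences template.
# The "## " heading style is an assumption, not part of any ADR standard.
ADR_TEXT = """\
# 001: Use PostgreSQL for primary storage

## Context
We need a relational store with strong consistency.

## Decision
We will use PostgreSQL.

## Consequences
The team must operate and back up a PostgreSQL cluster.
"""


def split_sections(markdown):
    """Map each '## ' heading to the prose that follows it."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^##\s+(.+)$", line)
        if match:
            current = match.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


sections = split_sections(ADR_TEXT)
print(list(sections))
print(sections["Decision"])
```

The section names come out mechanically, but the decision itself is still a sentence of prose. Turning it into tasks or checks requires interpretation, which is where reliability drops.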
BDD / Gherkin: Executable Specs With a Ceiling
Behavior-Driven Development with Gherkin (Given/When/Then) is the closest predecessor to what OpenSpec attempts. It defines executable specifications that double as documentation and acceptance tests.
Feature: Deployment Rollback
Scenario: Rollback on health check failure
Given a deployment with 3 replicas
When the health check fails for 2 replicas
Then the orchestrator initiates a rollback
And the previous revision is restored within 60 seconds
BDD's genuine strengths:
- Executable specs — the spec IS the test. No documentation drift because the spec runs in CI.
- Shared language — Gherkin's structured natural language bridges business stakeholders and engineers.
- Living documentation — passing specs = accurate documentation.
- Agent-friendly format — the structured Given/When/Then format is parsable by AI agents without ambiguity.
Why BDD isn't enough for DevOps change management:
- Feature-level scope only — a Gherkin scenario describes one behavior. It cannot represent a multi-step engineering change (provision infrastructure → deploy service → configure monitoring → verify).
- No design artifact — BDD captures acceptance criteria but not the design decisions, trade-offs, or alternatives considered.
- No change lifecycle — BDD specs are written and automated, but there is no concept of proposal, approval, implementation tracking, or archive.
- High maintenance overhead — Gherkin scenarios are notoriously brittle. A UI change can break dozens of scenarios written in business language, requiring expensive rewrites.
BDD occupies a useful niche: spec-as-test for specific behaviors. It does not replace a change management framework.
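The claim that Given/When/Then is easy for a machine to parse can be shown in a few lines. This is a simplified sketch that handles only the step keywords used in the rollback scenario above; it is not the full Gherkin grammar (no backgrounds, tables, or doc strings).

```python
# The rollback scenario from this article, as plain text.
SCENARIO = """\
Feature: Deployment Rollback
  Scenario: Rollback on health check failure
    Given a deployment with 3 replicas
    When the health check fails for 2 replicas
    Then the orchestrator initiates a rollback
    And the previous revision is restored within 60 seconds
"""

STEP_KEYWORDS = ("Given", "When", "Then")


def parse_steps(text):
    """Return (keyword, step text) pairs; 'And'/'But' inherit the previous keyword."""
    steps = []
    previous = None
    for raw in text.splitlines():
        line = raw.strip()
        keyword, _, rest = line.partition(" ")
        if keyword in STEP_KEYWORDS:
            previous = keyword
        elif keyword in ("And", "But") and previous is not None:
            keyword = previous
        else:
            continue  # Feature:, Scenario:, and blank lines are not steps
        steps.append((keyword, rest))
    return steps


steps = parse_steps(SCENARIO)
for keyword, text in steps:
    print(keyword, "|", text)
```

Each step maps to a precondition, an action, or an expected outcome. What the parse does not contain is any design rationale or task ordering, which matches the limits listed above.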
The RFC Process: Collaborative Design, Manual Everything
The RFC (Request for Comments) process, popularized by the IETF and adopted in various forms by React, Rust, Python (as PEPs), and Kubernetes (as KEPs), is the gold standard for collaborative design. A proposed change is documented in a structured template, discussed by the community, refined through review cycles, and either accepted or rejected.
rfcs/
├── text/
│ ├── 0000-template.md
│ ├── 0001-new-rfc-process.md
│ ├── 0002-adopt-openspec.md
│ └── 0003-spec-first-change-management.md
What makes RFCs powerful:
- Structured, repeatable proposal format
- Community review baked into the process
- Decision record (accepted/rejected with rationale)
- Historical archive of design evolution
The DevOps gap with RFCs:
- No automation boundary — an RFC is a Markdown document with prose sections. There is no machine-enforceable contract between the proposal and the implementation.
- No task decomposition — acceptance means "we agreed this is the right approach." Someone still needs to break it into implementation tasks.
- No CI integration — CI cannot verify that implementation matches the RFC's design decisions.
- Agent-hostile structure — like ADRs, RFCs rely on human readers to interpret intent. An AI agent parsing a 50-section RFC would struggle to extract deterministic action items.
The RFC process produces excellent design artifacts. But the bridge from "RFC accepted" to "code deployed" is entirely manual.
GitHub Issues + PRs: The Unstructured Default
Most DevOps teams default to GitHub Issues for change tracking and Pull Requests for code review. This is the path of least resistance — it ships quickly and requires no additional tooling.
What it gets right:
- Zero setup cost
- Everyone knows how to use it
- PR reviews provide human quality gates
- CI integration is built in
What it gets wrong for systematic change management:
- Issue = unstructured blob — an issue can be a bug report, a feature request, a question, a design discussion, or a task. There is no structural distinction.
- No design artifact — the PR is the implementation. Design decisions are buried in comment threads or left implicit.
- No spec → implementation traceability — did the PR implement what was intended? The only way to know is to read the diff and compare it to your memory of the discussion.
- Agent-hostile — an AI agent can read issues and PRs, but it cannot follow a structured workflow from proposal to implementation to verification because the structure simply does not exist.
- Documentation drift is the default — once merged, the issue is closed and the discussion is archived. The implementation evolves, but the issue never updates.
GitHub Issues + PRs is a communication platform that teams repurpose into a workflow. It works for small teams with good discipline. It scales poorly.
OpenSpec: AI-Native Change Management
OpenSpec enters this landscape as a framework designed explicitly for the AI agent era. It combines the structured artifact approach of RFCs, the traceability of ADRs, the executability of BDD, and adds something none of these have: a machine-enforceable change lifecycle that AI agents can follow autonomously.
The OpenSpec Change Lifecycle
Every change in OpenSpec follows a defined artifact dependency chain:
┌─────────────────────────────────────────────────────────────────┐
│ OPENSPEC CHANGE LIFECYCLE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. PROPOSAL │
│ ┌──────────────────┐ │
│ │ What & Why │ Problem statement, scope, success │
│ │ proposal.md │ criteria, risks │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ 2. DESIGN │
│ ┌──────────────────┐ │
│ │ How │ Architecture, trade-offs, component │
│ │ design.md │ interactions, data flow │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ 3. SPECS (capability specs) │
│ ┌──────────────────┐ │
│ │ Requirements │ Machine-readable requirements with │
│ │ specs/*/spec.md │ scenarios (Given/When/Then-like) │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ 4. TASKS │
│ ┌──────────────────┐ │
│ │ Implementation │ Atomic, ordered implementation steps │
│ │ tasks.md │ with checkboxes │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ 5. IMPLEMENTATION (via /opsx-apply) │
│ ┌──────────────────┐ │
│ │ Code + Tests │ AI agent executes tasks, marks │
│ │ │ checkboxes as completed │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ 6. DELTA SPECS │
│ ┌──────────────────┐ │
│ │ What changed │ Diff: added/modified/removed specs │
│ │ specs/*/spec.md │ for syncing back to main specs │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ 7. ARCHIVE │
│ ┌──────────────────┐ │
│ │ Done │ Change moved to archive with date │
│ │ archive/ │ Main specs updated with delta │
│ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
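Step 4 and step 5 of the lifecycle hinge on a task list with checkboxes that an agent ticks off. The sketch below assumes GitHub-style `- [ ]` / `- [x]` checkboxes and a made-up `tasks.md`; the exact task format OpenSpec uses is not specified in this article.

```python
import re

# Hypothetical tasks.md content; the "- [ ]" / "- [x]" syntax is an assumption.
TASKS_MD = """\
## 1. Metrics endpoint
- [x] 1.1 Add /metrics route
- [ ] 1.2 Register custom counters
## 2. Monitoring
- [ ] 2.1 Add Prometheus scrape config
"""

CHECKBOX = re.compile(r"^(\s*- \[)([ xX])(\]\s+)(.*)$")


def next_open_task(markdown):
    """Return the text of the first unchecked task, or None if all are done."""
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(2) == " ":
            return match.group(4)
    return None


def mark_done(markdown, task_text):
    """Tick the checkbox of the task whose text matches exactly."""
    lines = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(4) == task_text:
            line = f"{match.group(1)}x{match.group(3)}{match.group(4)}"
        lines.append(line)
    return "\n".join(lines) + "\n"


task = next_open_task(TASKS_MD)
updated = mark_done(TASKS_MD, task)
print(task)
print(next_open_task(updated))
```

Because the task file is both the plan and the progress record, an agent that stops midway leaves behind an accurate picture of what remains.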
What Makes OpenSpec Different
1. AI agent as first-class consumer
Every artifact in an OpenSpec change has a defined schema, known output paths, and explicit dependencies. The CLI exposes machine-readable instructions (openspec instructions <artifact-id> --json) that tell an AI agent exactly what to create, what template to follow, and what context to read.
This is fundamentally different from a Markdown RFC or an ADR. The agent does not need to infer what a "good" proposal looks like — the schema defines it. The agent does not need to figure out which files to create — the CLI resolves the paths. The agent does not need to guess what artifacts are needed next — the dependency graph says so.
# Agent asks: "What do I need to create next?"
openspec instructions design --change "add-metrics-pipeline" --json
# Response: structured, actionable, unambiguous
{
"artifact": "design",
"template": "...",
"dependencies": ["proposal"],
"resolvedOutputPath": ".openspec/changes/add-metrics-pipeline/design.md",
"context": "..."
}
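A consumer of that response might look roughly like the sketch below. It does not invoke the CLI; it only parses a payload shaped like the example above. The field names are taken from that example, the elided `"..."` values are left as-is, and the `completed` set and the returned plan structure are hypothetical.

```python
import json

# Sample payload copied from the example response in this article.
# The "..." values are elided in the article and are kept elided here.
RESPONSE = """
{
  "artifact": "design",
  "template": "...",
  "dependencies": ["proposal"],
  "resolvedOutputPath": ".openspec/changes/add-metrics-pipeline/design.md",
  "context": "..."
}
"""


def plan_from_instructions(raw, completed):
    """Decide whether the artifact can be written yet.

    `completed` is the set of artifact ids that already exist (hypothetical input).
    """
    payload = json.loads(raw)
    missing = [dep for dep in payload["dependencies"] if dep not in completed]
    return {
        "artifact": payload["artifact"],
        "write_to": payload["resolvedOutputPath"],
        "blocked_by": missing,
        "ready": not missing,
    }


print(plan_from_instructions(RESPONSE, {"proposal"}))
print(plan_from_instructions(RESPONSE, set()))
```

The agent never has to guess a file path or an ordering: both come from the payload, and a missing dependency is reported as data rather than discovered by trial and error.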
2. Full change lifecycle, not a fragment
OpenAPI gives you the API contract. ADRs give you the decision log. BDD gives you the acceptance tests. OpenSpec gives you the entire chain from proposal to archive, with each artifact's outputs feeding into the next.
This means an AI agent can:
- Read the proposal → understand scope
- Read the design → understand architecture
- Read the specs → understand requirements
- Read the tasks → know what to implement
- Mark tasks complete → update the artifact
- Generate delta specs → document what changed
- Archive → clean up
None of the other frameworks compared here provides this full lifecycle in a machine-enforceable format.
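The dependency chain behind that list can be modeled as a small graph. This is a sketch under the assumption that the chain is strictly linear, as the lifecycle diagram suggests; a real schema may define different or additional edges.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Artifact -> artifacts it depends on, following the lifecycle in this article.
# The strictly linear chain is an assumption made for illustration.
DEPENDENCIES = {
    "proposal": set(),
    "design": {"proposal"},
    "specs": {"design"},
    "tasks": {"specs"},
}


def next_artifacts(done):
    """Return artifacts that are not done yet but whose dependencies all are."""
    return [
        name
        for name in TopologicalSorter(DEPENDENCIES).static_order()
        if name not in done and DEPENDENCIES[name] <= done
    ]


print(next_artifacts(set()))
print(next_artifacts({"proposal"}))
print(next_artifacts({"proposal", "design", "specs", "tasks"}))
```

With the graph explicit, "what comes next" is a lookup rather than a judgment call, which is the property the list above relies on.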
3. Delta specs as a primitive
When implementation reveals that the spec was wrong or incomplete, OpenSpec captures the delta — what actually changed versus what was planned. These delta specs can be synced back to the main specification, keeping the spec alive rather than letting it fossilize.
This addresses the documentation drift problem that affects most of the other frameworks compared here. The spec does not sit on a shelf; it evolves with the implementation, and the deltas provide an audit trail of every divergence.
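The delta idea can be sketched as a comparison between planned and implemented requirements. The requirement names and texts below are invented for illustration, and representing a spec as a name-to-text mapping is an assumption rather than OpenSpec's actual format.

```python
def spec_delta(planned, actual):
    """Compare two {requirement name: text} mappings."""
    return {
        "added": sorted(actual.keys() - planned.keys()),
        "removed": sorted(planned.keys() - actual.keys()),
        "modified": sorted(
            name
            for name in planned.keys() & actual.keys()
            if planned[name] != actual[name]
        ),
    }


# Hypothetical requirements for the metrics pipeline change.
planned = {
    "expose-metrics": "Service exposes /metrics on port 9090",
    "scrape-interval": "Prometheus scrapes every 15s",
}
actual = {
    "expose-metrics": "Service exposes /metrics on port 8080",
    "scrape-interval": "Prometheus scrapes every 15s",
    "metrics-auth": "The /metrics endpoint requires a bearer token",
}

print(spec_delta(planned, actual))
```

Here the port change and the unplanned auth requirement surface as explicit entries, so syncing them back to the main spec is a reviewable step instead of something that silently never happens.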
4. Archive is part of the workflow, not an afterthought
Every completed change is moved to an archive directory with a date prefix. The change becomes a historical record, not a forgotten directory. This matters for compliance, post-mortems, and training AI agents on past patterns.
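The archive step amounts to a dated move. The sketch below assumes a `changes/archive/<YYYY-MM-DD>-<name>` layout; the article only states that archived changes get a date prefix, so the directory names and the ISO date format are illustrative choices.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def archive_change(changes_dir, name, today):
    """Move <changes_dir>/<name> to <changes_dir>/archive/<YYYY-MM-DD>-<name>."""
    source = Path(changes_dir) / name
    target_dir = Path(changes_dir) / "archive"
    target_dir.mkdir(exist_ok=True)
    target = target_dir / f"{today.isoformat()}-{name}"
    shutil.move(str(source), str(target))
    return target


# Demonstrate against a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as root:
    change = Path(root) / "add-metrics-pipeline"
    change.mkdir()
    (change / "proposal.md").write_text("What & Why\n")
    archived = archive_change(root, "add-metrics-pipeline", date(2025, 1, 31))
    kept = (archived / "proposal.md").exists()
    gone = not change.exists()

print(archived.name)
print(kept, gone)
```

A sortable date prefix keeps the archive browsable in chronological order, which helps when the records are later used for post-mortems or audits.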
The Trade-offs
OpenSpec is not a replacement for every specification tool. It has real trade-offs:
| Dimension | OpenSpec | Traditional Approaches |
|---|---|---|
| Setup overhead | Requires CLI, schema initialization | ADR: one file. Issues: zero setup |
| Learning curve | Artifact lifecycle must be learned | Everyone knows how to write Markdown |
| Tooling maturity | Emerging ecosystem | OpenAPI: decade+ of tooling |
| Human readability | Structured artifacts, less narrative | RFCs: natural prose, easy to read |
| Scope | Engineering change management | OpenAPI: API contracts only |
| Team size suitability | Best with AI agents or structured teams | ADRs: works for 2-person teams too |
OpenSpec adds structure overhead. For a solo developer fixing a typo, a GitHub issue is more appropriate. For a multi-step infrastructure change involving provisioning, deployment, configuration, and verification — especially when AI agents are executing the work — the structure is not overhead, it's leverage.
Head-to-Head: A Deploy Scenario
To make the comparison concrete, consider a realistic DevOps scenario: adding a Prometheus metrics pipeline with custom application metrics to a production service.
| Phase | OpenAPI | ADR | BDD | RFC | Issues/PRs | OpenSpec |
|---|---|---|---|---|---|---|
| Proposal | N/A | N/A | N/A | RFC #0032 | Issue "add metrics" | proposal.md |
| Design | N/A | ADR-005 "use Prometheus" | N/A | Included in RFC | PR description | design.md |
| Specs | /metrics endpoint def | N/A | Given/When/Then scenarios | N/A | N/A | specs/metrics/spec.md |
| Tasks | N/A | N/A | N/A | N/A | Issue checklist | tasks.md |
| Implement | Manual | Manual | Manual | Manual | PR | /opsx-apply |
| Verify | Schema valid? | N/A | Cucumber pass? | N/A | CI checks | Agent checks + CI |
| Trace | N/A (no link) | N/A (no link) | N/A (separate) | N/A (separate) | Issue ↔ PR link | Full artifact chain |
| Archive | N/A | N/A | N/A | Closed RFC | Closed issue | Timestamped archive |
In the OpenSpec workflow, a single agent can traverse the entire lifecycle. In every other approach, there are manual handoffs, information loss between phases, and no automated verification that the implementation matches the intent.
When to Use What
The right tool depends on who consumes the specification and what you need it to enforce.
CONSUMER
│
▼ ┌─────────────────────────────────┐
AI Agent │ OpenSpec │
(autonomous) │ Full lifecycle, agent-native │
└─────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────┐
│ BDD / Gherkin │
│ Executable specs for behavior │
└─────────────────────────────────────────────────────────┘
│
┌────────────────────┐ ┌───────────────────┐ ┌──────────────┐
│ OpenAPI / AsyncAPI │ │ RFC Process │ │ ADRs │
│ Contract formats │ │ Collaborative │ │ Decision │
│ with codegen │ │ design reviews │ │ log │
└────────────────────┘ └───────────────────┘ └──────────────┘
│
┌─────────────────────────────────────────────────────────┐
│ GitHub Issues / PRs / JIRA │
│ General-purpose tracking, unstructured │
└─────────────────────────────────────────────────────────┘
│
Human ▲
(ad hoc) │
ENFORCEMENT (machine-readable → verifiable)
Use OpenSpec when:
- AI agents are executing implementation work alongside humans
- Changes span multiple steps across infrastructure and application code
- You need traceability from proposal through archive
- Documentation drift is costing you debugging time
- Compliance requires an audit trail of what was changed and why
Use OpenAPI/AsyncAPI when:
- You need API contract validation and code generation
- Your primary concern is interface compatibility between services
- You have a service mesh or API gateway that consumes the spec directly
Use ADRs when:
- You want a lightweight architecture decision log
- No AI agents are involved
- You trust humans to keep the decision log accurate
Use BDD/Gherkin when:
- You need executable specifications that run in CI
- Business stakeholders need to read acceptance criteria
- The spec boundary is a single feature or behavior
Use RFCs when:
- You need broad community input on a design
- The change has long-term architectural impact
- Discussion quality matters more than automation
Use Issues/PRs when:
- The change is trivial (typo, single-file fix)
- You have no AI agents and a small team
- Structure overhead would slow you down more than it helps
The DevOps Verdict
The specification frameworks most DevOps teams use today were designed for a world where humans write code, humans review code, and humans deploy code. That world is ending.
When AI agents participate in the engineering lifecycle — proposing changes, writing implementation code, verifying requirements, generating tests — the specification framework becomes the control plane for agent behavior. A free-form RFC or an unstructured GitHub issue gives an agent ambiguous instructions. A structured OpenSpec artifact with a defined schema, resolved output paths, and explicit dependencies gives an agent deterministic guidance.
This does not mean OpenSpec replaces every other tool. This project uses OpenAPI for its Stripe integration contract, ADRs for architecture decisions, and OpenSpec for managing changes. They serve different layers of the specification stack.
But for the change management layer, the part of the workflow that connects "we should do this" to "it is deployed and verified", the existing tools leave a gap that AI agents cannot cross without human hand-holding. OpenSpec fills that gap by making the entire lifecycle machine-enforceable.
The frameworks that win in the DevOps era will not be the ones with the most features or the prettiest documentation generators. They will be the ones that AI agents can follow without human interpretation.
This article is part of the DevOps Infrastructure series on tobias-weiss.org. The OpenSpec framework is developed as part of the OpenCode project and is used to manage all changes on this site.