Everybody agrees: AI makes code cheap. The question is: how do you get the AI to produce working and tested software?
Over the past few months I have worked intensively with coding agents. My favorite agent so far is Claude Code, and since the release of Opus 4.6 it has become my go-to tool for writing software.
Here’s what I’ve learned:
- The bottleneck is no longer writing code.
- It’s telling the agent what to build, precisely enough that the result actually works.
Today, most people deal with this in one of two ways:
- Vibe coding — just prompt and iterate. It’s fast for prototypes, but it doesn’t scale.
- Copy-paste prompting — collect snippets and prompt templates, paste them in before each task. Better than vibe coding, but it doesn’t build a lasting knowledge base.
What both approaches are missing is a system.
## What’s missing is a system
I’ve written about this on the blog: spec-driven development as a methodology, why vibe coding doesn’t scale, and how to write effective specs. For me, spec-driven development is the missing system for coding agents.
There are tools out there, such as OpenSpec, BMAD, and SpecKit, but none of them ticked all the boxes for me:
- Integration tests as proof that the software actually works
- No lock-in to a single language like Python
- Smart context handling that loads only the relevant specs
- A system that asks me instead of making assumptions
- A system that also works in brownfield projects
## Introducing speq-skill
So I built speq-skill. It’s a free Claude Code plugin with a lightweight CLI and skills.
The core idea: your specs live permanently in your project. The CLI adds semantic search so the coding agent finds only the relevant specs, keeping the context window uncluttered. The skills add guardrails for code quality and enforce integration tests and TDD.
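To make the search idea concrete, here is a toy sketch of ranking spec files against a query. The bag-of-words vectors below merely stand in for what a real embedding model (such as the Arctic Embed index speq uses) would compute; everything here, including the example spec texts, is illustrative:

```python
import math
from collections import Counter

def vectorize(text):
    # Toy stand-in for an embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, specs, top_k=2):
    # Rank spec files by similarity to the query and return only the
    # top hits, so the agent loads a handful of specs, not the library.
    qv = vectorize(query)
    ranked = sorted(specs, key=lambda p: cosine(qv, vectorize(specs[p])), reverse=True)
    return ranked[:top_k]

specs = {
    "auth/login/spec.md": "user login with password and session token",
    "auth/signup/spec.md": "new user registration and email verification",
    "billing/invoices/spec.md": "monthly invoice generation and payment",
}
print(search("expired login token", specs))  # the login spec ranks first
```

The point is the shape of the operation: the agent asks a question in natural language and gets back a short, ranked list of file paths instead of the whole spec tree.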
## The workflow
speq-skill follows a four-phase cycle: mission, plan, implement, record.
### Mission
Every project starts with a mission file. The agent interviews you about the project’s purpose, target users, tech stack, architecture, and constraints, then generates specs/mission.md. This is a one-time setup that gives every future session the context it needs.
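Based on the interview topics above, a generated mission file might look roughly like this (the layout and content are invented for illustration, not the plugin's fixed format):

```markdown
# Mission

## Purpose
Invoice management for freelancers who want automated payment reminders.

## Target users
Solo freelancers and small agencies in the EU.

## Tech stack
Go backend, PostgreSQL, HTMX frontend.

## Architecture
Monolith with a thin REST layer; background jobs via a worker queue.

## Constraints
- GDPR compliance for all stored customer data
- Invoices MUST be immutable once issued
```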
### Plan
When you’re ready to build a feature, run /speq:plan. The agent semantically searches the existing spec library for related features, asks you clarifying questions, and produces an implementation plan plus spec deltas for your feature. The spec deltas are markdown files with requirements written as BDD-style scenarios using RFC 2119 keywords. Each plan lives in a staging area, specs/_plans/, until implementation is complete.
This is the phase where intent gets clarified. The agent is instructed to conduct a clarifying interview with you so that gaps or vague elements of your prompt are discussed.
### Implement
Run /speq:implement <plan-name> to have the coding agent orchestrate the implementation of a plan. It spawns sub-agents that not only generate the code but also verify that the planned features work. Verification relies on enforced integration tests and the rule that the agent must obtain factual evidence of working software instead of claiming success. Throughout the implementation, the agent follows quality guardrails for code style and testing.
### Record
After implementation, /speq:record merges the spec delta into the permanent library in specs/. Your specs accumulate over time, forming a growing knowledge base of what the software does and why.
## The spec library
Specs live in your project as plain markdown files, organized by domain and feature:
```
specs/
  mission.md
  _plans/       # staging area for in-progress work
  _recorded/    # completed plans, kept for reference
  auth/
    login/
      spec.md
    signup/
      spec.md
  billing/
    invoices/
      spec.md
```
Two directories have special roles. _plans/ is the staging area where plans with spec deltas live while a feature is being planned and implemented. _recorded/ is where completed plans are moved after the agent merges their deltas into the permanent library. You can review recorded plans to trace how your project evolved.
Each spec uses BDD-style Given/When/Then scenarios with RFC 2119 keywords (SHALL, SHOULD, MAY) to express requirements at the right level of precision. For example:
```markdown
### Scenario: expired token
- Given a user with an expired authentication token
- When the user requests a protected resource
- Then the system SHALL return a 401 status code
- And the system SHALL include a `WWW-Authenticate` header
```
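A scenario in this form maps almost one-to-one onto an integration test. Here is a minimal Python sketch; the handler, its token format, and the test name are invented stand-ins for the real application under test:

```python
def get_protected_resource(token):
    # Hypothetical handler standing in for the real app under test.
    # Per the spec scenario, expired tokens are rejected with 401.
    if token.get("expired"):
        return {"status": 401, "headers": {"WWW-Authenticate": "Bearer"}}
    return {"status": 200, "headers": {}}

def test_expired_token_returns_401():
    # Given a user with an expired authentication token
    token = {"expired": True}
    # When the user requests a protected resource
    response = get_protected_resource(token)
    # Then the system SHALL return a 401 status code
    assert response["status"] == 401
    # And the system SHALL include a WWW-Authenticate header
    assert "WWW-Authenticate" in response["headers"]

test_expired_token_returns_401()
```

Because every SHALL line becomes an assertion, a passing test is factual evidence that the requirement holds, which is exactly what the implement phase demands.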
See writing specs for AI coding agents for the full guide on this format.
The library grows over time as plans get recorded. After a few features, you have a searchable knowledge base of what the software does and why.
## The CLI
The speq CLI is the backbone that the agent calls during planning and implementation. It provides two core capabilities:
- Semantic search (`speq search`): indexes the spec library with Snowflake Arctic Embed, a compact embedding model (~23 MB). The agent queries for relevant specs instead of loading everything, keeping the context window focused on what matters for the current task.
- Structure validation (`speq feature validate`, `speq plan validate`): enforces that specs follow the required BDD/RFC 2119 format. The agent runs validation after writing specs to catch structural issues early.
The CLI also offers `speq domain list` and `speq feature list` for navigating the spec library.
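During a planning session, the agent might drive the CLI roughly like this. This is a sketch of the call pattern, not a verbatim transcript; the query string is made up, and any arguments beyond the subcommand names listed above are assumptions about the interface:

```
# Find specs relevant to the feature being planned
speq search "password reset flow"

# Check spec structure after writing new specs
speq feature validate
speq plan validate

# Browse the library
speq domain list
speq feature list
```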
## Bundled skills
speq-skill bundles three core skills alongside the workflow:
- Code navigation via Serena, for semantic code exploration
- External documentation via Context7, for pulling in up-to-date library docs
- Code quality guardrails to enforce clean code and Test-Driven Development
Serena and Context7 are two of the most popular MCP servers in the Claude Code ecosystem. The skills teach the agent when and how to use them effectively, so you get the benefits of both tools without having to prompt for them manually.
## Get started
speq-skill is open source and free to use. Installation instructions and full documentation are on GitHub.
If you’re new to spec-driven development, the three-part blog series provides the foundation: