The Bar
A specification that requires clarifying questions has failed.
This single test determines whether a spec is ready. The bar is high because the payoff is high: a spec that passes this test can be implemented without back-and-forth, without “what did you mean by…?”, without rework.
The Structure
Every mill spec has four sections that form a provable chain:
Requirements (R)
What the solution must achieve. Each requirement gets an ID, a description, and a priority:
| Priority | Meaning |
|---|---|
| core | Must be implemented. No ship without it. |
| must-have | Required but could be deferred to a follow-up. |
| nice-to-have | Valuable but not blocking. |
| out | Explicitly excluded. Saying “no” is a decision too. |
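The shipping rule in the priority table can be read as a predicate. A minimal sketch, not part of mill itself (the `canShip` helper and the requirement shape are illustrative):

```typescript
type Priority = "core" | "must-have" | "nice-to-have" | "out";

interface TrackedRequirement {
  id: string;
  priority: Priority;
  implemented: boolean;
}

// "No ship without it" applies to core requirements only;
// must-haves may be deferred to a follow-up, and out items
// record decisions, not work.
function canShip(requirements: TrackedRequirement[]): boolean {
  return requirements
    .filter((r) => r.priority === "core")
    .every((r) => r.implemented);
}
```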
Approach (A)
How you’ll build it. Broken into parts, each describing a specific mechanism — a model change, an API endpoint, a UI component, a migration script.
Approach parts can be flagged with a warning when uncertainty exists. These flags must be resolved before the spec is ready.
Criteria (C)
Testable conditions that prove each requirement is met. Good criteria are:
- Binary — they pass or fail, no “kind of”
- Automated — they can be verified by a test command
- Independent — each criterion tests one thing
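In practice, a good criterion bottoms out in a boolean a test runner can evaluate. A sketch of what that looks like for a PDF-export check like C1 in the example that follows (the response shape is illustrative):

```typescript
interface CheckedResponse {
  status: number;
  contentType: string;
  body: Uint8Array;
}

// Binary: the response either is a correctly delivered PDF or it
// is not, no partial credit. Automated: this runs inside a test
// command. Independent: it checks delivery, nothing else.
function criterionPasses(res: CheckedResponse): boolean {
  const magic = new TextDecoder().decode(res.body.slice(0, 5));
  return (
    res.status === 200 &&
    res.contentType === "application/pdf" &&
    magic === "%PDF-" // every PDF file begins with this marker
  );
}
```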
Coverage (R x A x C)
The proof matrix. Every core and must-have requirement must have at least one approach part implementing it and at least one criterion verifying it.
A Complete Example
Here’s a small spec for adding PDF export to a reporting system:
Requirements:
| ID | Description | Priority |
|---|---|---|
| R1 | Users can export any saved report as a PDF file | core |
| R2 | The PDF preserves the report’s table formatting and charts | core |
| R3 | Export works for reports up to 500 rows without timeout | must-have |
Approach:
| ID | Mechanism |
|---|---|
| A1 | Add GET /api/reports/:id/pdf endpoint that accepts format=pdf query param |
| A2 | Use Puppeteer to render the report HTML template server-side and print to PDF |
| A3 | Stream the PDF response with Content-Disposition: attachment header |
| A4 | Add a 30-second timeout; return 504 if rendering exceeds it |
Criteria:
| ID | Condition |
|---|---|
| C1 | GET /api/reports/1/pdf returns a valid PDF with status 200 and Content-Type: application/pdf |
| C2 | The PDF contains all table rows and chart images from the source report |
| C3 | A 500-row report completes export in under 25 seconds |
| C4 | A 1000-row report returns 504 after 30 seconds |
Coverage:
| | A1 | A2 | A3 | A4 |
|---|---|---|---|---|
| R1 | x | x | x | |
| R2 | | x | | |
| R3 | | | | x |

| | C1 | C2 | C3 | C4 |
|---|---|---|---|---|
| R1 | x | | | |
| R2 | | x | | |
| R3 | | | x | x |
Every requirement has approach parts implementing it. Every requirement has criteria verifying it. No gaps.
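The matrix rule is mechanical enough to express as code. A sketch of the check, not mill's actual validator (the `Spec` shape is illustrative):

```typescript
type Priority = "core" | "must-have" | "nice-to-have" | "out";

interface Spec {
  requirements: Record<string, Priority>;
  approach: Record<string, string[]>; // A-id -> requirement IDs it implements
  criteria: Record<string, string[]>; // C-id -> requirement IDs it verifies
}

// Returns the core/must-have requirements missing an approach part
// or a criterion. An empty result means the coverage check passes.
function coverageGaps(spec: Spec): string[] {
  const implemented = new Set(Object.values(spec.approach).flat());
  const verified = new Set(Object.values(spec.criteria).flat());
  return Object.keys(spec.requirements).filter((id) => {
    const p = spec.requirements[id];
    if (p !== "core" && p !== "must-have") return false;
    return !implemented.has(id) || !verified.has(id);
  });
}
```

Fed the PDF-export example (R1 core via A1/A2/A3 and C1, R2 core via A2 and C2, R3 must-have via A4 and C3/C4), it returns an empty list.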
The Workflow
Starting Fresh
/mill:spec "Add PDF export for monthly reports"
mill checks your project context, loads ground knowledge, and begins the conversation. If observations are pending in the learning inbox, mill nudges you — a reminder to run /mill:ground before drafting so your spec benefits from the latest learnings.
The Conversation
mill asks one question at a time, using structured options where sensible:
- What type of change? Feature / Bug / Task / Security
- What domain? Backend / Application / Website / Platform / Full-stack
- What’s the core problem? (your words)
- What must the solution achieve? (requirements emerge)
- How should we build it? (approach forms)
- How do we prove it works? (criteria crystallize)
A draft is saved early and updated as you go. You can stop mid-conversation and resume later.
Validation
Before publishing, mill validates your spec against these checks:
- Self-Containment — no statement requires external clarification
- Language Independence — describes what and why, not language-specific how
- Decision Completeness — no TBDs, all parameters bound
- Explicit Non-Applicability — omitted sections marked N/A with reason
- Coverage — all core/must-have requirements have approach and criteria
- No open flags — all uncertainties resolved
The first three deserve examples — they’re the ones most often violated.
Self-containment means every statement can be executed by someone unfamiliar with the system:
| Fails | Passes |
|---|---|
| "Configure the stream endpoint" | "Set RTMP_INGEST=rtmp://ingest.example.com:1935/live in encoder/.env" |
| "Use the appropriate codec" | "Encode with H.264 Main Profile, 1080p@30fps, 4500kbps CBR" |
| "Handle errors gracefully" | "On failure: retry 3x with exponential backoff, then emit stream.failed with payload {streamId, error, timestamp}” |
If a reader has to ask “which endpoint?” or “what codec?” — the spec isn’t ready.
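The third "passes" row is concrete enough to execute. A sketch of that behavior, with the delay injected so it runs without real timers (the emitter and sleep signatures are illustrative, not part of any spec):

```typescript
interface StreamFailedPayload {
  streamId: string;
  error: string;
  timestamp: number;
}

// Retry 3x with exponential backoff, then emit stream.failed with
// {streamId, error, timestamp}: exactly what the spec line binds.
function runWithRetries(
  streamId: string,
  op: () => void, // throws on failure
  emit: (event: string, payload: StreamFailedPayload) => void,
  sleep: (ms: number) => void,
): boolean {
  let lastError = "";
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      op();
      return true;
    } catch (e) {
      lastError = String(e);
      sleep(100 * 2 ** attempt); // back off 100ms, 200ms, 400ms
    }
  }
  emit("stream.failed", { streamId, error: lastError, timestamp: Date.now() });
  return false;
}
```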
Language independence means “validate the email format” (what), not “use a regex to validate the email” (how). The implementer chooses the mechanism based on the stack.
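The distinction holds because the criterion stays fixed while mechanisms vary. A sketch: the check names behavior only, and any validator satisfying it passes (the sample inputs are illustrative):

```typescript
type EmailValidator = (input: string) => boolean;

// "Validate the email format" as behavior: the criterion fixes
// what must be accepted and rejected, never the mechanism.
function meetsEmailCriterion(validate: EmailValidator): boolean {
  const accepted = ["user@example.com", "a.b@sub.example.org"];
  const rejected = ["not-an-email", "two@at@signs"];
  return accepted.every(validate) && rejected.every((s) => !validate(s));
}

// One implementer might choose a regex; another might parse.
// Both satisfy the same criterion.
const regexValidator: EmailValidator = (s) =>
  /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(s);
```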
Decision completeness means every parameter is bound to a concrete value. When a decision depends on a condition:
- Retry count: default 3 unless network is cellular → 5
- Cache TTL: default 60s unless user is admin → 0 (no cache)
No TBDs. No “it depends.” The implementer never has to guess.
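A bound decision is a total function: every input maps to a concrete value. The two examples above, sketched as code (the `Network` type is an illustrative assumption):

```typescript
type Network = "wifi" | "cellular" | "ethernet";

// Retry count: default 3 unless network is cellular -> 5
function retryCount(network: Network): number {
  return network === "cellular" ? 5 : 3;
}

// Cache TTL: default 60s unless user is admin -> 0 (no cache)
function cacheTtlSeconds(isAdmin: boolean): number {
  return isAdmin ? 0 : 60;
}
```

No input falls through to "it depends": the functions compile only because every branch returns a value.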
Publishing
When validation passes, mill creates a GitHub Issue. The issue becomes the canonical spec. The local draft is deleted. One source of truth.
Domain Awareness
Specs include a domain that shapes execution guidance:
| Domain | mill Pays Attention To |
|---|---|
| backend | API design, error handling, data modeling, performance |
| application | Component architecture, state management, UX patterns |
| website | Page architecture, responsive design, Core Web Vitals |
| platform | Infrastructure-as-code, containers, observability |
| fullstack | Combined guidance from all relevant layers |
Why domain matters: a backend spec loads API design patterns; a website spec loads Core Web Vitals guidance. The domain shapes what the verifier checks and which implementation patterns ship follows.
The Loop Contract
Every spec ends with a loop contract that governs how ship iterates:
## Loop Contract
**Test Command:** `npm test`
**Max Iterations:** 5
**Verification Commands:** `npm run lint`
| Field | Purpose | Default |
|---|---|---|
| Test Command | Must pass before PR | Detected from project |
| Max Iterations | Maximum implement-verify cycles before escalating | 5 |
| Verification Commands | Additional checks run by implementer and verifier | None |
After Max Iterations cycles, ship escalates to you with options: continue iterating, create PR as-is, or abort.
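Sketched as control flow, with the test runner and the escalation prompt stubbed out (both callbacks are illustrative stand-ins for `npm test` and ship's prompt to you):

```typescript
type Escalation = "continue" | "pr-as-is" | "abort";

// One implement-verify cycle per iteration: run the test command;
// if it fails, the implementer revises and the loop repeats. After
// maxIterations failures, the decision returns to the human.
function shipLoop(
  runTests: () => boolean,
  maxIterations: number,
  escalate: () => Escalation,
): "pass" | Escalation {
  for (let i = 0; i < maxIterations; i++) {
    if (runTests()) return "pass";
  }
  return escalate();
}
```

In the real loop, choosing "continue" would re-enter the cycle; the sketch just surfaces the decision.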
Observations During Drafting
As you draft, mill watches for gaps in ground knowledge:
- A persona referenced but not defined
- A domain term used without a vocabulary entry
- A requirement that conflicts with a documented rule
- An entity implied but not in the schema
High-confidence gaps get written as observations automatically. Uncertain ones are collected and shown at the end for your review. Drafting a spec doesn’t just produce a spec — it enriches the knowledge base.
Common Patterns
Bug Specs
Bug specs are surgical. They focus on reproduction and verification:
R1: The checkout total no longer double-counts tax on discounted items.
Approach: Fix the tax calculation in OrderService.calculateTotal() to apply
tax after discount, not before.
C1: Given a $100 item with 10% discount and 8% tax, total is $97.20
(not $98.00).
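C1's numbers check out arithmetically. A sketch of both behaviors (the function names are illustrative; the real fix lives in OrderService.calculateTotal()):

```typescript
// Correct: discount first, then tax on the discounted price.
// $100 at 10% off is $90; with 8% tax, $97.20.
function totalFixed(price: number, discount: number, taxRate: number): number {
  const cents = Math.round(price * (1 - discount) * (1 + taxRate) * 100);
  return cents / 100;
}

// The bug C1 guards against: tax computed on the full price,
// the discount subtracted afterwards. $108 minus $10 is $98.00.
function totalBuggy(price: number, discount: number, taxRate: number): number {
  const cents = Math.round((price * (1 + taxRate) - price * discount) * 100);
  return cents / 100;
}
```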
Feature Specs
The most common type. Full R→A→C chain with coverage proving completeness. The example above is a feature spec.
Task Specs
Refactoring, migration, cleanup. Requirements are often simpler (“migrate from X to Y without regression”), but approach and before/after criteria are detailed.
Security Specs
Security always wins classification. If a change has security implications, it’s a security spec regardless of what else it does. These include threat model, attack vectors, and security-specific criteria.