What Ship Does
Ship takes a GitHub Issue (your spec) and assembles a team of agents to implement it, verify it independently, and open a Pull Request. One skill, full pipeline.
/mill:ship 47
The Team
| Role | Count | Purpose |
|---|---|---|
| Lead | 1 | Reads spec, sizes team, delegates work, manages cross-team contracts |
| Implementer(s) | 1-4 | Build within assigned file ownership boundaries |
| Verifier | 1 | Separate agent, clean context, reviews full changeset against every criterion |
Even a simple spec gets a team with a single implementer. Outside of fallback mode, the lead always delegates and never implements directly.
Team size is driven by the spec’s approach:
| Approach Parts | Domain | Implementers |
|---|---|---|
| 1-4 | Single domain | 1 |
| 5-9 | Single domain | 2 (split by concern) |
| Any | Fullstack | 2-3 (one per layer) |
| 10+ | Any | 3-4 |
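The sizing table can be read as a small decision function. The sketch below is illustrative only: the function name and the `(min, max)` return shape are hypothetical, and it assumes the "10+" row takes precedence when a fullstack spec also has ten or more approach parts, which the table does not state.

```python
from typing import Tuple


def implementer_range(approach_parts: int, fullstack: bool) -> Tuple[int, int]:
    """Return the (min, max) number of implementers suggested by the sizing table."""
    if approach_parts < 1:
        raise ValueError("a spec needs at least one approach part")
    # Assumption: the "10+" row wins over the "Fullstack" row when both match.
    if approach_parts >= 10:
        return (3, 4)
    if fullstack:
        return (2, 3)  # one implementer per layer
    if approach_parts >= 5:
        return (2, 2)  # split by concern
    return (1, 1)


if __name__ == "__main__":
    print(implementer_range(3, fullstack=False))
    print(implementer_range(7, fullstack=False))
    print(implementer_range(4, fullstack=True))
```

Returning a range rather than a single number leaves the final choice to the lead, which matches the "2-3" and "3-4" entries in the table.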
How It Works
Phase 1: Setup
- Launch — mill reads the issue, parses the spec structure, validates everything needed is present
- Isolate — creates a worktree on a dedicated branch (issue-47). Your main branch stays untouched
- Load context — the spec, project context from .mill/context.md, ground knowledge, and domain guidance
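For readers unfamiliar with worktrees, the Isolate step is roughly what `git worktree add -b <branch> <path>` does by hand. The sketch below only builds that command. The helper name and the worktree directory are hypothetical, and how mill actually invokes git is not documented here.

```python
from typing import List


def worktree_command(issue_number: int, worktrees_dir: str) -> List[str]:
    """Build a `git worktree add` command for a dedicated issue branch."""
    branch = f"issue-{issue_number}"  # branch naming as shown in the Isolate step
    path = f"{worktrees_dir.rstrip('/')}/{branch}"  # hypothetical location
    # `-b` creates the new branch and checks it out in the new worktree,
    # so the main checkout is left untouched.
    return ["git", "worktree", "add", "-b", branch, path]


if __name__ == "__main__":
    print(" ".join(worktree_command(47, "../worktrees")))
```

Passing the resulting list to `subprocess.run(..., check=True)` inside a repository would create the worktree. Removing it later corresponds to `git worktree remove <path>`.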
Phase 2: Implement
- Delegate — each implementer gets their portion of the spec, project context, domain guidance, and explicit file ownership:
Backend implementer: src/api/, src/models/, src/services/
Frontend implementer: src/components/, src/pages/, src/hooks/
File ownership prevents conflicts when multiple implementers work in parallel. The lead manages cross-team contracts — if the backend implementer defines an API shape, the lead communicates that contract to the frontend implementer.
- Build — implementers work within their boundaries, committing as they go
- Polish — after completion, each implementer gets one bounded pass to clean up rough edges, check every spec criterion against their diff, and run tests. This raises the floor before the verifier sees the code.
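File ownership from the Delegate step can be modeled as a mapping from implementer to path prefixes. The following is a minimal sketch, not mill's implementation: the `OWNERSHIP` structure and both function names are assumptions, and only the directory names come from the example above. It shows how a lead could look up the owner of a file and check that no two implementers claim overlapping paths.

```python
from typing import Dict, List, Optional, Tuple

# Directory prefixes taken from the example above; the dict layout is hypothetical.
OWNERSHIP: Dict[str, List[str]] = {
    "backend": ["src/api/", "src/models/", "src/services/"],
    "frontend": ["src/components/", "src/pages/", "src/hooks/"],
}


def owner_of(path: str, ownership: Dict[str, List[str]] = OWNERSHIP) -> Optional[str]:
    """Return the implementer whose prefix contains `path`, or None if unowned."""
    for implementer, prefixes in ownership.items():
        if any(path.startswith(prefix) for prefix in prefixes):
            return implementer
    return None


def find_overlaps(ownership: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return prefix pairs claimed by different implementers where one contains the other."""
    claims = [(name, p) for name, prefixes in ownership.items() for p in prefixes]
    overlaps = []
    for i, (a_name, a) in enumerate(claims):
        for b_name, b in claims[i + 1:]:
            if a_name != b_name and (a.startswith(b) or b.startswith(a)):
                overlaps.append((a, b))
    return overlaps


if __name__ == "__main__":
    print(owner_of("src/api/reports.py"))
    print(find_overlaps(OWNERSHIP))
```

The same lookup is what lets a lead route verifier feedback to the implementer who owns the affected code.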
Phase 3: Verify
- Independent verification — the lead spawns a verifier with clean context: no access to the implementer’s reasoning, planning, or iteration history. The verifier sees only the spec and the resulting code diff. It reviews the full changeset, runs the test suite, checks every acceptance criterion, and looks for quality issues, security concerns, and scope creep.
What “clean context” means — The verifier is a separate agent that starts fresh. It doesn’t know what was hard to implement, what trade-offs the implementer considered, or what was tried and reverted. This is deliberate — it catches what self-review misses.
- Iteration cycles — when the verifier rejects:
  - Verifier reports specific blockers with suggestions
  - Lead identifies which implementer owns the affected code
  - That implementer gets the feedback plus cumulative history of past fixes
  - Verifier re-checks with clean context (no memory of previous rounds)
Rule — Iteration cycles are governed by the spec’s Loop Contract. The default is 5 cycles. After that, mill escalates to you with options: continue iterating, create PR as-is, or abort.
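The iteration rule can be sketched as a bounded loop. This is an illustration under stated assumptions, not mill's code: `Verdict`, `run_verification_loop`, and the callback signatures are hypothetical, and one "cycle" is taken to mean one fix followed by one re-verification. Only the default of 5 and the escalation behavior come from the text above.

```python
from dataclasses import dataclass, field
from typing import Callable, List

DEFAULT_MAX_CYCLES = 5  # default Loop Contract limit stated above


@dataclass
class Verdict:
    approved: bool
    blockers: List[str] = field(default_factory=list)


def run_verification_loop(
    verify: Callable[[], Verdict],
    fix: Callable[[List[str], List[List[str]]], None],
    max_cycles: int = DEFAULT_MAX_CYCLES,
) -> str:
    """Return "approved", or "escalate" once the cycle limit is used up."""
    history: List[List[str]] = []  # cumulative blockers from earlier rounds
    verdict = verify()  # verify() takes no arguments: the verifier keeps no memory
    cycles = 0
    while not verdict.approved:
        if cycles == max_cycles:
            # The user then chooses: continue iterating, create PR as-is, or abort.
            return "escalate"
        # The implementer sees the new feedback plus the history of past rounds.
        fix(verdict.blockers, list(history))
        history.append(verdict.blockers)
        cycles += 1
        verdict = verify()
    return "approved"
```

Keeping the history on the implementer side only, while `verify` receives nothing, mirrors the asymmetry described above: fixes accumulate context, verification starts fresh every round.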
Phase 4: Finalize
- Create PR — mill creates a Pull Request linking back to the spec issue. Full traceability from intent to implementation
- Extract learnings — the lead reviews the session — iteration history, implementer notes, orchestration decisions — and writes a learnings observation:
---
source: ship
type: learning
issue: 47
suggested: ground/patterns/
---
# Ship #47 Learnings
- ReportService and PdfRenderer are tightly coupled — changes to one
require changes to the other
- Puppeteer needs `--no-sandbox` flag in the CI Docker container
- The PDF rendering timeout should be configurable per-report, not global
Learnings are auto-tagged with a `suggested:` routing hint. You decide what becomes permanent knowledge during `/mill:ground` review.
- Clean up — worktree removed. Git log and GitHub PRs are the history.
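The learnings observation in the Extract learnings step is a front-matter block followed by a markdown body. As a rough sketch of how a tool might read the `suggested` routing hint, the parser below handles only flat `key: value` lines. The function name is hypothetical, the sample is abridged from the example above, and mill's real parsing may differ.

```python
from typing import Dict, Tuple

# Abridged copy of the observation shown above.
OBSERVATION = """---
source: ship
type: learning
issue: 47
suggested: ground/patterns/
---
# Ship #47 Learnings
- Puppeteer needs `--no-sandbox` flag in the CI Docker container
"""


def split_front_matter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a document into (metadata, body); metadata is empty if no front matter."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter reached
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat everything as body


if __name__ == "__main__":
    meta, body = split_front_matter(OBSERVATION)
    print(meta["suggested"])
```

All values are kept as strings (the issue number comes back as `"47"`), which avoids guessing at types the format does not declare.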
When Things Go Wrong
Tests fail — Implementers address test failures as part of their task, committing only after tests pass.
Verification rejects — The lead routes feedback to the responsible implementer. Iteration continues up to the Loop Contract limit.
Spec has gaps — The lead escalates with a clear description of what’s missing. The spec goes back to drafting.
Fallback mode — If agent teams aren’t available (the experimental feature is disabled), ship falls back to single-session mode: the lead implements directly, then runs an explicit self-review phase against the spec. The iteration limit is still governed by the Loop Contract. This mode is degraded but functional.