AI-Native Product Engineering Playbook
TL;DR: I built a complete system for developing software products with AI agents. Ten specialized agents, automated quality gates, a self-improving knowledge base. I used it to ship a full-stack product in one day with zero hand-written application code. This page explains the thinking. The full playbook lives on its own page.
The Problem
Traditional development artifacts (PRDs, TDDs, sprint plans) were designed for human teams. They optimize for stakeholder alignment, meeting-based communication, and handoffs between specialized roles.
AI agents don't need persuasive narratives or organizational context. They need focused documentation with clear boundaries: what to read, when to read it, what to do with it.
I kept hitting the same friction building products with AI. Context windows cluttered with irrelevant information. Agents repeating mistakes that were solved two sessions ago. Code reviews that missed obvious issues because the reviewer shared the author's biases.
Each problem was solvable individually. I wanted a system that solved them structurally.
Five Principles
Separation of Concerns. Each document answers one question. WHAT are we building (product spec). WHY this approach (build strategy). HOW to implement (conventions and domain skills). No document tries to answer all three.
Context Hygiene. Agents perform best with minimal, relevant context. The system uses progressive disclosure: skills show a one-line description at startup (~100 tokens each) and load full content only on invocation. Ten skills don't cost ten times the context.
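To make progressive disclosure concrete, here is a minimal sketch of the idea in Python. The file layout, the first-line-summary convention, and the class name are my illustrative assumptions, not the playbook's actual implementation:

```python
from pathlib import Path

class SkillRegistry:
    """Illustrative sketch: cheap one-line descriptions at startup,
    full skill content loaded only on invocation."""

    def __init__(self, skill_dir: str):
        self.skill_dir = Path(skill_dir)
        self.descriptions = {}  # name -> one-line summary, loaded eagerly
        self._bodies = {}       # name -> full content, loaded lazily
        for path in sorted(self.skill_dir.glob("*.md")):
            # Assumed convention: the first line of a skill file is its summary.
            with open(path) as f:
                self.descriptions[path.stem] = f.readline().strip()

    def startup_context(self) -> str:
        """The cheap index an agent sees at session start (~one line per skill)."""
        return "\n".join(f"{name}: {desc}"
                         for name, desc in self.descriptions.items())

    def invoke(self, name: str) -> str:
        """Full skill content, read and cached only when actually needed."""
        if name not in self._bodies:
            self._bodies[name] = (self.skill_dir / f"{name}.md").read_text()
        return self._bodies[name]
```

The point of the pattern: the startup cost is one summary line per skill, so adding a tenth skill adds one line of context, not a tenth document.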
Compound Engineering. Every session leaves the codebase smarter. New patterns get routed to conventions. Domain-specific learnings go to specialized skill files. Failed approaches become documented anti-patterns. Agents tomorrow inherit what agents today discovered.
Enforced Quality. Manual checklists get skipped. Automated pipelines don't. A quality gate hook evaluates each session and blocks completion if any workflow phase was missed.
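The gate logic reduces to a small check. This is a hypothetical sketch; the phase names and the session-state shape are assumptions for illustration, not the playbook's actual schema:

```python
REQUIRED_PHASES = ["build", "test", "review", "fix", "verify", "compound"]

def quality_gate(session_state: dict) -> tuple[bool, list[str]]:
    """Return (ok, missing_phases). A phase passes if it ran to completion,
    or was explicitly skipped with a stated reason."""
    missing = [
        phase for phase in REQUIRED_PHASES
        if not (session_state.get(phase, {}).get("done")
                or session_state.get(phase, {}).get("skip_reason"))
    ]
    return (not missing, missing)
```

The key design choice is that skipping is allowed but silence is not: a phase without either a completion record or an explicit skip reason blocks the session from ending.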
Progressive Complexity. Quality investment scales with risk. A prototype doesn't need the rigor of a payment flow. Test steps can be conditionally skipped. Specialist reviewers are opt-in. The system adapts to what a task actually warrants.
Agent Architecture
Six core agents and four specialist reviewers. Each runs in its own isolated context window.
Context isolation is the central design decision. A code reviewer that didn't write the implementation catches more issues. A test writer that only sees the interface writes more honest tests. A documentation verifier with no build context reads docs the way a newcomer would.
The core agents handle building, testing, code review, documentation verification, documentation writing, and refactoring analysis. The specialists focus on security, data integrity, architecture, and cross-file pattern consistency. Specialists are opt-in, invoked when a task warrants deeper scrutiny in a specific domain.
The Pipeline
The full pipeline chains agents through a complete build cycle:
- Build. The builder reads project conventions, implements the feature, and returns a list of every file it touched.
- Test. The test writer receives that file list, writes tests, and runs them. This phase can be skipped for scaffolding and configuration work, but only with a stated reason.
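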
- Review. The code reviewer receives the same file list and produces a structured document: Critical, Warning, and Suggestion findings.
- Fix. The main context addresses all Critical and Warning items.
- Verify. A documentation verifier confirms that project docs still match reality.
- Compound. The system captures new patterns, updates conventions, and logs decisions.
- Gate. A quality hook confirms all phases completed before the session can end.
Each downstream agent receives the exact file list from upstream. No guessing about scope.
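The chaining above can be sketched in a few lines. The function signatures and the findings format are illustrative assumptions; the real agents run in isolated context windows rather than as Python callables:

```python
from dataclasses import dataclass, field

@dataclass
class PhaseResult:
    files_touched: list          # the exact scope passed downstream
    findings: list = field(default_factory=list)

def run_pipeline(feature: str, builder, test_writer, reviewer) -> list:
    """Chain build -> test -> review, threading the builder's file list
    through each downstream phase so no agent guesses at scope."""
    build = builder(feature)                  # returns every file it touched
    test_writer(build.files_touched)          # tests scoped to those files
    review = reviewer(build.files_touched)    # review scoped to the same list
    # Critical and Warning findings go back to the main context for fixing.
    return [f for f in review.findings
            if f["severity"] in ("Critical", "Warning")]
```

The file list is the contract between phases: downstream agents never infer what changed, they are told.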
For lower-risk work, discrete commands invoke individual phases. You compose the process a task needs.
Compound Engineering
The term and core concept of "compound engineering" come from Kieran Klaassen at Every Inc. My implementation is my own, but it builds on their thinking.
This is the concept I'm most convinced matters long-term.
Most AI-assisted development is stateless. You solve a problem in one session. Two sessions later, a different agent hits the same problem and solves it a different way. Patterns drift. Mistakes repeat. The codebase grows without getting smarter.
Compound engineering makes knowledge capture structural, not optional. After every session, the system asks: did a new pattern emerge? Did an existing one prove wrong? Was an approach tried that failed?
Global patterns go to a conventions file. Domain-specific learnings go to skill files (frontend, database, testing, API design). Architectural decisions go to a log with context and rationale.
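The routing rule itself is simple enough to sketch. The file paths, domain names, and learning kinds below are hypothetical placeholders, not the playbook's actual layout:

```python
# Assumed domain skill files; the real set is whatever the project defines.
DOMAIN_SKILLS = {"frontend", "database", "testing", "api-design"}

def route_learning(kind: str, domain: str = "") -> str:
    """Map a captured learning to the file where it should live:
    decisions to the log, domain patterns to skill files,
    everything global (including anti-patterns) to conventions."""
    if kind == "decision":
        return "docs/decision-log.md"
    if domain in DOMAIN_SKILLS:
        return f"skills/{domain}.md"
    return "CONVENTIONS.md"
```

What matters is not the mapping but that it runs after every session: capture is a pipeline step, not a habit.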
Six sessions into a build, the project's documentation reflects what actually happened. Conventions contain battle-tested patterns. Anti-patterns document real failures. Agents read all of it before starting work.
Proof
I used this playbook to build Quorum, a collaborative ranking application. Full stack. One day. Zero hand-written application code. Six agents running through quality gates at every stage.
I made every product decision, defined every feature, chose the architecture, reviewed every output. The playbook handled orchestration and enforcement. I worked at the level of intent and product judgment while agents handled implementation.
The playbook is on its fifth major version. Each iteration incorporated lessons from actual product builds. The current system (skill architecture, progressive disclosure, domain-specific knowledge, quality gates) is the result of keeping what worked and cutting what didn't.
The Full Playbook
The complete document is 2,500+ lines: every agent definition, every skill template, the task management system, git strategies for parallel work, quality gate configuration, workflow guides, and troubleshooting.
The principles are tool-agnostic. Separation of concerns, context hygiene, compound learning, and enforced quality apply regardless of which AI tools you use. The implementation targets Claude Code, but the system design transfers.