diff --git a/docs/plans/why-the-factory-is-different.md b/docs/plans/why-the-factory-is-different.md new file mode 100644 index 00000000..13e7a9d8 --- /dev/null +++ b/docs/plans/why-the-factory-is-different.md @@ -0,0 +1,225 @@ +# Why this software factory is different + +**Audience:** Anyone evaluating AI tooling for their engineering +work. Company engineering leadership. OSS project maintainers. +Individual contributors shipping on evenings. The same argument +scales across audience size. + +**Short version:** Most AI coding tools are **assistants** that +help a human developer faster. This is a **factory** that owns +the whole coding + devops pipeline end-to-end with measurable +quality and DORA discipline. Those are different categories of +thing. + +--- + +## What people typically know about AI in engineering (the common priors) + +- **"AI helps developers write code faster."** True for IDE + assistants (Copilot, Cursor, Tabnine). They accelerate human + typing and completion. +- **"AI still needs human review."** True for most tooling — + the developer reads the suggestion, accepts / modifies / + rejects, owns the commit. +- **"AI can't safely own production changes."** Commonly held + belief. The reasoning is usually: *"deploying to a live + production system with zero downtime requires judgment, + context, and institutional memory humans have and AI does + not."* +- **"Full autonomy is for sandboxed toys."** The working + assumption that autonomous agents play in a safe playground + while real work stays human-gated. + +Each of these is defensible when applied to typical AI tooling +in 2026. None of them is defensible when applied to this +factory. Here's why. + +--- + +## What this factory actually does (refuting each prior) + +### "AI helps developers write code faster" → the factory IS the developer + +- Ownership is not *"suggest a line, dev accepts"* — it is + *"the agent lands the commit, tests pass, reviewers sign off, + PR merges."* +- Specialist reviewers (harsh-critic, spec-zealot, perf + engineer, threat-model-critic, public-API designer, and + more) are composed into every change that touches their + domain. +- Formal verification (TLA+, Z3, FsCheck, Stryker, Lean) is + wired into the CI substrate. Claims the code makes about + its behaviour are checked against specs, not just unit + tests. + +The human is not bypassed — humans are in the loop as +*maintainers*: scope, priority, strategic direction, +ratification of structural changes. They are not in the loop +as *bottleneck reviewers*. The factory removes the "needs +Aaron's eyes on every PR" failure mode without removing +Aaron's agency. + +### "AI still needs human review" → the factory IS the review + +Review is not an event the factory asks for. Review is a +property of every commit. The reviewers are named, their +scopes are scoped, their rules are cited with stable rule-IDs +(BP-01..BP-NN). When a reviewer flags an issue, it produces a +rule-ID-citation and the fix path. + +The human opens a PR description, clicks Merge, and the +quality floor is already held. + +### "AI can't safely own production changes" → it's the opposite + +Humans are *not actually great* at zero-downtime production +changes. What makes humans safe on production is **process +discipline**: + +- Peer review. +- Staged rollouts / canaries. +- Runbooks for known failure modes. +- Post-mortems that feed back into future work. +- Change windows and deployment gates. + +These are process, not human insight. The factory follows +(and *enforces*) the same process, but without the human +failure modes: + +- Review never gets skipped because the reviewer was on + vacation. +- Canaries are always evaluated against explicit rule-IDs, not + "it looked fine for a few minutes." +- Post-mortems file lessons into durable memory that future + work *actually consults* — not a document everyone read + once and forgot. +- Change windows and gates are configuration, not norms + hoping to hold. + +Net effect: the factory's DORA metrics (deployment frequency, +lead time for changes, change failure rate, MTTR) can be held +at or better than human-only teams. Not because the factory is +smarter than the humans — because it's more disciplined about +the parts humans struggle to sustain: continuous rigor, +memory permanence, and lesson integration. + +### "Full autonomy is for sandboxed toys" → the factory is production-posture by default + +- Every commit is measured against the live-lock smell audit + (ratio of product motion vs process motion; see + `tools/audit/live-lock-audit.sh`). +- Every lesson learned from a failure mode is filed into + `docs/hygiene-history/*.md` for future consultation. +- Alignment is an observable — Zeta's primary research + contribution is **measurable AI alignment** (see + `docs/ALIGNMENT.md`). The factory builds its own work on + the discipline it's researching. +- Retraction-native change substrate: rollback is first-class + algebra, not a crisis response. Any delta has a clean inverse. + +--- + +## Why this helps adopting teams forward their objectives + +### For a company + +- **Engineering velocity unbounded by senior-reviewer + capacity.** Juniors ship to the factory's quality floor + without waiting for a senior's attention. +- **Deployment frequency up** — the factory does not sleep, + vacation, or shuffle priorities. +- **Change failure rate down** — the reviewer panel and + formal-verification gate catch what humans often miss under + schedule pressure. +- **MTTR bounded** — retraction-native algebra means + rollbacks are surgical, not re-deploys. +- **Incident lessons persist** — the factory remembers what + the team forgot. + +### For an OSS project + +- **Maintainer burden drops** — the factory does the rote + review and discipline work maintainers typically absorb + unpaid. +- **Contributor experience improves** — PRs get quality + feedback quickly with rule-IDs, not "looks fine, merging + when I get around to it." +- **Project survives maintainer turnover** — durable memory + + governance substrate means the project's institutional + knowledge doesn't live in one person's head. + +### For an individual contributor + +- **Shipping on evenings becomes reliable** — the factory's + review + verification gate catches the kinds of bugs you'd + otherwise find in production Monday morning. +- **Generalist becomes specialist-aware** — each agent is a + specialist in its scope; you inherit that specialist + knowledge without hiring it. +- **You keep moving when you're tired** — the factory's + discipline is deterministic; yours is not after hour 3. + +--- + +## What this factory is NOT + +- **Not a product.** This repo is open-source and research- + driven. The factory is a methodology + substrate. Adopting + projects take the substrate and run it on their own work. +- **Not a replacement for human judgment on what to build.** + Scope, priority, strategic direction, and ratification of + structural changes stay human. The factory ships what the + human directs. +- **Not a claim that AI is strictly better than humans.** The + factory is better at *sustained rigor* and *memory + permanence*. Humans are still better at novel-problem + synthesis, stakeholder relationships, and strategic + vision. The factory augments, not replaces. +- **Not a promise that adoption is zero-friction.** Adopting + a software factory is a change for any team. The factory + earns its keep over weeks to months, not hours. + +--- + +## How to evaluate adoption + +Three concrete signals to watch in the first weeks: + +1. **DORA four-key trend.** Deployment frequency, lead time, + change failure rate, MTTR — compare a 4-week window before + adoption to a 4-week window after. The factory should + improve at least three of the four. +2. **Live-lock smell ratio.** Is the factory's output skewed + toward process-churn without product motion? Run + `tools/audit/live-lock-audit.sh 25` on your `main`. EXT + ratio < 20% is a smell firing. +3. **Lesson-integration cadence.** Are lessons from failures + landing in durable memory and actually getting consulted? + Grep the hygiene-history files for "Lessons integrated" + sections after each incident. + +If all three are healthy, adoption is paying off. If any is +unhealthy, the factory has a bug in your configuration — file +it as a BACKLOG item, consult the memory substrate, fix it. + +--- + +## Further reading + +- `README.md` — what Zeta the library is +- `AGENTS.md` — the universal onboarding handbook for agents + working in this factory +- `CLAUDE.md` — Claude Code-specific bootstrap +- `GOVERNANCE.md` — numbered repo-wide rules +- `docs/ALIGNMENT.md` — the alignment contract +- `docs/ARCHITECTURE.md` — how the pieces fit +- `docs/plans/factory-demo-scope.md` — the concrete factory- + demo scope + build sequence (if present — currently on a + feature branch) +- `tools/audit/live-lock-audit.sh` — the factory-health audit + this doc references +- `samples/FactoryDemo.Api.FSharp/` + + `samples/FactoryDemo.Api.CSharp/` — the concrete demo + (the `samples/FactoryDemo.Db/` companion is not yet + landed in main; it's tracked under the FactoryDemo + BACKLOG arc and will appear here once it lands)