Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
225 changes: 225 additions & 0 deletions docs/plans/why-the-factory-is-different.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,225 @@
# Why this software factory is different

**Audience:** Anyone evaluating AI tooling for their engineering
work. Company engineering leadership. OSS project maintainers.
Individual contributors shipping on evenings. The same argument
scales across audience size.

**Short version:** Most AI coding tools are **assistants** that
help a human developer faster. This is a **factory** that owns
the whole coding + devops pipeline end-to-end with measurable
quality and DORA discipline. Those are different categories of
thing.

---

## What people typically know about AI in engineering (the common priors)

- **"AI helps developers write code faster."** True for IDE
assistants (Copilot, Cursor, Tabnine). They accelerate human
typing and completion.
- **"AI still needs human review."** True for most tooling —
the developer reads the suggestion, accepts / modifies /
rejects, owns the commit.
- **"AI can't safely own production changes."** Commonly held
belief. The reasoning is usually: *"deploying to a live
production system with zero downtime requires judgment,
context, and institutional memory humans have and AI does
not."*
- **"Full autonomy is for sandboxed toys."** The working
assumption that autonomous agents play in a safe playground
while real work stays human-gated.

Each of these is defensible when applied to typical AI tooling
in 2026. None of them is defensible when applied to this
factory. Here's why.

---

## What this factory actually does (refuting each prior)

### "AI helps developers write code faster" → the factory IS the developer

- Ownership is not *"suggest a line, dev accepts"* — it is
*"the agent lands the commit, tests pass, reviewers sign off,
PR merges."*
- Specialist reviewers (harsh-critic, spec-zealot, perf
engineer, threat-model-critic, public-API designer, and
more) are composed into every change that touches their
domain.
- Formal verification (TLA+, Z3, FsCheck, Stryker, Lean) is
wired into the CI substrate. Claims the code makes about
its behaviour are checked against specs, not just unit
tests.
Comment thread
AceHack marked this conversation as resolved.

The human is not bypassed — humans are in the loop as
*maintainers*: scope, priority, strategic direction,
ratification of structural changes. They are not in the loop
as *bottleneck reviewers*. The factory removes the "needs
Aaron's eyes on every PR" failure mode without removing
Aaron's agency.
Comment thread
AceHack marked this conversation as resolved.

### "AI still needs human review" → the factory IS the review

Review is not an event the factory asks for. Review is a
property of every commit. The reviewers are named, their
scopes are scoped, their rules are cited with stable rule-IDs
(BP-01..BP-NN). When a reviewer flags an issue, it produces a
rule-ID-citation and the fix path.

The human opens a PR description, clicks Merge, and the
quality floor is already held.

### "AI can't safely own production changes" → it's the opposite

Humans are *not actually great* at zero-downtime production
changes. What makes humans safe on production is **process
discipline**:

- Peer review.
- Staged rollouts / canaries.
- Runbooks for known failure modes.
- Post-mortems that feed back into future work.
- Change windows and deployment gates.

These are process, not human insight. The factory follows
(and *enforces*) the same process, but without the human
failure modes:

- Review never gets skipped because the reviewer was on
vacation.
- Canaries are always evaluated against explicit rule-IDs, not
"it looked fine for a few minutes."
- Post-mortems file lessons into durable memory that future
work *actually consults* — not a document everyone read
once and forgot.
- Change windows and gates are configuration, not norms
hoping to hold.

Net effect: the factory's DORA metrics (deployment frequency,
lead time for changes, change failure rate, MTTR) can be held
at or better than human-only teams. Not because the factory is
smarter than the humans — because it's more disciplined about
the parts humans struggle to sustain: continuous rigor,
memory permanence, and lesson integration.

### "Full autonomy is for sandboxed toys" → the factory is production-posture by default

- Every commit is measured against the live-lock smell audit
(ratio of product motion vs process motion; see
`tools/audit/live-lock-audit.sh`).
Comment thread
AceHack marked this conversation as resolved.
Comment thread
AceHack marked this conversation as resolved.
- Every lesson learned from a failure mode is filed into
`docs/hygiene-history/*.md` for future consultation.
- Alignment is an observable — Zeta's primary research
contribution is **measurable AI alignment** (see
`docs/ALIGNMENT.md`). The factory builds its own work on
the discipline it's researching.
- Retraction-native change substrate: rollback is first-class
algebra, not a crisis response. Any delta has a clean inverse.

---

## Why this helps adopting teams forward their objectives
Comment thread
AceHack marked this conversation as resolved.

### For a company

- **Engineering velocity unbounded by senior-reviewer
capacity.** Juniors ship to the factory's quality floor
without waiting for a senior's attention.
- **Deployment frequency up** — the factory does not sleep,
vacation, or shuffle priorities.
- **Change failure rate down** — the reviewer panel and
formal-verification gate catch what humans often miss under
schedule pressure.
- **MTTR bounded** — retraction-native algebra means
rollbacks are surgical, not re-deploys.
- **Incident lessons persist** — the factory remembers what
the team forgot.

### For an OSS project

- **Maintainer burden drops** — the factory does the rote
review and discipline work maintainers typically absorb
unpaid.
- **Contributor experience improves** — PRs get quality
feedback quickly with rule-IDs, not "looks fine, merging
when I get around to it."
- **Project survives maintainer turnover** — durable memory
+ governance substrate means the project's institutional
knowledge doesn't live in one person's head.

### For an individual contributor

- **Shipping on evenings becomes reliable** — the factory's
review + verification gate catches the kinds of bugs you'd
otherwise find in production Monday morning.
- **Generalist becomes specialist-aware** — each agent is a
specialist in its scope; you inherit that specialist
knowledge without hiring it.
- **You keep moving when you're tired** — the factory's
discipline is deterministic; yours is not after hour 3.

---

## What this factory is NOT

- **Not a product.** This repo is open-source and research-
driven. The factory is a methodology + substrate. Adopting
projects take the substrate and run it on their own work.
- **Not a replacement for human judgment on what to build.**
Scope, priority, strategic direction, and ratification of
structural changes stay human. The factory ships what the
human directs.
- **Not a claim that AI is strictly better than humans.** The
factory is better at *sustained rigor* and *memory
permanence*. Humans are still better at novel-problem
synthesis, stakeholder relationships, and strategic
vision. The factory augments, not replaces.
- **Not a promise that adoption is zero-friction.** Adopting
a software factory is a change for any team. The factory
earns its keep over weeks to months, not hours.

---

## How to evaluate adoption

Three concrete signals to watch in the first weeks:

1. **DORA four-key trend.** Deployment frequency, lead time,
change failure rate, MTTR — compare a 4-week window before
adoption to a 4-week window after. The factory should
improve at least three of the four.
2. **Live-lock smell ratio.** Is the factory's output skewed
toward process-churn without product motion? Run
`tools/audit/live-lock-audit.sh 25` on your `main`. EXT
Comment thread
AceHack marked this conversation as resolved.
ratio < 20% is a smell firing.
Comment thread
AceHack marked this conversation as resolved.
3. **Lesson-integration cadence.** Are lessons from failures
landing in durable memory and actually getting consulted?
Grep the hygiene-history files for "Lessons integrated"
sections after each incident.
Comment thread
AceHack marked this conversation as resolved.

If all three are healthy, adoption is paying off. If any is
unhealthy, the factory has a bug in your configuration — file
it as a BACKLOG item, consult the memory substrate, fix it.

---

## Further reading

- `README.md` — what Zeta the library is
- `AGENTS.md` — the universal onboarding handbook for agents
working in this factory
- `CLAUDE.md` — Claude Code-specific bootstrap
- `GOVERNANCE.md` — numbered repo-wide rules
- `docs/ALIGNMENT.md` — the alignment contract
- `docs/ARCHITECTURE.md` — how the pieces fit
- `docs/plans/factory-demo-scope.md` — the concrete factory-
demo scope + build sequence (if present — currently on a
feature branch)
Comment thread
AceHack marked this conversation as resolved.
- `tools/audit/live-lock-audit.sh` — the factory-health audit
this doc references
- `samples/FactoryDemo.Api.FSharp/` +
`samples/FactoryDemo.Api.CSharp/` — the concrete demo
(the `samples/FactoryDemo.Db/` companion is not yet
landed in main; it's tracked under the FactoryDemo
BACKLOG arc and will appear here once it lands)
Loading