-
Notifications
You must be signed in to change notification settings - Fork 1
docs: why-the-factory-is-different — audience-perspective explainer #148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+225
−0
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,225 @@ | ||
| # Why this software factory is different | ||
|
|
||
| **Audience:** Anyone evaluating AI tooling for their engineering | ||
| work. Company engineering leadership. OSS project maintainers. | ||
| Individual contributors shipping on evenings. The same argument | ||
| scales across audience size. | ||
|
|
||
| **Short version:** Most AI coding tools are **assistants** that | ||
| help a human developer faster. This is a **factory** that owns | ||
| the whole coding + devops pipeline end-to-end with measurable | ||
| quality and DORA discipline. Those are different categories of | ||
| thing. | ||
|
|
||
| --- | ||
|
|
||
| ## What people typically know about AI in engineering (the common priors) | ||
|
|
||
| - **"AI helps developers write code faster."** True for IDE | ||
| assistants (Copilot, Cursor, Tabnine). They accelerate human | ||
| typing and completion. | ||
| - **"AI still needs human review."** True for most tooling — | ||
| the developer reads the suggestion, accepts / modifies / | ||
| rejects, owns the commit. | ||
| - **"AI can't safely own production changes."** Commonly held | ||
| belief. The reasoning is usually: *"deploying to a live | ||
| production system with zero downtime requires judgment, | ||
| context, and institutional memory humans have and AI does | ||
| not."* | ||
| - **"Full autonomy is for sandboxed toys."** The working | ||
| assumption that autonomous agents play in a safe playground | ||
| while real work stays human-gated. | ||
|
|
||
| Each of these is defensible when applied to typical AI tooling | ||
| in 2026. None of them is defensible when applied to this | ||
| factory. Here's why. | ||
|
|
||
| --- | ||
|
|
||
| ## What this factory actually does (refuting each prior) | ||
|
|
||
| ### "AI helps developers write code faster" → the factory IS the developer | ||
|
|
||
| - Ownership is not *"suggest a line, dev accepts"* — it is | ||
| *"the agent lands the commit, tests pass, reviewers sign off, | ||
| PR merges."* | ||
| - Specialist reviewers (harsh-critic, spec-zealot, perf | ||
| engineer, threat-model-critic, public-API designer, and | ||
| more) are composed into every change that touches their | ||
| domain. | ||
| - Formal verification (TLA+, Z3, FsCheck, Stryker, Lean) is | ||
| wired into the CI substrate. Claims the code makes about | ||
| its behaviour are checked against specs, not just unit | ||
| tests. | ||
|
|
||
| The human is not bypassed — humans are in the loop as | ||
| *maintainers*: scope, priority, strategic direction, | ||
| ratification of structural changes. They are not in the loop | ||
| as *bottleneck reviewers*. The factory removes the "needs | ||
| Aaron's eyes on every PR" failure mode without removing | ||
| Aaron's agency. | ||
|
AceHack marked this conversation as resolved.
|
||
|
|
||
| ### "AI still needs human review" → the factory IS the review | ||
|
|
||
| Review is not an event the factory asks for. Review is a | ||
| property of every commit. The reviewers are named, their | ||
| scopes are scoped, their rules are cited with stable rule-IDs | ||
| (BP-01..BP-NN). When a reviewer flags an issue, it produces a | ||
| rule-ID-citation and the fix path. | ||
|
|
||
| The human opens a PR description, clicks Merge, and the | ||
| quality floor is already held. | ||
|
|
||
| ### "AI can't safely own production changes" → it's the opposite | ||
|
|
||
| Humans are *not actually great* at zero-downtime production | ||
| changes. What makes humans safe on production is **process | ||
| discipline**: | ||
|
|
||
| - Peer review. | ||
| - Staged rollouts / canaries. | ||
| - Runbooks for known failure modes. | ||
| - Post-mortems that feed back into future work. | ||
| - Change windows and deployment gates. | ||
|
|
||
| These are process, not human insight. The factory follows | ||
| (and *enforces*) the same process, but without the human | ||
| failure modes: | ||
|
|
||
| - Review never gets skipped because the reviewer was on | ||
| vacation. | ||
| - Canaries are always evaluated against explicit rule-IDs, not | ||
| "it looked fine for a few minutes." | ||
| - Post-mortems file lessons into durable memory that future | ||
| work *actually consults* — not a document everyone read | ||
| once and forgot. | ||
| - Change windows and gates are configuration, not norms | ||
| hoping to hold. | ||
|
|
||
| Net effect: the factory's DORA metrics (deployment frequency, | ||
| lead time for changes, change failure rate, MTTR) can be held | ||
| at or better than human-only teams. Not because the factory is | ||
| smarter than the humans — because it's more disciplined about | ||
| the parts humans struggle to sustain: continuous rigor, | ||
| memory permanence, and lesson integration. | ||
|
|
||
| ### "Full autonomy is for sandboxed toys" → the factory is production-posture by default | ||
|
|
||
| - Every commit is measured against the live-lock smell audit | ||
| (ratio of product motion vs process motion; see | ||
| `tools/audit/live-lock-audit.sh`). | ||
|
AceHack marked this conversation as resolved.
AceHack marked this conversation as resolved.
|
||
| - Every lesson learned from a failure mode is filed into | ||
| `docs/hygiene-history/*.md` for future consultation. | ||
| - Alignment is an observable — Zeta's primary research | ||
| contribution is **measurable AI alignment** (see | ||
| `docs/ALIGNMENT.md`). The factory builds its own work on | ||
| the discipline it's researching. | ||
| - Retraction-native change substrate: rollback is first-class | ||
| algebra, not a crisis response. Any delta has a clean inverse. | ||
|
|
||
| --- | ||
|
|
||
| ## Why this helps adopting teams forward their objectives | ||
|
AceHack marked this conversation as resolved.
|
||
|
|
||
| ### For a company | ||
|
|
||
| - **Engineering velocity unbounded by senior-reviewer | ||
| capacity.** Juniors ship to the factory's quality floor | ||
| without waiting for a senior's attention. | ||
| - **Deployment frequency up** — the factory does not sleep, | ||
| vacation, or shuffle priorities. | ||
| - **Change failure rate down** — the reviewer panel and | ||
| formal-verification gate catch what humans often miss under | ||
| schedule pressure. | ||
| - **MTTR bounded** — retraction-native algebra means | ||
| rollbacks are surgical, not re-deploys. | ||
| - **Incident lessons persist** — the factory remembers what | ||
| the team forgot. | ||
|
|
||
| ### For an OSS project | ||
|
|
||
| - **Maintainer burden drops** — the factory does the rote | ||
| review and discipline work maintainers typically absorb | ||
| unpaid. | ||
| - **Contributor experience improves** — PRs get quality | ||
| feedback quickly with rule-IDs, not "looks fine, merging | ||
| when I get around to it." | ||
| - **Project survives maintainer turnover** — durable memory | ||
| + governance substrate means the project's institutional | ||
| knowledge doesn't live in one person's head. | ||
|
|
||
| ### For an individual contributor | ||
|
|
||
| - **Shipping on evenings becomes reliable** — the factory's | ||
| review + verification gate catches the kinds of bugs you'd | ||
| otherwise find in production Monday morning. | ||
| - **Generalist becomes specialist-aware** — each agent is a | ||
| specialist in its scope; you inherit that specialist | ||
| knowledge without hiring it. | ||
| - **You keep moving when you're tired** — the factory's | ||
| discipline is deterministic; yours is not after hour 3. | ||
|
|
||
| --- | ||
|
|
||
| ## What this factory is NOT | ||
|
|
||
| - **Not a product.** This repo is open-source and research- | ||
| driven. The factory is a methodology + substrate. Adopting | ||
| projects take the substrate and run it on their own work. | ||
| - **Not a replacement for human judgment on what to build.** | ||
| Scope, priority, strategic direction, and ratification of | ||
| structural changes stay human. The factory ships what the | ||
| human directs. | ||
| - **Not a claim that AI is strictly better than humans.** The | ||
| factory is better at *sustained rigor* and *memory | ||
| permanence*. Humans are still better at novel-problem | ||
| synthesis, stakeholder relationships, and strategic | ||
| vision. The factory augments, not replaces. | ||
| - **Not a promise that adoption is zero-friction.** Adopting | ||
| a software factory is a change for any team. The factory | ||
| earns its keep over weeks to months, not hours. | ||
|
|
||
| --- | ||
|
|
||
| ## How to evaluate adoption | ||
|
|
||
| Three concrete signals to watch in the first weeks: | ||
|
|
||
| 1. **DORA four-key trend.** Deployment frequency, lead time, | ||
| change failure rate, MTTR — compare a 4-week window before | ||
| adoption to a 4-week window after. The factory should | ||
| improve at least three of the four. | ||
| 2. **Live-lock smell ratio.** Is the factory's output skewed | ||
| toward process-churn without product motion? Run | ||
| `tools/audit/live-lock-audit.sh 25` on your `main`. EXT | ||
|
AceHack marked this conversation as resolved.
|
||
| ratio < 20% is a smell firing. | ||
|
AceHack marked this conversation as resolved.
|
||
| 3. **Lesson-integration cadence.** Are lessons from failures | ||
| landing in durable memory and actually getting consulted? | ||
| Grep the hygiene-history files for "Lessons integrated" | ||
| sections after each incident. | ||
|
AceHack marked this conversation as resolved.
|
||
|
|
||
| If all three are healthy, adoption is paying off. If any is | ||
| unhealthy, the factory has a bug in your configuration — file | ||
| it as a BACKLOG item, consult the memory substrate, fix it. | ||
|
|
||
| --- | ||
|
|
||
| ## Further reading | ||
|
|
||
| - `README.md` — what Zeta the library is | ||
| - `AGENTS.md` — the universal onboarding handbook for agents | ||
| working in this factory | ||
| - `CLAUDE.md` — Claude Code-specific bootstrap | ||
| - `GOVERNANCE.md` — numbered repo-wide rules | ||
| - `docs/ALIGNMENT.md` — the alignment contract | ||
| - `docs/ARCHITECTURE.md` — how the pieces fit | ||
| - `docs/plans/factory-demo-scope.md` — the concrete factory- | ||
| demo scope + build sequence (if present — currently on a | ||
| feature branch) | ||
|
AceHack marked this conversation as resolved.
|
||
| - `tools/audit/live-lock-audit.sh` — the factory-health audit | ||
| this doc references | ||
| - `samples/FactoryDemo.Api.FSharp/` + | ||
| `samples/FactoryDemo.Api.CSharp/` — the concrete demo | ||
| (the `samples/FactoryDemo.Db/` companion is not yet | ||
| landed in main; it's tracked under the FactoryDemo | ||
| BACKLOG arc and will appear here once it lands) | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.