Lucent-Financial-Group · AceHack · May 15, 2026 · May 15, 2026 · May 15, 2026 · May 15, 2026
diff --git a/.claude/rules/backlog-item-start-gate.md b/.claude/rules/backlog-item-start-gate.md
@@ -14,7 +14,19 @@ complete a checklist directly on the row body:
 1. **Prior-art-search** across the existing axes:
    wake-time-substrate + skill-router + orthogonal-axes +
    Otto-364 + PR #1701 + decision-archaeology + lost-files
-   canonical at `tools/hygiene/LOST-FILES-LOCATIONS.md`.
+   canonical at `tools/hygiene/LOST-FILES-LOCATIONS.md` + **the
+   curated external prior-art surface at `references/upstreams/`**
+   (humans who've solved similar problems; mix of cutting-edge
+   and tried-and-true; prefer explicit-target subtree searches
+   like `rg "pattern" references/upstreams/postgres/` for focused
+   prior-art research; `rg` from repo root is also safe — it
+   respects `.gitignore` and `references/upstreams/*` is gitignored
+   — but explicit-target is recommended because it documents
+   which upstream(s) you actually consulted on the backlog row;
+   see `.claude/rules/references-upstreams-not-our-code-search-excludes.md`
+   for the full two-modes table and the plain-grep caveats).
+   The watchlist + category index lives at `docs/UPSTREAM-LIST.md`;
+   the synthesis notes at `references/notes/`.
    Log surfaces searched, queries used, results found on the row.
 
 2. **Dependency-restructure**:

diff --git a/.claude/rules/references-upstreams-not-our-code-search-excludes.md b/.claude/rules/references-upstreams-not-our-code-search-excludes.md
@@ -0,0 +1,284 @@
+# `references/upstreams/` is NOT our code — search/scan operations must exclude it
+
+Carved sentence:
+
+> `references/upstreams/` is OTHER PEOPLE'S CODE that we mirror for
+> study (regeneratable mirror state, gitignored, never hand-edited).
+> Any search operation that walks the file tree (`find`, `grep -r`,
+> `xargs grep`, recursive-file-walk scripts) MUST exclude
+> `references/upstreams/` — otherwise scans run for hours, surface
+> false-positives from unrelated upstream code, and pollute results
+> with patterns that match in protobuf docs, gRPC tests, Redis
+> manifests, etc. **Default to ripgrep** (`rg`) which respects
+> `.gitignore` automatically. For plain `grep -r`, use
+> `--exclude-dir=upstreams` (basename glob, NOT a path) or an
+> explicit allowlist (`memory/ docs/ .claude/ tools/`). For
+> `find`, use `-not -path './references/upstreams/*'` (the `find`
+> command does NOT support `--exclude-dir`).
+
+## Operational content
+
+Per `references/README.md`:
+
+- `references/upstreams/` is "Disposable mirror state — cloned
+  upstream repositories used as read-only references. **Gitignored;
+  regeneratable via script; never hand-edited.**"
+- `.gitignore` line: `references/upstreams/*` (with carve-outs for
+  the directory's own `.gitignore` + `README.md`)
+- The mirror state regenerates from `references/reference-sources.json`
+  via the sync script
+
+Git operations naturally skip the tree (gitignored). But plain
+`find` / `grep -r` / `xargs grep` / custom file-walk scripts do
+NOT respect gitignore — they walk the entire filesystem.
+
+## The failure mode this rule prevents
+
+The authoring agent 2026-05-15T~13:00Z spawned this search:
+
+```bash
+find . -type f -name "*.md" 2>/dev/null | xargs grep -l \
+  "lock.free.*weight.free\|weight.free.*lock.free\|scale.free.*lock.free\|DV2.0\|5 always.active\|root discipline" 2>/dev/null
+```
+
+`find` quickly listed all `.md` files including the upstream tree.
+`xargs grep` then recursed through gigabytes of upstream content
+(protobuf docs, gRPC tests, Redis manifests, etc.). The process was
+still running 2+ hours later when the human maintainer noticed "11
+shells running is impressive otto" — the shell count surfaced the
+runaway grep.
+
+Result: no useful output (the patterns don't appear in upstream code),
+hours of CPU and IO wasted, multiple monitor processes lingering,
+genuine pollution of the agent's working state.
+
+The substrate-honest fix is to encode the discipline so future agents
+(and any other Zeta AI) doesn't hit the same trap.
+
+## Operational discipline
+
+When searching the repo for content:
+
+### Prefer ripgrep (respects gitignore by default)
+
+```bash
+rg "pattern" --type md      # markdown only
+rg "pattern" docs/ memory/  # explicit allowlist
+```
+
+### Plain `grep -r` needs explicit excludes (with caveat)
+
+```bash
+# GNU grep --exclude-dir takes a BASENAME glob, not a path —
+# so --exclude-dir=upstreams excludes any directory named
+# 'upstreams' anywhere in the tree (currently only references/upstreams/).
+# If a second 'upstreams/' ever appears that we DO want to search,
+# this approach overreaches and we need the explicit-allowlist
+# approach below instead.
+grep -rn "pattern" \
+  --exclude-dir=upstreams \
+  --exclude-dir=node_modules \
+  --exclude-dir=.git \
+  --exclude-dir=bin --exclude-dir=obj \
+  memory/ docs/ .claude/ tools/
+```
+
+**Caveat**: GNU `grep`'s `--exclude-dir=GLOB` matches directory
+*names* (basename), NOT slash-delimited paths. So
+`--exclude-dir=references/upstreams` does NOT work (silently
+matches nothing). Use the basename `upstreams` instead, OR use
+explicit-allowlist sub-paths (`memory/ docs/ .claude/ tools/`)
+which sidestep the issue entirely.
+
+**Better**: just use `rg` — it respects `.gitignore` by default
+and `references/upstreams/*` is already gitignored.
+
+### `find | xargs grep` is the worst trap
+
+The `find` step is fast and listful; the `xargs grep` step inherits
+the listful tree and recurses. Even with `-l` (list-only), grep
+opens each file and scans until first match — and on a tree of
+gigabytes of upstream content, that's hours.
+
+Fix: filter the `find` output BEFORE passing to xargs:
+
+```bash
+find . -type f -name "*.md" \
+  -not -path "./references/upstreams/*" \
+  -not -path "./node_modules/*" \
+  -not -path "./.git/*" \
+  2>/dev/null | xargs grep -l "pattern" 2>/dev/null
+```
+
+### Explicit allowlists beat exclude lists
+
+When you know which directories to search, name them. Don't start
+from `.` and try to exclude — you'll always miss something.
+
+**Repo-native content lives in**:
+
+- `memory/` — agent memory + persona folders + conversations
+- `docs/` — governance, research, hygiene-history, history
+- `.claude/` — rules, skills, agents, commands, hooks, settings
+- `tools/` — TS scripts (per Rule 0; no `.sh` except install-graph)
+- Top-level files (CLAUDE.md, README, GOVERNANCE.md, AGENTS.md,
+  ALIGNMENT.md, MANIFESTO.md target locations)
+
+**Non-repo content** (skip in searches by default):
+
+- `references/upstreams/` — gitignored mirror state of OTHER repos
+- `references/<legacy>/` — legacy imports (e.g., `tla-book/`)
+  with their own file layouts; check `references/README.md` for
+  current state
+- `node_modules/` — npm dependencies
+- `bin/`, `obj/` — .NET build outputs
+- `target/` — Rust build outputs
+- `.git/` — git internals
+
+## When `references/upstreams/` IS the right search target
+
+Not rare — actually a **first-class workflow** during backlog
+research. Per the human maintainer 2026-05-15: *"when doing
+backlog items this is a good place to know about humans whoved
+solved similar issues i've been gathering their githubs so we
+can learn when doing our backlog itmes. some of these are very
+cutting edge and some are tried and true been around for years."*
+
+`references/upstreams/` is the curated **prior-art surface** —
+humans who've solved similar problems, mirrored as read-only
+references. When starting a backlog item, consulting the relevant
+upstream(s) is encouraged and composes with
+`.claude/rules/backlog-item-start-gate.md` (prior-art-search step).
+
+**The two modes are not in tension:**
+
+| Mode | Pattern | Treatment |
+|---|---|---|
+| **Backlog prior-art research** (explicit-target) | `rg "pattern" references/upstreams/postgres/` | Encouraged; one of the curated prior-art surfaces; log queries on the backlog row |
+| **Unconstrained repo scan with plain `grep -r`** or `find . \| xargs grep` | (`grep -rn "pattern" .`) | MUST exclude `--exclude-dir=upstreams`; otherwise runaway-scan failure mode |
+| **Unconstrained repo scan with ripgrep** | `rg "pattern" .` | Safe-by-default — ripgrep respects `.gitignore`, and `references/upstreams/*` is already gitignored |
+
+Other legitimate explicit-target reasons:
+
+- Verifying that an upstream actually contains a feature we
+  attribute to it (e.g., "does Spanner actually do X?")
+- Auditing for license-text or attribution requirements when
+  taking an upstream excerpt into `references/notes/`
+
+**Discovery surfaces for upstream prior-art:**
+
+- `docs/UPSTREAM-LIST.md` — curated watchlist + category index
+- `references/notes/` — synthesis notes ("what matters from each
+  upstream"); start here before grepping the mirror
+- `references/reference-sources.json` — full source list
+
+**Refresh the mirror on demand:**
+
+```bash
+tools/setup/common/sync-upstreams.sh             # refresh all
+tools/setup/common/sync-upstreams.sh --name foo,bar  # subset
+tools/setup/common/sync-upstreams.sh --prune     # drop stale
+```
+
+The script reads `references/reference-sources.json`, shallow-clones
+or fast-fetches each upstream into `references/upstreams/<name>/`,
+and resets-hard to match `origin/<branch>` byte-for-byte. Safe to
+re-run; `ls-remote` short-circuits when local HEAD already matches.
+
+In all cases, **EXPLICITLY target the subtree** when grepping
+the mirror:
+
+```bash
+rg "pattern" references/upstreams/spanner/
+```
+
+— never start from `.` and let the recursion drift.
+
+## Why this rule auto-loads
+
+Per `.claude/rules/wake-time-substrate.md`: load-bearing search-
+hygiene knowledge needs wake-time landing. Without this rule:
+
+- A future agent cold-booting may spawn the same kind of runaway
+  search (the failure mode is operationally tempting because
+  `find | xargs grep` is the obvious one-liner for the search
+  problem)
+- Other Zeta agents searching the repo for the first time inherit
+  the gap by default
+- The 2-hour-grep evidence (this rule's authoring trigger) shows
+  the failure mode is real and recurring
+
+## Composes with other rules
+
+- `.claude/rules/rule-0-no-sh-files.md` — TS-first discipline;
+  search wrappers should be `.ts` not `.sh` (compose: when
+  authoring a search tool, use ripgrep via Bun child_process and
+  encode the exclusion list there)
+- `.claude/rules/wake-time-substrate.md` — load-bearing search-
+  hygiene needs auto-loaded landing
+- `.claude/rules/refresh-world-model-poll-pr-gate.md` — prefer
+  scripted commands (like ripgrep wrappers) over ad-hoc bash;
+  same shape as `gh|jq` chains
+- `.claude/rules/holding-without-named-dependency-is-standing-by-failure.md`
+  — runaway searches LOOK like work-in-progress but produce no
+  output; same shape as Standing-by failure mode (operational
+  noise without operational signal)
+
+## Composes with substrate
+
+- `references/README.md` — canonical definition of what
+  `references/upstreams/` is
+- `references/reference-sources.json` — the upstream watchlist
+  the mirror tree regenerates from
+- `.gitignore` — line `references/upstreams/*` is the existing
+  git-level enforcement
+- The substrate-honest failure-mode anchor: the 2-hour-grep
+  evidence on 2026-05-15 that authored this rule
+
+## Substrate-honest framing
+
+This rule does NOT prevent the failure mode at the tool level.
+Plain `grep -r` and `find` will still walk `references/upstreams/`
+if invoked without exclusions. The rule encodes the DISCIPLINE
+that the agent applies; mechanizing it further would require:
+
+- A shell function or alias that wraps grep with the standard
+  exclusions
+- A `.ripgreprc` file at repo root (ripgrep already respects
+  gitignore so this is mostly redundant)
+- A pre-Bash hook that catches the `find | xargs grep` pattern
+  and warns or rewrites it
+
+Those are future-substrate options. This rule is the discipline-
+level landing that catches the failure mode at cold-boot until
+mechanization lands.
+
+## Full reasoning
+
+The human maintainer 2026-05-15T~15:25Z, after observing the
+runaway grep process: *"references/upstreams/ in code we ignore
+this folder everywhere casue its not our code but other githubs
+we reference for ideas"*
+
+The 2-hour-grep evidence trail:
+
+1. An agent authored a manifesto-search bash one-liner using
+   `find | xargs grep -l` with no exclusion of `references/upstreams/`
+2. find quickly returned all matching paths (including the upstream
+   tree); xargs grep recursed
+3. The grep was still running 2 hours later when the human
+   maintainer's "11 shells running" observation surfaced it
+4. The cause: plain `grep` doesn't respect gitignore; only ripgrep
+   does
+5. Cleanup: explicit `pgrep | xargs kill` of the runaway grep
+   processes; 11 shells → 0
+6. Substrate-landing: this rule encodes the discipline so the
+   failure mode doesn't recur
+
+The substrate-honest meta-note: the failure mode was rooted in
+agent-side ignorance of the `references/upstreams/` convention.
+The maintainer caught it via shell-count observation, taught the
+discipline, and the rule now lands so future agents don't repeat
+the trap. That's the bandwidth-engineering pattern operating at
+discipline-level: one piece of teaching → durable substrate that
+serves future cold-boots without re-teaching.
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -45,5 +45,16 @@ See [`docs/CONFLICT-RESOLUTION.md`](docs/CONFLICT-RESOLUTION.md). On deadlock, t
 - **Result-over-exception** — errors surface as `Result<_, DbspError>`; no exceptions on hot paths.
 - **Memory fast-path** — read `~/.claude/projects/<slug>/memory/CURRENT-*.md` before raw
   `feedback_*.md` logs; CURRENT files win on conflict with older raw memories.
+- **`references/upstreams/` — explicit-target searches ONLY (curated prior-art surface, NOT our code).**
+  Mirror state of OTHER repos (protobuf, gRPC, Redis, etc.); gitignored; gigabytes; the only
+  folder where a naive plain `grep -r` or `find | xargs grep` from `.` becomes a 2-hour runaway.
+  BUT also the curated prior-art surface for backlog-item research — humans who've solved similar
+  problems, cutting-edge + tried-and-true. Two modes: **explicit-target encouraged**
+  (`rg "pattern" references/upstreams/postgres/` during backlog research; check
+  `docs/UPSTREAM-LIST.md` + `references/notes/` first); **unconstrained scan needs the right tool**
+  — `rg "pattern" .` is safe-by-default (ripgrep respects gitignore), but plain `grep -r` needs
+  `--exclude-dir=upstreams` (basename, NOT a path) or an explicit allowlist
+  (`memory/ docs/ .claude/ tools/`). Refresh the mirror on demand: `tools/setup/common/sync-upstreams.sh`.
+  Full: `.claude/rules/references-upstreams-not-our-code-search-excludes.md`.
 - **Thoughts free, actions razored** — journal to `memory/` freely; CLAUDE.md additions
   are razored (cooling-period required, disposition-shaping bar). Full: `memory/feedback_thoughts_free_actions_razored_*`.