Merged
14 changes: 13 additions & 1 deletion .claude/rules/backlog-item-start-gate.md
@@ -14,7 +14,19 @@ complete a checklist directly on the row body:
1. **Prior-art-search** across the existing axes:
wake-time-substrate + skill-router + orthogonal-axes +
Otto-364 + PR #1701 + decision-archaeology + lost-files
canonical at `tools/hygiene/LOST-FILES-LOCATIONS.md`.
canonical at `tools/hygiene/LOST-FILES-LOCATIONS.md` + **the
curated external prior-art surface at `references/upstreams/`**
(humans who've solved similar problems; mix of cutting-edge
and tried-and-true; prefer explicit-target subtree searches
like `rg "pattern" references/upstreams/postgres/` for focused
prior-art research; `rg` from repo root is also safe — it
respects `.gitignore` and `references/upstreams/*` is gitignored
— but explicit-target is recommended because it documents
which upstream(s) you actually consulted on the backlog row;
see `.claude/rules/references-upstreams-not-our-code-search-excludes.md`
for the full two-modes table and the plain-grep caveats).
The watchlist + category index lives at `docs/UPSTREAM-LIST.md`;
the synthesis notes at `references/notes/`.
Log surfaces searched, queries used, results found on the row.

2. **Dependency-restructure**:
284 changes: 284 additions & 0 deletions .claude/rules/references-upstreams-not-our-code-search-excludes.md
@@ -0,0 +1,284 @@
# `references/upstreams/` is NOT our code — search/scan operations must exclude it

Carved sentence:

> `references/upstreams/` is OTHER PEOPLE'S CODE that we mirror for
> study (regeneratable mirror state, gitignored, never hand-edited).
> Any search operation that walks the file tree (`find`, `grep -r`,
> `xargs grep`, recursive-file-walk scripts) MUST exclude
> `references/upstreams/` — otherwise scans run for hours, surface
> false-positives from unrelated upstream code, and pollute results
> with patterns that match in protobuf docs, gRPC tests, Redis
> manifests, etc. **Default to ripgrep** (`rg`) which respects
> `.gitignore` automatically. For plain `grep -r`, use
> `--exclude-dir=upstreams` (basename glob, NOT a path) or an
> explicit allowlist (`memory/ docs/ .claude/ tools/`). For
> `find`, use `-not -path './references/upstreams/*'` (the `find`
> command does NOT support `--exclude-dir`).

## Operational content

Per `references/README.md`:

- `references/upstreams/` is "Disposable mirror state — cloned
upstream repositories used as read-only references. **Gitignored;
regeneratable via script; never hand-edited.**"
- `.gitignore` line: `references/upstreams/*` (with carve-outs for
the directory's own `.gitignore` + `README.md`)
- The mirror state regenerates from `references/reference-sources.json`
via the sync script

Git operations naturally skip the tree (gitignored). But plain
`find` / `grep -r` / `xargs grep` / custom file-walk scripts do
NOT respect gitignore: they walk every directory under their
starting point, including the mirror.
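The gap between git-level ignoring and plain file walkers can be reproduced on a throwaway tree. The sketch below is illustrative only: the temp directory, file names, and the `needle` search string are made up for the demo and are not part of the repo.

```bash
set -euo pipefail

# Throwaway tree: one repo-native file, one file inside a gitignored mirror.
demo="$(mktemp -d)"
mkdir -p "$demo/docs" "$demo/references/upstreams/redis"
printf 'needle\n' > "$demo/docs/note.md"
printf 'needle\n' > "$demo/references/upstreams/redis/README.md"
printf 'references/upstreams/*\n' > "$demo/.gitignore"
cd "$demo"

# Neither tool reads .gitignore, so both report the mirrored file as well.
find_hits="$(find . -type f -name '*.md' | sort)"
grep_hits="$(grep -rl 'needle' . | sort)"

printf 'find:\n%s\n' "$find_hits"
printf 'grep -r:\n%s\n' "$grep_hits"
```

Both listings include the file under `references/upstreams/`, even though the `.gitignore` line is present.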

## The failure mode this rule prevents

The authoring agent 2026-05-15T~13:00Z spawned this search:

```bash
find . -type f -name "*.md" 2>/dev/null | xargs grep -l \
"lock.free.*weight.free\|weight.free.*lock.free\|scale.free.*lock.free\|DV2.0\|5 always.active\|root discipline" 2>/dev/null
```

`find` quickly listed all `.md` files, including those in the upstream
tree. `xargs grep` then opened every listed file, reading through
gigabytes of upstream content (protobuf docs, gRPC tests, Redis
manifests, etc.). The process was still running 2+ hours later when
the human maintainer noticed "11 shells running is impressive otto";
the shell count surfaced the runaway grep.

Result: no useful output (the patterns don't appear in upstream code),
hours of CPU and IO wasted, multiple monitor processes lingering,
genuine pollution of the agent's working state.

The substrate-honest fix is to encode the discipline so future agents
(and any other Zeta AI) don't hit the same trap.

## Operational discipline

When searching the repo for content:

### Prefer ripgrep (respects gitignore by default)

```bash
rg "pattern" --type md # markdown only
rg "pattern" docs/ memory/ # explicit allowlist
```

### Plain `grep -r` needs explicit excludes (with caveat)

```bash
# GNU grep --exclude-dir takes a BASENAME glob, not a path —
# so --exclude-dir=upstreams excludes any directory named
# 'upstreams' anywhere in the tree (currently only references/upstreams/).
# If a second 'upstreams/' ever appears that we DO want to search,
# this approach overreaches and we need the explicit-allowlist
# approach below instead.
grep -rn "pattern" \
--exclude-dir=upstreams \
--exclude-dir=node_modules \
--exclude-dir=.git \
--exclude-dir=bin --exclude-dir=obj \
memory/ docs/ .claude/ tools/
```

**Caveat**: GNU `grep`'s `--exclude-dir=GLOB` matches directory
*names* (basename), NOT slash-delimited paths. So
`--exclude-dir=references/upstreams` does NOT work (silently
matches nothing). Use the basename `upstreams` instead, OR use
explicit-allowlist sub-paths (`memory/ docs/ .claude/ tools/`)
which sidestep the issue entirely.
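A minimal sketch of the basename and allowlist forms on a throwaway tree, assuming a `grep` that supports `--exclude-dir` (GNU grep does). The temp directory, file names, and the `needle` string are invented for the demo. The path form is printed only for comparison, since its behavior depends on the grep implementation.

```bash
set -euo pipefail

# Throwaway tree with one repo-native file and one mirrored file.
demo="$(mktemp -d)"
mkdir -p "$demo/docs" "$demo/references/upstreams/grpc"
printf 'needle\n' > "$demo/docs/note.md"
printf 'needle\n' > "$demo/references/upstreams/grpc/test.md"
cd "$demo"

# Basename form: skips every directory named 'upstreams'.
basename_hits="$(grep -rl --exclude-dir=upstreams 'needle' . | sort)"

# Allowlist form: never starts from '.', so the mirror is never reached.
allowlist_hits="$(grep -rl 'needle' docs/ | sort)"

# Path form, for comparison with the caveat above (result varies by grep).
path_hits="$(grep -rl --exclude-dir=references/upstreams 'needle' . | sort)"

printf 'basename form:\n%s\n' "$basename_hits"
printf 'allowlist form:\n%s\n' "$allowlist_hits"
printf 'path form:\n%s\n' "$path_hits"
```

If the path form still lists the mirrored file on your machine, that is the silent non-match the caveat describes.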

**Better**: just use `rg` — it respects `.gitignore` by default
and `references/upstreams/*` is already gitignored.

### `find | xargs grep` is the worst trap

The `find` step is fast because it only lists paths; the `xargs grep`
step then opens every file on that list, upstream mirror included.
Even with `-l` (list-only), grep opens each file and scans until the
first match, and on gigabytes of upstream content that takes hours.

Fix: filter the `find` output BEFORE passing to xargs:

```bash
find . -type f -name "*.md" \
  -not -path "./references/upstreams/*" \
  -not -path "./node_modules/*" \
  -not -path "./.git/*" \
  2>/dev/null | xargs grep -l "pattern" 2>/dev/null
```

Review comment on lines +106 to +108 (P2): Prune excluded tree before find recursion

This `find` recipe filters paths after traversal instead of preventing descent, so `find` still walks `references/upstreams/**` and stats every file before `xargs` runs. On large mirrored upstream trees, that preserves the expensive IO this rule is trying to avoid; `find -D search` shows excluded subtrees are still visited with `-not -path`. Use a `-prune` branch to stop recursion into `references/upstreams` entirely.
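The `-prune` approach suggested in the review comment on this recipe could look like the sketch below. It is not the rule's own recipe, only one way to stop `find` from descending into the excluded directories. The throwaway tree and file names are invented for the demo.

```bash
set -euo pipefail

# Throwaway tree: repo-native docs plus two directories that should be skipped.
demo="$(mktemp -d)"
mkdir -p "$demo/docs" "$demo/references/upstreams/protobuf" "$demo/node_modules/pkg"
printf 'pattern\n' > "$demo/docs/note.md"
printf 'pattern\n' > "$demo/references/upstreams/protobuf/README.md"
printf 'pattern\n' > "$demo/node_modules/pkg/README.md"
cd "$demo"

# -prune stops descent at the excluded directories, so their contents are
# never visited. -print0 with xargs -0 keeps unusual file names intact.
hits="$(
  find . \
    \( -path ./references/upstreams -o -path ./node_modules -o -path ./.git \) -prune \
    -o -type f -name '*.md' -print0 \
  | xargs -0 grep -l 'pattern' \
  | sort
)"

printf '%s\n' "$hits"
```

Only the file under `docs/` is handed to `grep`; the mirror and `node_modules/` are skipped at the directory level rather than filtered out path by path.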

### Explicit allowlists beat exclude lists

When you know which directories to search, name them. Don't start
from `.` and try to exclude — you'll always miss something.

**Repo-native content lives in**:

- `memory/` — agent memory + persona folders + conversations
- `docs/` — governance, research, hygiene-history, history
- `.claude/` — rules, skills, agents, commands, hooks, settings
- `tools/` — TS scripts (per Rule 0; no `.sh` except install-graph)
- Top-level files (CLAUDE.md, README, GOVERNANCE.md, AGENTS.md,
ALIGNMENT.md, MANIFESTO.md target locations)

**Non-repo content** (skip in searches by default):

- `references/upstreams/` — gitignored mirror state of OTHER repos
- `references/<legacy>/` — legacy imports (e.g., `tla-book/`)
with their own file layouts; check `references/README.md` for
current state
- `node_modules/` — npm dependencies
- `bin/`, `obj/` — .NET build outputs
- `target/` — Rust build outputs
- `.git/` — git internals

## When `references/upstreams/` IS the right search target

Not rare — actually a **first-class workflow** during backlog
research. Per the human maintainer 2026-05-15: *"when doing
backlog items this is a good place to know about humans whoved
solved similar issues i've been gathering their githubs so we
can learn when doing our backlog itmes. some of these are very
cutting edge and some are tried and true been around for years."*

`references/upstreams/` is the curated **prior-art surface** —
humans who've solved similar problems, mirrored as read-only
references. When starting a backlog item, consulting the relevant
upstream(s) is encouraged and composes with
`.claude/rules/backlog-item-start-gate.md` (prior-art-search step).

**The two modes are not in tension:**

| Mode | Pattern | Treatment |
|---|---|---|
| **Backlog prior-art research** (explicit-target) | `rg "pattern" references/upstreams/postgres/` | Encouraged; one of the curated prior-art surfaces; log queries on the backlog row |
| **Unconstrained repo scan with plain `grep -r`** or `find . \| xargs grep` | `grep -rn "pattern" .` | MUST exclude the mirror (`--exclude-dir=upstreams` for `grep -r`; `-not -path './references/upstreams/*'` for `find`); otherwise runaway-scan failure mode |
| **Unconstrained repo scan with ripgrep** | `rg "pattern" .` | Safe-by-default: ripgrep respects `.gitignore`, and `references/upstreams/*` is already gitignored |

Review comment on the explicit-target row above (P1): Make upstream rg examples bypass ignore rules

The explicit-target guidance `rg "pattern" references/upstreams/postgres/` will usually return false negatives because this repo ignores mirrored upstream content (`.gitignore` plus `references/upstreams/.gitignore`), and ripgrep applies ignore rules to directory searches by default. In a local repro with an ignored directory, `rg needle ignored/` produced no matches until `-uu`/`--no-ignore` was added, so users following this rule can believe an upstream lacks a pattern when it is actually present.
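Whether an explicit-target `rg` search actually sees the mirrored files depends on the ignore rules in effect, which is the concern the P1 review comment raises. One hedged way to check before trusting an empty result is to compare what `rg --files` lists with and without ignore rules. The helper name `mirror_visibility` is hypothetical, and `references/upstreams/postgres` is just the example path used in the table.

```bash
set -euo pipefail

# mirror_visibility is a hypothetical helper name.
# Prints one of: rg-missing, target-missing, hidden-by-ignore-rules, visible.
mirror_visibility() {
  local target="$1" default_count all_count
  if ! command -v rg >/dev/null 2>&1; then
    echo "rg-missing"; return 0
  fi
  if [ ! -d "$target" ]; then
    echo "target-missing"; return 0
  fi
  # rg --files lists the files rg would search; --no-ignore disables
  # ignore-file rules such as .gitignore.
  default_count="$( { rg --files "$target" 2>/dev/null || true; } | wc -l | tr -d ' ')"
  all_count="$( { rg --files --no-ignore "$target" 2>/dev/null || true; } | wc -l | tr -d ' ')"
  if [ "$default_count" -lt "$all_count" ]; then
    echo "hidden-by-ignore-rules"
  else
    echo "visible"
  fi
}

mirror_visibility "${1:-references/upstreams/postgres}"
```

If this prints `hidden-by-ignore-rules`, rerunning the search as `rg --no-ignore "pattern" references/upstreams/postgres/` should cover the files the default invocation skipped.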

Other legitimate explicit-target reasons:

- Verifying that an upstream actually contains a feature we
attribute to it (e.g., "does Spanner actually do X?")
- Auditing for license-text or attribution requirements when
taking an upstream excerpt into `references/notes/`

**Discovery surfaces for upstream prior-art:**

- `docs/UPSTREAM-LIST.md` — curated watchlist + category index
- `references/notes/` — synthesis notes ("what matters from each
upstream"); start here before grepping the mirror
- `references/reference-sources.json` — full source list

**Refresh the mirror on demand:**

```bash
tools/setup/common/sync-upstreams.sh # refresh all
tools/setup/common/sync-upstreams.sh --name foo,bar # subset
tools/setup/common/sync-upstreams.sh --prune # drop stale
```

The script reads `references/reference-sources.json`, shallow-clones
or fast-fetches each upstream into `references/upstreams/<name>/`,
and resets-hard to match `origin/<branch>` byte-for-byte. Safe to
re-run; `ls-remote` short-circuits when local HEAD already matches.

In all cases, **EXPLICITLY target the subtree** when grepping
the mirror:

```bash
rg "pattern" references/upstreams/spanner/
```

— never start from `.` and let the recursion drift.

## Why this rule auto-loads

Per `.claude/rules/wake-time-substrate.md`: load-bearing search-
hygiene knowledge needs wake-time landing. Without this rule:

- A future agent cold-booting may spawn the same kind of runaway
search (the failure mode is operationally tempting because
`find | xargs grep` is the obvious one-liner for the search
problem)
- Other Zeta agents searching the repo for the first time inherit
the gap by default
- The 2-hour-grep evidence (this rule's authoring trigger) shows
the failure mode is real and recurring

## Composes with other rules

- `.claude/rules/rule-0-no-sh-files.md` — TS-first discipline;
search wrappers should be `.ts` not `.sh` (compose: when
authoring a search tool, use ripgrep via Bun child_process and
encode the exclusion list there)
- `.claude/rules/wake-time-substrate.md` — load-bearing search-
hygiene needs auto-loaded landing
- `.claude/rules/refresh-world-model-poll-pr-gate.md` — prefer
scripted commands (like ripgrep wrappers) over ad-hoc bash;
same shape as `gh|jq` chains
- `.claude/rules/holding-without-named-dependency-is-standing-by-failure.md`
— runaway searches LOOK like work-in-progress but produce no
output; same shape as Standing-by failure mode (operational
noise without operational signal)

## Composes with substrate

- `references/README.md` — canonical definition of what
`references/upstreams/` is
- `references/reference-sources.json` — the upstream watchlist
the mirror tree regenerates from
- `.gitignore` — line `references/upstreams/*` is the existing
git-level enforcement
- The substrate-honest failure-mode anchor: the 2-hour-grep
evidence on 2026-05-15 that authored this rule

## Substrate-honest framing

This rule does NOT prevent the failure mode at the tool level.
Plain `grep -r` and `find` will still walk `references/upstreams/`
if invoked without exclusions. The rule encodes the DISCIPLINE
that the agent applies; mechanizing it further would require:

- A shell function or alias that wraps grep with the standard
exclusions
- A `.ripgreprc` file at repo root (ripgrep only reads a config
file when `RIPGREP_CONFIG_PATH` points at it, and it already
respects gitignore, so this is mostly redundant)
- A pre-Bash hook that catches the `find | xargs grep` pattern
and warns or rewrites it

Those are future-substrate options. This rule is the discipline-
level landing that catches the failure mode at cold-boot until
mechanization lands.
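The first mechanization option, a shell function that wraps `grep` with the standard exclusions, might look like the sketch below. The name `repo_grep` is hypothetical, the exclusion list is copied from the `grep -rn` recipe in this rule, and the throwaway tree exists only to show the effect. A durable version would presumably be a `.ts` wrapper to stay within Rule 0.

```bash
set -euo pipefail

# repo_grep is a hypothetical name; the exclusions mirror this rule's list.
repo_grep() {
  grep -rn \
    --exclude-dir=upstreams \
    --exclude-dir=node_modules \
    --exclude-dir=.git \
    --exclude-dir=bin --exclude-dir=obj \
    "$@"
}

# Throwaway tree to show the wrapper skipping the mirror.
demo="$(mktemp -d)"
mkdir -p "$demo/docs" "$demo/references/upstreams/redis"
printf 'needle\n' > "$demo/docs/note.md"
printf 'needle\n' > "$demo/references/upstreams/redis/README.md"
cd "$demo"

wrapped_hits="$(repo_grep 'needle' .)"
printf '%s\n' "$wrapped_hits"
```

In the repo itself the call would be `repo_grep "pattern" memory/ docs/ .claude/ tools/`, which combines the exclusions with the explicit allowlist.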

## Full reasoning

The human maintainer 2026-05-15T~15:25Z, after observing the
runaway grep process: *"references/upstreams/ in code we ignore
this folder everywhere casue its not our code but other githubs
we reference for ideas"*

The 2-hour-grep evidence trail:

1. An agent authored a manifesto-search bash one-liner using
`find | xargs grep -l` with no exclusion of `references/upstreams/`
2. find quickly returned all matching paths (including the upstream
tree); xargs grep then opened every one of them
3. The grep was still running 2 hours later when the human
maintainer's "11 shells running" observation surfaced it
4. The cause: plain `find` and `grep` don't respect gitignore;
ripgrep does
5. Cleanup: explicit `pgrep | xargs kill` of the runaway grep
processes; 11 shells → 0
6. Substrate-landing: this rule encodes the discipline so the
failure mode doesn't recur

The substrate-honest meta-note: the failure mode was rooted in
agent-side ignorance of the `references/upstreams/` convention.
The maintainer caught it via shell-count observation, taught the
discipline, and the rule now lands so future agents don't repeat
the trap. That's the bandwidth-engineering pattern operating at
discipline-level: one piece of teaching → durable substrate that
serves future cold-boots without re-teaching.
11 changes: 11 additions & 0 deletions CLAUDE.md
@@ -45,5 +45,16 @@ See [`docs/CONFLICT-RESOLUTION.md`](docs/CONFLICT-RESOLUTION.md). On deadlock, t
- **Result-over-exception** — errors surface as `Result<_, DbspError>`; no exceptions on hot paths.
- **Memory fast-path** — read `~/.claude/projects/<slug>/memory/CURRENT-*.md` before raw
`feedback_*.md` logs; CURRENT files win on conflict with older raw memories.
- **`references/upstreams/` — explicit-target searches ONLY (curated prior-art surface, NOT our code).**
Mirror state of OTHER repos (protobuf, gRPC, Redis, etc.); gitignored; gigabytes; the only
folder where a naive plain `grep -r` or `find | xargs grep` from `.` becomes a 2-hour runaway.
BUT also the curated prior-art surface for backlog-item research — humans who've solved similar
problems, cutting-edge + tried-and-true. Two modes: **explicit-target encouraged**
(`rg "pattern" references/upstreams/postgres/` during backlog research; check
`docs/UPSTREAM-LIST.md` + `references/notes/` first); **unconstrained scan needs the right tool**
— `rg "pattern" .` is safe-by-default (ripgrep respects gitignore), but plain `grep -r` needs
`--exclude-dir=upstreams` (basename, NOT a path) or an explicit allowlist
(`memory/ docs/ .claude/ tools/`). Refresh the mirror on demand: `tools/setup/common/sync-upstreams.sh`.
Full: `.claude/rules/references-upstreams-not-our-code-search-excludes.md`.
- **Thoughts free, actions razored** — journal to `memory/` freely; CLAUDE.md additions
are razored (cooling-period required, disposition-shaping bar). Full: `memory/feedback_thoughts_free_actions_razored_*`.