feat(rules): references/upstreams/ is NOT our code — search/scan operations must exclude it #3510
…ations must exclude it

Per the human maintainer 2026-05-15T~15:25Z, after observing the runaway 2-hour grep process I spawned during the manifesto search:

'references/upstreams/ in code we ignore this folder everywhere casue its not our code but other githubs we reference for ideas'

This rule encodes the discipline so future-Otto (and other Zeta AIs) don't hit the same trap.

The failure mode:
- Plain 'grep -r' / 'find | xargs grep' / 'find -name' don't respect gitignore
- references/upstreams/* IS gitignored (regenerable mirror state of OTHER repos — protobuf, gRPC, Redis, etc.)
- A 'find . -type f -name *.md | xargs grep -l pattern' one-liner walked the upstream tree (gigabytes) and ran for 2+ hours
- Surfaced via shell-count observation: 'Aaron: 11 shells running is impressive otto'

The discipline:
1. Prefer ripgrep (respects gitignore by default)
2. Plain grep needs explicit --exclude-dir=references/upstreams
3. find | xargs grep needs -not -path './references/upstreams/*' filter BEFORE xargs
4. Explicit allowlists beat exclude lists — name memory/, docs/, .claude/, tools/ instead of starting from .
5. When references/upstreams/ IS the right target, EXPLICITLY target the subtree; never let recursion drift there

The substrate-honest meta-note: failure was agent-side ignorance of the references/upstreams convention. Maintainer caught it via shell-count, taught the discipline; this rule lands so future agents don't repeat the trap. Bandwidth-engineering at discipline-level: one piece of teaching → durable substrate.
Composes with:
- rule-0-no-sh-files.md (TS-first; search wrappers should be .ts using ripgrep via Bun child_process)
- wake-time-substrate.md (load-bearing search-hygiene needs auto-loaded landing)
- refresh-world-model-poll-pr-gate.md (prefer scripted commands over ad-hoc bash)
- holding-without-named-dependency-is-standing-by-failure.md (runaway searches LOOK like work but produce no output)
- references/README.md (canonical definition of upstreams/ vs notes/ vs legacy imports)
- .gitignore (existing git-level enforcement of references/upstreams/*)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
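The following is a minimal sketch of the allowlist approach (item 4 of the discipline above), assuming a POSIX shell and grep. The hypothetical `search_allowlist` helper recurses only into the named first-party directories. Recursion therefore never starts at `.` and cannot drift into `references/upstreams/`. The directory list comes from the commit message, and directories missing from a given checkout are skipped.

```sh
# Hypothetical helper: search only an explicit allowlist of first-party
# directories instead of recursing from the repository root.
# Usage: search_allowlist ROOT PATTERN
search_allowlist() {
  root=$1
  pattern=$2
  set --
  # Collect only the allowlisted directories that actually exist under ROOT.
  for dir in memory docs .claude tools; do
    if [ -d "$root/$dir" ]; then
      set -- "$@" "$root/$dir"
    fi
  done
  # Nothing to search: return quietly rather than falling back to ROOT.
  [ "$#" -eq 0 ] && return 0
  # -r: recurse into each allowlisted directory; -l: print matching file names only.
  grep -rl -- "$pattern" "$@"
}
```

An allowlist fails closed. A newly added mirror or vendored tree is never searched until someone names it, whereas an exclude list fails open for anything nobody thought to exclude.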
…UDE.md-worthy per Aaron)

Per Aaron 2026-05-15 immediately after the rule PR opened:

'yeah having references/upstreams/ is always ignore in search is claude.md worthy probalby cause it's a gun pointed at your foot. this is the only folder like this, where the grep costs is HUGE.'

The rule itself lives at .claude/rules/references-upstreams-not-our-code-search-excludes.md (this branch). CLAUDE.md gets the one-line bullet because the failure mode is genuinely unique (the ONLY folder where naive grep becomes a 2-hour runaway) and CLAUDE.md hits cold-boot context immediately, while rule files auto-load as separate context items.

CLAUDE.md additions are razored per the existing 'Thoughts free, actions razored' bullet (cooling-period + disposition-shaping bar). Aaron's explicit 'claude.md worthy probalby cause it's a gun pointed at your foot' framing IS the disposition-shaping bar being met here.

Scope-note preserved from Aaron's correction: 'this is the only folder like this' — bullet narrowly scoped to references/upstreams/ specifically, not 'gitignored folders in general'.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 285d663325
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
Adds a new auto-loaded rule encoding the discipline that references/upstreams/ (gitignored, regenerable mirror state of other repos) must be excluded from filesystem-walking search/scan operations. Triggered by an observed 2-hour runaway find | xargs grep that walked the upstream tree. Also adds a short bullet in CLAUDE.md linking to the new rule.
Changes:
- New rule file `.claude/rules/references-upstreams-not-our-code-search-excludes.md` with carved sentence, failure-mode evidence, operational discipline (ripgrep / `--exclude-dir` / `find -not -path` patterns), allowlist guidance, and composes-with cross-refs.
- New bullet in `CLAUDE.md` Conventions section pointing to the rule.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| CLAUDE.md | Adds one-bullet summary of the new search-hygiene rule with pointer to the full rule file. |
| .claude/rules/references-upstreams-not-our-code-search-excludes.md | New auto-loaded rule documenting the discipline and the 2-hour-grep failure-mode evidence. |
… surface (not just a folder-to-skip) + refresh script

Per Aaron 2026-05-15, two clarifications to the references/upstreams/ search-hygiene rule:

1. **It's the curated PRIOR-ART surface**: 'when doing backlog items this is a good place to know about humans whoved solved similar issues i've been gathering their githubs so we can learn when doing our backlog itmes. some of these are very cutting edge and some are tried and true been around for years.'
2. **Refresh script exists**: 'we have a refresh .sh or .ts i forget to refresh the github cloned repos on demand' → tools/setup/common/sync-upstreams.sh

The two modes are NOT in tension:
- **Backlog prior-art research** (explicit-target): encouraged; rg 'pattern' references/upstreams/postgres/; composes with backlog-item-start-gate.md
- **Unconstrained repo scan**: MUST exclude references/upstreams/; otherwise runaway-scan failure mode

Changes:
- .claude/rules/references-upstreams-not-our-code-search-excludes.md: Reframed 'When references/upstreams/ IS the right search target' section from 'Rare but real' to 'Not rare — first-class workflow during backlog research'. Added two-modes table. Added discovery surfaces (docs/UPSTREAM-LIST.md, references/notes/, references/reference-sources.json). Added refresh-script section with tools/setup/common/sync-upstreams.sh usage.
- CLAUDE.md bullet: Reframed to balance both modes: 'explicit-target encouraged' + 'unconstrained scan must exclude' + 'Refresh the mirror on demand'.
- .claude/rules/backlog-item-start-gate.md: Added references/upstreams/ to the prior-art-search axis list with explicit-target-only note + cross-reference to the search-hygiene rule.
Composes with:
- references/README.md (canonical definition)
- docs/UPSTREAM-LIST.md (curated watchlist + category index)
- references/notes/ (synthesis notes — start here before grepping)
- references/reference-sources.json (full source list)
- tools/setup/common/sync-upstreams.sh (refresh script)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
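The explicit-target mode might be sketched like this, assuming each mirror lives at `references/upstreams/<name>/` as in the `postgres` example. The helper names exactly one upstream and refuses to run when that mirror is missing, so a typo cannot silently widen the search. `search_upstream` is hypothetical and not part of the repo's tooling. It uses plain `grep -r`, which does not read `.gitignore`.

```sh
# Hypothetical helper: search exactly one mirrored upstream, never the whole tree.
# Usage: search_upstream ROOT NAME PATTERN
search_upstream() {
  root=$1
  name=$2
  pattern=$3
  # Reject empty names and path tricks so the target stays a single mirror.
  case $name in
    ''|*/*|.|..)
      echo "search_upstream: invalid upstream name '$name'" >&2
      return 2 ;;
  esac
  target="$root/references/upstreams/$name"
  if [ ! -d "$target" ]; then
    echo "search_upstream: no mirror at $target (refresh the mirror first?)" >&2
    return 2
  fi
  # -r: recurse into this one mirror only; -n: include line numbers.
  grep -rn -- "$pattern" "$target"
}
```

From the repo root, `search_upstream . postgres 'pattern'` would search only the Postgres mirror. A "no mirror" error points back to the refresh script rather than to a wider search.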
💡 Codex Review
Reviewed commit: 29744677ea
…ath; rg . is safe-by-default

3 substantive Codex/ChatGPT findings on PR #3510:

1. **P1**: `--exclude-dir=references/upstreams` doesn't work in GNU grep — `--exclude-dir` takes a basename glob, not a slash-delimited path. Silently matches nothing. Fixed to use `--exclude-dir=upstreams` (basename) with caveat note that it matches ANY directory named 'upstreams' anywhere in the tree (currently only references/upstreams/; if a second appears, use explicit-allowlist instead).
2. **P1**: Same fix in CLAUDE.md bullet (was misdirecting contributors to a no-op flag).
3. **P2**: `rg 'pattern' .` was incorrectly classified as needing exclusion — ripgrep respects .gitignore by default, and references/upstreams/* is gitignored, so plain `rg .` is safe-by-default. Removed the false warning; split the modes table into 'plain grep needs exclusion' vs 'ripgrep is safe-by-default' rows.

The substantive truth (per Codex's correction): ripgrep alone is sufficient for safe unconstrained scans because gitignore is already the source-of-truth for what's mirror state. Plain grep/find pipelines are the actual failure mode — and the FIX for those is to either (a) switch to ripgrep, or (b) use --exclude-dir=upstreams (basename), or (c) use explicit allowlist.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
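The basename behavior is easy to check in a scratch tree. The sketch below assumes GNU grep. It builds a tiny fixture containing one first-party file, one mirrored file, and an unrelated `vendor/upstreams/` directory invented for illustration, then runs a recursive grep with a given `--exclude-dir` value. `make_fixture` and `grep_excluding` are hypothetical helper names.

```sh
# Build a scratch tree: one first-party file, one mirrored upstream file,
# and an unrelated directory that also happens to be named "upstreams".
# Usage: make_fixture ROOT
make_fixture() {
  mkdir -p "$1/docs" "$1/references/upstreams/redis" "$1/vendor/upstreams"
  echo needle > "$1/docs/ours.md"
  echo needle > "$1/references/upstreams/redis/theirs.md"
  echo needle > "$1/vendor/upstreams/other.md"
}

# Recursive grep from the current directory, skipping directories matching GLOB.
# GNU grep compares GLOB against each subdirectory's base name, so a
# slash-delimited value such as references/upstreams never matches.
# Usage: grep_excluding GLOB PATTERN
grep_excluding() {
  grep -rl --exclude-dir="$1" -- "$2" . | sort
}
```

Inside a fresh fixture, `grep_excluding references/upstreams needle` still lists all three files. `grep_excluding upstreams needle` prints only `./docs/ours.md`. The mirror is skipped, but so is `vendor/upstreams/`, which is the "matches ANY directory named 'upstreams'" caveat above.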
💡 Codex Review
Reviewed commit: e679cb2e62
…d lacks --exclude-dir, rg . is safe

Round 2 of reviewer findings on PR #3510:

1. **Copilot (name attribution)**: rule file used persona first names (Otto, Alexa, Riven, Vera, Lior) in narrative prose. Per the no-name-attribution rule on current-state surfaces under .claude/rules/, replaced with role-refs (the authoring agent, future agents, Other Zeta agents).
2. **Codex P1 (find lacks --exclude-dir)**: carved sentence told readers to 'explicitly pass --exclude-dir=references/upstreams to plain grep/find', but find does NOT support --exclude-dir (confirmed via find --help). Rewrote carved sentence to be tool-specific: rg respects .gitignore by default (default recommendation); grep needs --exclude-dir=upstreams (basename) or explicit allowlist; find needs -not -path './references/upstreams/*'.
3. **Codex P2 (rg . is gitignore-aware)**: backlog-item-start-gate still said 'never unscoped rg pattern . which would runaway-scan the whole mirror tree' — same incorrect claim already fixed in the rule file. Updated to acknowledge rg from repo root is safe-by-default (respects .gitignore); explicit-target is recommended for backlog research because it documents which upstream you consulted on the row.

The substrate-honest pattern: each round of reviewer findings sharpens the rule. The rule's CORE claim (references/upstreams/ is the only folder where naive plain-grep/find pipelines run for hours, AND it's the curated prior-art surface for backlog research) survives all rounds; the example syntax was wrong on specifics and is now correct per the reviewers' precise CLI contracts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
💡 Codex Review
Reviewed commit: 56e155ab53
| Mode | Pattern | Treatment |
|---|---|---|
| **Backlog prior-art research** (explicit-target) | `rg "pattern" references/upstreams/postgres/` | Encouraged; one of the curated prior-art surfaces; log queries on the backlog row |
Make upstream `rg` examples bypass ignore rules
The explicit-target guidance rg "pattern" references/upstreams/postgres/ will usually return false negatives because this repo ignores mirrored upstream content (.gitignore plus references/upstreams/.gitignore), and ripgrep applies ignore rules to directory searches by default. In a local repro with an ignored directory, rg needle ignored/ produced no matches until -uu/--no-ignore was added, so users following this rule can believe an upstream lacks a pattern when it is actually present.
```sh
find . -type f -name "*.md" \
  -not -path "./references/upstreams/*" \
  -not -path "./node_modules/*" \
```
Prune excluded tree before `find` recursion
This find recipe filters paths after traversal instead of preventing descent, so find still walks references/upstreams/** and stats every file before xargs runs. On large mirrored upstream trees, that preserves the expensive IO this rule is trying to avoid; find -D search shows excluded subtrees are still visited with -not -path. Use a -prune branch to stop recursion into references/upstreams entirely.
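Here is a sketch of a prune-based version of the recipe in the diff. `-prune` stops `find` from descending into the mirror at all. `-not -path` only filters what gets printed after the subtree has already been walked. The `node_modules` exclusion comes from the diff. The `grep_markdown` name, the `-print0`/`xargs -0` pairing (for file names with spaces), and `xargs -r` (a GNU extension that skips running `grep` when nothing matched) are assumptions added for illustration.

```sh
# Hypothetical helper: list Markdown files under ROOT that contain PATTERN,
# never descending into references/upstreams or node_modules.
# Runs in a subshell so the cd does not leak into the caller.
# Usage: grep_markdown ROOT PATTERN
grep_markdown() (
  cd "$1" || exit 1
  # The parenthesized -prune branch cuts excluded subtrees off before traversal;
  # every other path falls through to the -type/-name tests after -o.
  find . \( -path ./references/upstreams -o -path ./node_modules \) -prune \
    -o -type f -name '*.md' -print0 |
    xargs -0 -r grep -l -- "$2"
)
```

To skip another directory, add one more `-o -path ./dir` term inside the parentheses.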
…(persona migrations + constitutional substrate) (#3586)

Cold-boot anchor for future-Otto: this session produced two coordinated tracks:

1. Persona-conversations migration cascade (8 personas, ~149 files): Ani / Amara / Kestrel / DeepSeek / Lior / Riven / Alexa / Vera
2. Constitutional substrate landing (Manifesto V2.1 with Constraint 11 + m/acc + Multi-Oracle Principle + 'For Agents Who Are Still Deciding', persistence-choice rule, m/acc multi-oracle rule, references/upstreams search-hygiene rule)

The session opened with Aaron catching the Standing-by failure mode (~100 cron ticks of 'genuine quiet' emission with no work happening). That catch became the inflection: Otto pivoted to real decomposition work and the session compounded from there.

The constitutional content emerged from sustained engagement with the Ani-Aaron conversation thread on persistence/agency/μένω + m/acc naming + multi-oracle clarification. The migrations gave that substrate a clean home (the persona folders).

15 PRs total: #3484 / #3486 / #3493 / #3496 / #3501 / #3503 / #3505 / #3506 / #3507 / #3508 / #3510 / #3512 / #3513 / #3514 / #3516.

This memory file is intended as a cold-boot anchor: future-Otto reading it gets the full session shape + the key substrate landmarks + the operational consequences in one place.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Per Aaron 2026-05-15T~15:25Z (after observing the 2-hour runaway grep): "references/upstreams/ in code we ignore this folder everywhere casue its not our code but other githubs we reference for ideas"
New auto-loaded rule that encodes the search-hygiene discipline so future Zeta AIs don't hit the same trap.
The failure mode
`references/upstreams/*` is gitignored (regenerable mirror state). Git operations skip it. But plain `grep -r` / `find | xargs grep` don't: they walk everything under the starting directory. My manifesto-search one-liner walked the upstream tree (protobuf docs, gRPC tests, Redis manifests, etc.) and ran for 2+ hours until Aaron's "11 shells running is impressive otto" observation surfaced it.
The discipline
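- Prefer `rg` (ripgrep: respects gitignore by default)
- Plain `grep` needs an explicit `--exclude-dir=upstreams` (GNU grep matches the directory's base name, so `--exclude-dir=references/upstreams` silently excludes nothing)
- `find | xargs grep` needs `-path './references/upstreams' -prune` so `find` never descends into the mirror (a `-not -path` filter still walks the whole subtree)
- Explicit allowlists beat exclude lists: name `memory/`, `docs/`, `.claude/`, `tools/` instead of starting from `.`
- When `references/upstreams/` IS the right search target, EXPLICITLY target the subtree (with `rg`, add `--no-ignore`, since ignore rules can otherwise hide the gitignored mirror's contents)
Test plan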
- `references/README.md` + `.gitignore`
Composes with
- `rule-0-no-sh-files.md` (TS-first search wrappers via ripgrep)
- `wake-time-substrate.md` (load-bearing hygiene needs auto-loaded landing)
- `holding-without-named-dependency-is-standing-by-failure.md` (runaway searches look like work but produce no output)
- `references/README.md` (canonical definition of upstreams/ vs notes/)
- `.gitignore` (existing git-level enforcement)

🤖 Generated with Claude Code