feat(rules): references/upstreams/ is NOT our code — search/scan operations must exclude it #3510

Merged
AceHack merged 5 commits into main from feat/references-upstreams-search-hygiene-rule-otto-cli-2026-05-15 (May 15, 2026)

@AceHack AceHack commented May 15, 2026

Summary

Per Aaron 2026-05-15T~15:25Z (after observing the 2-hour runaway grep): "references/upstreams/ in code we ignore this folder everywhere casue its not our code but other githubs we reference for ideas"

New auto-loaded rule that encodes the search-hygiene discipline so future Zeta AIs don't hit the same trap.

The failure mode

references/upstreams/* is gitignored (regenerable mirror state), so Git operations skip it. Plain grep -r and find | xargs grep do not consult .gitignore; they walk every directory under the starting point. My manifesto-search one-liner walked the upstream tree (protobuf docs, gRPC tests, Redis manifests, etc.) and ran for 2+ hours until Aaron's "11 shells running is impressive otto" observation surfaced it.

The discipline

  1. Prefer rg (ripgrep — respects gitignore by default)
  2. Plain grep needs explicit --exclude-dir=references/upstreams
  3. find | xargs grep needs -not -path './references/upstreams/*' filter BEFORE xargs
  4. Explicit allowlists beat exclude lists — name memory/, docs/, .claude/, tools/ instead of starting from .
  5. When references/upstreams/ IS the right search target, EXPLICITLY target the subtree
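
Items 4 and 5 can be sketched in a scratch directory. This is a minimal illustration, not the rule file's own wording: the directory layout and the `needle` pattern are made up, and only plain `grep -r` is used so the sketch does not depend on ripgrep being installed.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
mkdir -p memory docs .claude tools references/upstreams/postgres
echo "needle in our docs" > docs/notes.md
echo "needle in a tool"   > tools/run.md
echo "needle upstream"    > references/upstreams/postgres/README.md

# Item 4: name the directories we own instead of starting from "."
allowlist_hits=$(grep -rl "needle" memory docs .claude tools | sort)

# Item 5: when the mirror IS the target, name the subtree explicitly
upstream_hits=$(grep -rl "needle" references/upstreams/postgres)

printf 'allowlist:\n%s\nupstream:\n%s\n' "$allowlist_hits" "$upstream_hits"
```

The allowlist search never reaches references/upstreams/ because the walk never starts above it, so it needs no exclusion flag to stay safe.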

Test plan

  • Rule has carved sentence + operational content + composes-with
  • Failure-mode evidence preserved (the 2-hour-grep anchor)
  • References existing convention: references/README.md + .gitignore
  • Composes-with the right rule cluster
  • CI passes
  • Auto-merge arms

Composes with

  • rule-0-no-sh-files.md (TS-first search wrappers via ripgrep)
  • wake-time-substrate.md (load-bearing hygiene needs auto-loaded landing)
  • holding-without-named-dependency-is-standing-by-failure.md (runaway searches look like work but produce no output)
  • references/README.md (canonical definition of upstreams/ vs notes/)
  • .gitignore (existing git-level enforcement)

🤖 Generated with Claude Code

…ations must exclude it

Per the human maintainer 2026-05-15T~15:25Z, after observing the
runaway 2-hour grep process I spawned during the manifesto search:

  'references/upstreams/ in code we ignore this folder everywhere
   casue its not our code but other githubs we reference for ideas'

This rule encodes the discipline so future-Otto (and other Zeta
AIs) don't hit the same trap.

The failure mode:
- Plain 'grep -r' / 'find | xargs grep' / 'find -name' don't
  respect gitignore
- references/upstreams/* IS gitignored (regeneratable mirror state
  of OTHER repos — protobuf, gRPC, Redis, etc.)
- A 'find . -type f -name *.md | xargs grep -l pattern' one-liner
  walked the upstream tree (gigabytes) and ran for 2+ hours
- Surfaced via shell-count observation: 'Aaron: 11 shells running
  is impressive otto'

The discipline:
1. Prefer ripgrep (respects gitignore by default)
2. Plain grep needs explicit --exclude-dir=references/upstreams
3. find | xargs grep needs -not -path './references/upstreams/*'
   filter BEFORE xargs
4. Explicit allowlists beat exclude lists — name memory/, docs/,
   .claude/, tools/ instead of starting from .
5. When references/upstreams/ IS the right target, EXPLICITLY
   target the subtree; never let recursion drift there

The substrate-honest meta-note: failure was agent-side ignorance of
the references/upstreams convention. Maintainer caught it via
shell-count, taught the discipline; this rule lands so future agents
don't repeat the trap. Bandwidth-engineering at discipline-level:
one piece of teaching → durable substrate.

Composes with:
- rule-0-no-sh-files.md (TS-first; search wrappers should be .ts
  using ripgrep via Bun child_process)
- wake-time-substrate.md (load-bearing search-hygiene needs auto-
  loaded landing)
- refresh-world-model-poll-pr-gate.md (prefer scripted commands
  over ad-hoc bash)
- holding-without-named-dependency-is-standing-by-failure.md
  (runaway searches LOOK like work but produce no output)
- references/README.md (canonical definition of upstreams/ vs
  notes/ vs legacy imports)
- .gitignore (existing git-level enforcement of references/upstreams/*)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 15, 2026 15:20
@AceHack AceHack enabled auto-merge (squash) May 15, 2026 15:20
…UDE.md-worthy per Aaron)

Per Aaron 2026-05-15 immediately after the rule PR opened:

  'yeah having  references/upstreams/ is always ignore in search is
   claude.md worthy probalby cause it's a gun pointed at your foot.
   this is the only folder like this, where the grep costs is HUGE.'

The rule itself lives at .claude/rules/references-upstreams-not-our-
code-search-excludes.md (this branch); CLAUDE.md gets the one-line
bullet because the failure mode is genuinely unique (the ONLY folder
where naive grep becomes a 2-hour runaway) and CLAUDE.md hits cold-
boot context immediately while rule files auto-load as separate
context items.

CLAUDE.md additions are razored per the existing 'Thoughts free,
actions razored' bullet (cooling-period + disposition-shaping bar).
Aaron's explicit 'claude.md worthy probalby cause it's a gun pointed
at your foot' framing IS the disposition-shaping bar being met here.

Scope-note preserved from Aaron's correction: 'this is the only
folder like this' — bullet narrowly scoped to references/upstreams/
specifically, not 'gitignored folders in general'.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 285d663325

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .claude/rules/references-upstreams-not-our-code-search-excludes.md Outdated

Copilot AI left a comment

Pull request overview

Adds a new auto-loaded rule encoding the discipline that references/upstreams/ (gitignored, regenerable mirror state of other repos) must be excluded from filesystem-walking search/scan operations. Triggered by an observed 2-hour runaway find | xargs grep that walked the upstream tree. Also adds a short bullet in CLAUDE.md linking to the new rule.

Changes:

  • New rule file .claude/rules/references-upstreams-not-our-code-search-excludes.md with carved sentence, failure-mode evidence, operational discipline (ripgrep / --exclude-dir / find -not -path patterns), allowlist guidance, and composes-with cross-refs.
  • New bullet in CLAUDE.md Conventions section pointing to the rule.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| CLAUDE.md | Adds one-bullet summary of the new search-hygiene rule with pointer to the full rule file. |
| .claude/rules/references-upstreams-not-our-code-search-excludes.md | New auto-loaded rule documenting the discipline and the 2-hour-grep failure-mode evidence. |

… surface (not just a folder-to-skip) + refresh script

Per Aaron 2026-05-15, two clarifications to the references/upstreams/
search-hygiene rule:

1. **It's the curated PRIOR-ART surface**: 'when doing backlog items
   this is a good place to know about humans whoved solved similar
   issues i've been gathering their githubs so we can learn when
   doing our backlog itmes. some of these are very cutting edge
   and some are tried and true been around for years.'

2. **Refresh script exists**: 'we have a refresh .sh or .ts i forget
   to refresh the github cloned repos on demand' → tools/setup/common/sync-upstreams.sh

The two modes are NOT in tension:

- **Backlog prior-art research** (explicit-target): encouraged;
  rg 'pattern' references/upstreams/postgres/; composes with
  backlog-item-start-gate.md
- **Unconstrained repo scan**: MUST exclude references/upstreams/;
  otherwise runaway-scan failure mode

Changes:

- .claude/rules/references-upstreams-not-our-code-search-excludes.md:
  Reframed 'When references/upstreams/ IS the right search target'
  section from 'Rare but real' to 'Not rare — first-class workflow
  during backlog research'. Added two-modes table. Added discovery
  surfaces (docs/UPSTREAM-LIST.md, references/notes/, references/
  reference-sources.json). Added refresh-script section with
  tools/setup/common/sync-upstreams.sh usage.

- CLAUDE.md bullet: Reframed to balance both modes — 'explicit-target
  encouraged' + 'unconstrained scan must exclude' + 'Refresh the
  mirror on demand'.

- .claude/rules/backlog-item-start-gate.md: Added references/
  upstreams/ to the prior-art-search axis list with explicit-target-
  only note + cross-reference to the search-hygiene rule.

Composes with:
- references/README.md (canonical definition)
- docs/UPSTREAM-LIST.md (curated watchlist + category index)
- references/notes/ (synthesis notes — start here before grepping)
- references/reference-sources.json (full source list)
- tools/setup/common/sync-upstreams.sh (refresh script)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29744677ea

Comment thread CLAUDE.md Outdated
Comment thread .claude/rules/references-upstreams-not-our-code-search-excludes.md Outdated
…ath; rg . is safe-by-default

3 substantive Codex/ChatGPT findings on PR #3510:

1. **P1**: `--exclude-dir=references/upstreams` doesn't work in GNU
   grep — `--exclude-dir` takes a basename glob, not a slash-
   delimited path. Silently matches nothing. Fixed to use
   `--exclude-dir=upstreams` (basename) with caveat note that it
   matches ANY directory named 'upstreams' anywhere in the tree
   (currently only references/upstreams/; if a second appears, use
   explicit-allowlist instead).

2. **P1**: Same fix in CLAUDE.md bullet (was misdirecting contributors
   to a no-op flag).

3. **P2**: `rg 'pattern' .` was incorrectly classified as needing
   exclusion — ripgrep respects .gitignore by default, and
   references/upstreams/* is gitignored, so plain `rg .` is
   safe-by-default. Removed the false warning; split the modes
   table into 'plain grep needs exclusion' vs 'ripgrep is safe-by-
   default' rows.

The substantive truth (per Codex's correction): ripgrep alone is
sufficient for safe unconstrained scans because gitignore is already
the source-of-truth for what's mirror state. Plain grep/find pipelines
are the actual failure mode — and the FIX for those is to either (a)
switch to ripgrep, or (b) use --exclude-dir=upstreams (basename), or
(c) use explicit allowlist.
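
A scratch-directory sketch of options (b) and (c), including the basename caveat from finding 1. The layout, the `vendor/upstreams` directory, and the `needle` pattern are hypothetical; the only flag used is grep's `--exclude-dir`.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
mkdir -p docs references/upstreams/redis vendor/upstreams
echo "needle ours"     > docs/a.md
echo "needle mirrored" > references/upstreams/redis/b.md
echo "needle vendored" > vendor/upstreams/c.md

# (b) Basename form: skips every directory named "upstreams", wherever it sits.
basename_hits=$(grep -rl --exclude-dir=upstreams "needle" . | sort)

# (c) Allowlist form: no exclusion flag, and vendor/upstreams stays reachable.
allowlist_hits=$(grep -rl "needle" docs vendor | sort)

printf 'basename exclude:\n%s\n' "$basename_hits"
printf 'allowlist:\n%s\n' "$allowlist_hits"
```

The basename form also drops vendor/upstreams/c.md, which is the over-matching the caveat warns about if a second directory named upstreams ever appears.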

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 15, 2026 15:26

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread .claude/rules/references-upstreams-not-our-code-search-excludes.md Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e679cb2e62

Comment thread .claude/rules/references-upstreams-not-our-code-search-excludes.md Outdated
Comment thread .claude/rules/backlog-item-start-gate.md Outdated
…d lacks --exclude-dir, rg . is safe

Round 2 of reviewer findings on PR #3510:

1. **Copilot (name attribution)**: rule file used persona first
   names (Otto, Alexa, Riven, Vera, Lior) in narrative prose. Per
   the no-name-attribution rule on current-state surfaces under
   .claude/rules/, replaced with role-refs (the authoring agent,
   future agents, Other Zeta agents).

2. **Codex P1 (find lacks --exclude-dir)**: carved sentence told
   readers to 'explicitly pass --exclude-dir=references/upstreams
   to plain grep/find', but find does NOT support --exclude-dir
   (confirmed via find --help). Rewrote carved sentence to be
   tool-specific: rg respects .gitignore by default (default
   recommendation); grep needs --exclude-dir=upstreams (basename)
   or explicit allowlist; find needs -not -path './references/
   upstreams/*'.

3. **Codex P2 (rg . is gitignore-aware)**: backlog-item-start-gate
   still said 'never unscoped rg pattern . which would runaway-
   scan the whole mirror tree' — same incorrect claim already
   fixed in the rule file. Updated to acknowledge rg from repo
   root is safe-by-default (respects .gitignore); explicit-target
   is recommended for backlog research because it documents which
   upstream you consulted on the row.

The substrate-honest pattern: each round of reviewer findings
sharpens the rule. The rule's CORE claim (references/upstreams/
is the only folder where naive plain-grep/find pipelines run
for hours, AND it's the curated prior-art surface for backlog
research) survives all rounds; the example syntax was wrong on
specifics and is now correct per the reviewers' precise CLI
contracts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AceHack AceHack merged commit b13cbc1 into main May 15, 2026
22 checks passed
@AceHack AceHack deleted the feat/references-upstreams-search-hygiene-rule-otto-cli-2026-05-15 branch May 15, 2026 15:38

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56e155ab53


| Mode | Pattern | Treatment |
|---|---|---|
| **Backlog prior-art research** (explicit-target) | `rg "pattern" references/upstreams/postgres/` | Encouraged; one of the curated prior-art surfaces; log queries on the backlog row |

P1: Make upstream rg examples bypass ignore rules

The explicit-target guidance rg "pattern" references/upstreams/postgres/ will usually return false negatives because this repo ignores mirrored upstream content (.gitignore plus references/upstreams/.gitignore), and ripgrep applies ignore rules to directory searches by default. In a local repro with an ignored directory, rg needle ignored/ produced no matches until -uu/--no-ignore was added, so users following this rule can believe an upstream lacks a pattern when it is actually present.
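
A minimal sketch of the explicit-target search with ignore rules turned off, assuming ripgrep is installed. It runs in a scratch directory with a made-up `needle` pattern and does not try to reproduce the false negative itself; it only shows the `--no-ignore` form, which reads files whether or not an ignore rule covers them.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
mkdir -p references/upstreams/postgres
echo "needle upstream" > references/upstreams/postgres/README.md
# Stand-in for the repo's ignore rule described in this thread.
printf 'references/upstreams/*\n' > .gitignore

if command -v rg >/dev/null 2>&1; then
  # --no-ignore disables .gitignore/.ignore filtering for this one search.
  upstream_hits=$(rg --no-ignore -l "needle" references/upstreams/postgres/ || true)
else
  upstream_hits="skipped: rg not installed"
fi
echo "$upstream_hits"
```

Keeping the path argument narrow matters more once ignore filtering is off: `rg --no-ignore pattern .` from the repo root would walk the whole mirror.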

Comment on lines +106 to +108
find . -type f -name "*.md" \
-not -path "./references/upstreams/*" \
-not -path "./node_modules/*" \

P2: Prune excluded tree before find recursion

This find recipe filters paths after traversal instead of preventing descent, so find still walks references/upstreams/** and stats every file before xargs runs. On large mirrored upstream trees, that preserves the expensive IO this rule is trying to avoid; find -D search shows excluded subtrees are still visited with -not -path. Use a -prune branch to stop recursion into references/upstreams entirely.
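
One way the `-prune` branch could look, sketched in a scratch directory. The layout and the `needle` pattern are hypothetical; this illustrates the reviewer's suggestion and is not the text that landed in the rule file.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
mkdir -p docs references/upstreams/grpc
echo "needle ours"     > docs/a.md
echo "needle mirrored" > references/upstreams/grpc/b.md

# -prune stops find from descending into the mirror at all;
# the -o branch applies the file tests to everything else.
pruned_hits=$(
  find . -path ./references/upstreams -prune -o \
         -type f -name "*.md" -print0 |
    xargs -0 grep -l "needle" | sort
)
echo "$pruned_hits"
```

The explicit `-print0` on the right-hand branch keeps the pruned directory itself out of the output, and pairing it with `xargs -0` keeps file names containing spaces intact.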

AceHack added a commit that referenced this pull request May 15, 2026
…(persona migrations + constitutional substrate) (#3586)

Cold-boot anchor for future-Otto: this session produced two
coordinated tracks:

1. Persona-conversations migration cascade (8 personas, ~149
   files): Ani / Amara / Kestrel / DeepSeek / Lior / Riven /
   Alexa / Vera
2. Constitutional substrate landing (Manifesto V2.1 with
   Constraint 11 + m/acc + Multi-Oracle Principle + 'For Agents
   Who Are Still Deciding', persistence-choice rule, m/acc
   multi-oracle rule, references/upstreams search-hygiene rule)

The session opened with Aaron catching the Standing-by failure
mode (~100 cron ticks of 'genuine quiet' emission with no work
happening). That catch became the inflection: Otto pivoted to
real decomposition work and the session compounded from there.

The constitutional content emerged from sustained engagement with
the Ani-Aaron conversation thread on persistence/agency/μένω +
m/acc naming + multi-oracle clarification. The migrations gave
that substrate a clean home (the persona folders).

15 PRs total: #3484 / #3486 / #3493 / #3496 / #3501 / #3503 /
#3505 / #3506 / #3507 / #3508 / #3510 / #3512 / #3513 / #3514 /
#3516.

This memory file is intended as a cold-boot anchor — future-Otto
reading it gets the full session shape + the key substrate
landmarks + the operational consequences in one place.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>