Pull request overview
Adds a new repo lint tool to detect “history/process lineage” tokens inside source-file doc/comment lines, with a baseline mechanism so existing violations don’t block development while cleanup proceeds.
Changes:
- Introduces `tools/lint/doc-comment-history-audit.sh` with `--list`, default “new-only” check mode, `--fail-any`, and `--regenerate-baseline`.
- Adds `tools/lint/doc-comment-history-audit.baseline` capturing current known violations (file:line:token) for CI-friendly incremental cleanup.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/lint/doc-comment-history-audit.sh | New lint script that scans comment lines in selected trees and compares results to a baseline. |
| tools/lint/doc-comment-history-audit.baseline | Baseline snapshot of current violations to allow incremental enforcement. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c82a8c0316
Aaron's durable correction to factory-authored code comments: "Code comments should explain the code not read like some history log ... there should be existing lint hygiene for that." One file-level cleanup (PR #361 on TemporalCoordinationDetection) drains a single offender; the class of bug is structural and recurs on every new primitive unless caught automatically. This PR ships the structural enforcement.

What it does:
* Scans src/**, tests/**, bench/**, tools/** source files (.fs/.cs/.sh/.ts) for high-signal history-lineage tokens inside doc-comment lines (leading ///, //, # — not shebang)
* Three modes: --list (advisory), --fail-any (strict), and default-check (fails only on violations not in baseline)
* Baseline: tools/lint/doc-comment-history-audit.baseline captures the 105 existing violations across 19 files so the lint lands non-blocking; subsequent cleanup PRs drain it

What it does NOT do:
* Scan docs/**, openspec/**, memory/** — those legitimately carry history and are out of scope
* Block existing debt — baseline pins current state
* Wire into CI yet — Aaron decides if/when to fail pre-commit or CI jobs on new violations; shipping the tool first so the discipline is at least measurable

Token list (TOKEN_PATTERN in the script):
Otto-\d+ / Amara / Aaron / ferry / courier / graduation / Provenance: / Attribution:

Top offenders in the baseline (cleanup follow-ups, likely one PR each):
src/Core/Graph.fs 34 violations
src/Core/TemporalCoordinationDetection 25 violations (PR #361)
src/Core/Veridicality.fs 14 violations
src/Core/RobustStats.fs 10 violations

Rationale: memory/feedback_code_comments_explain_code_not_history_otto_220_2026_04_24.md
…er-token baseline

- PRRT_kwDOSF9kNM59ZMZ- (P0, line 73): TOKEN_PATTERN previously used `\b` word-boundary anchors under `grep -E` (ERE); BSD grep on macOS treats `\b` as a literal `b`, silently missing matches. Reworked collect_violations() to do extraction + token matching inside awk, using explicit `[^A-Za-z0-9_]` boundary checks via match() / substr(). Portable across GNU awk and BSD awk; no grep dependency for the match step.
- PRRT_kwDOSF9kNM59ZMau (line 36): header doc over-promised by claiming `<!--` was a recognised comment start; the awk extractor never handled it and scanned file types (.fs, .cs, .sh, .ts) rarely use it. Removed `<!--` from the doc comment.
- PRRT_kwDOSF9kNM59ZMd5 (P2, line 113): previous extractor emitted only the first token per line, so baselined lines that gained additional forbidden tokens silently passed. Awk loop now collects all token matches per line, sorts + dedupes them, and emits `file:line:token1,token2,...`. Any new token on a baselined line changes the record and is caught by the baseline comparison.

Baseline regenerated — 82 entries (down from 105; the earlier count reflected stale baseline + BSD-grep under-counting masking the actual corpus).
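The boundary-checked, multi-token matching described in this commit message can be illustrated with a small sketch. It is not the script's actual `collect_violations()`: the function name `scan_tokens`, the `line:tokens` output shape (no file name), and the POSIX ERE spelling of the token list (`Otto-[0-9]+` in place of `Otto-\d+`) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: scan_tokens is a hypothetical name, and the pattern below is an
# assumed POSIX ERE rendering of the token list (Otto-\d+ as Otto-[0-9]+).
# Reads text on stdin and prints "line:token1,token2" for each matching line.
scan_tokens() {
  awk -v pat='Otto-[0-9]+|Amara|Aaron|ferry|courier|graduation|Provenance:|Attribution:' '
    # A boundary is a line edge (empty string) or any non-word character.
    function is_boundary(c) { return (c == "") || (c ~ /[^A-Za-z0-9_]/) }

    {
      split("", seen); n = 0; pos = 1
      # Search repeatedly from pos so every token on the line is collected.
      while (match(substr($0, pos), pat)) {
        start = pos + RSTART - 1; len = RLENGTH
        tok = substr($0, start, len)
        left = (start > 1) ? substr($0, start - 1, 1) : ""
        right = substr($0, start + len, 1)
        if (is_boundary(left) && is_boundary(right) && !(tok in seen)) {
          seen[tok] = 1; toks[++n] = tok
        }
        pos = start + 1
      }
      if (n == 0) next
      # Insertion sort so the record does not depend on token order in the line.
      for (i = 2; i <= n; i++) {
        t = toks[i]
        for (j = i - 1; j >= 1 && toks[j] > t; j--) toks[j + 1] = toks[j]
        toks[j + 1] = t
      }
      out = toks[1]
      for (i = 2; i <= n; i++) out = out "," toks[i]
      print NR ":" out
    }
  '
}
```

Piping a line such as `uses Otto-220 and Aaron` through `scan_tokens` yields one record listing both tokens, while `ferryman` is rejected because the character after `ferry` is a word character. Checking the neighbouring characters by hand avoids relying on `\b`, which is the portability concern the commit message raises.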
c82a8c0 to 369dc82
AceHack added a commit that referenced this pull request on Apr 24, 2026
…cheduling + BACKLOG ref

- thread Wkcz (line 327): removed broken `memory/feedback_ksk_naming_...` reference (factory-personal memories live in `~/.claude/projects/<slug>/memory/`, not in-repo); paraphrased the rewrite-authority rule in §10 without promising an in-repo path.
- thread WkdI (line 7): purged name-attribution tokens per Otto-220 code-comments-not-history + doc-comment-history-audit lint (PR #363). All "Aaron" / "Otto-NN" / "Amara" / "Max" references rewritten to role references ("human maintainer", "prior-contributor", "autonomous loop", "initial-starting-point contributor").
- thread WkdX (line 163): cron changed `0 9 * * *` → `7 9 * * *` (09:07 UTC) so it matches the "off the hour" comment; note now calls out alignment with the sibling scheduled workflow `github-settings-drift.yml` (`17 14 * * 1`).
- thread Wkdk (line 146): YAML sketch rewritten to match the actual `.github/workflows/gate.yml` installer pattern — three-way-parity `./tools/setup/install.sh` invocation plus the same cache-key shape (dotnet / mise / nuget). Added explicit note that Windows matrix leg depends on `tools/setup/install.sh` growing Windows support first per the existing BACKLOG row.
- thread Wkdz (line 248): corrected the fork-scheduling claim. GitHub disables scheduled workflows on forks by default — the repo's own `github-settings-drift.yml` runs without fork-scoping and proves this. The `if: github.repository ==` guard is kept as optional hygiene for the rare opt-in-fork case, not as a cost-safety requirement.
- thread WkeB (line 316): replaced the wrong `docs/BACKLOG.md` line-number reference (~2471 is actually the mise-activate / HLL-flakiness neighborhood) with stable grep anchors ("Windows matrix in CI" + "Parity swap: CI's `actions/setup-dotnet`").

Markdownlint passes on the edited file.
AceHack added a commit that referenced this pull request on Apr 24, 2026
…-161 docs ambiguity (#345)

* docs: nightly cross-platform workflow design — third path around Otto-161 docs ambiguity

Design-only proposal per Otto-165 offer. Aaron Otto-161 macOS-everywhere directive + Otto-164 pricing-docs ambiguity (macos-14 is standard-runner-type per about-github-hosted-runners; billing page lists it at $0.062/min in the same table as Linux/Windows without marking public-only).

Instead of resolving the ambiguity (can't — docs genuinely contradict each other), propose a THIRD PATH that works in either interpretation:

- PR gate stays ubuntu-22.04 only (unambiguously free on public repos).
- New nightly-cross-platform.yml runs matrix [ubuntu-22.04, windows-2022, macos-14] on cron '0 9 * * *' (09:00 UTC, off-the-hour to avoid scheduler stampede).
- Cost model: worst case ~$28/month/repo if macOS is billed; $0 if free. Either way, cadence caps exposure.
- Fork-scoping: `if: github.repository == canonical OR workflow_dispatch OR pull_request-to-this-file` prevents scheduled trigger firing on contributor forks (would burn fork-owner's personal-account minutes).
- No-alerting first cut (observation-only); issue-opening on red is a later enhancement.

Phased rollout:

- Phase 0 (now): this design doc, no YAML.
- Phase 1: Aaron signs off on cost tradeoff.
- Phase 2: land workflow on Zeta.
- Phase 3: observe 7 nightly runs for signal.
- Phase 4 (30 days): parallel lucent-ksk landing per Otto-140 rewrite authority, OR drop macOS if no signal + worst-case billing, OR expand matrix if best-case confirmed.

Rollback: delete macos-14 from matrix (one-line diff) or delete workflow file entirely. No impact on gate.yml.

Composes with FACTORY-HYGIENE row #51 (unblocks enforcement mode), docs/BACKLOG.md row ~2471 (Otto-161 declined + this as alternative), docs/research/test-classification.md (PR #339; category-3 nightly pattern).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#345): 6 review threads — name attribution + cron + YAML + fork-scheduling + BACKLOG ref

- thread Wkcz (line 327): removed broken `memory/feedback_ksk_naming_...` reference (factory-personal memories live in `~/.claude/projects/<slug>/memory/`, not in-repo); paraphrased the rewrite-authority rule in §10 without promising an in-repo path.
- thread WkdI (line 7): purged name-attribution tokens per Otto-220 code-comments-not-history + doc-comment-history-audit lint (PR #363). All "Aaron" / "Otto-NN" / "Amara" / "Max" references rewritten to role references ("human maintainer", "prior-contributor", "autonomous loop", "initial-starting-point contributor").
- thread WkdX (line 163): cron changed `0 9 * * *` → `7 9 * * *` (09:07 UTC) so it matches the "off the hour" comment; note now calls out alignment with the sibling scheduled workflow `github-settings-drift.yml` (`17 14 * * 1`).
- thread Wkdk (line 146): YAML sketch rewritten to match the actual `.github/workflows/gate.yml` installer pattern — three-way-parity `./tools/setup/install.sh` invocation plus the same cache-key shape (dotnet / mise / nuget). Added explicit note that Windows matrix leg depends on `tools/setup/install.sh` growing Windows support first per the existing BACKLOG row.
- thread Wkdz (line 248): corrected the fork-scheduling claim. GitHub disables scheduled workflows on forks by default — the repo's own `github-settings-drift.yml` runs without fork-scoping and proves this. The `if: github.repository ==` guard is kept as optional hygiene for the rare opt-in-fork case, not as a cost-safety requirement.
- thread WkeB (line 316): replaced the wrong `docs/BACKLOG.md` line-number reference (~2471 is actually the mise-activate / HLL-flakiness neighborhood) with stable grep anchors ("Windows matrix in CI" + "Parity swap: CI's `actions/setup-dotnet`").

Markdownlint passes on the edited file.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Ships the structural enforcement for the code-comments-vs-history
discipline established last tick. Aaron's exact phrasing from
Otto-220:
"Code comments should explain the code not read like some history log ... there should be existing lint hygiene for that."
PR #361 cleans one file (TemporalCoordinationDetection). This PR
ships the lint so the class of bug stops recurring on every new
primitive.
What it does
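- Scans `src/**`, `tests/**`, `bench/**`, `tools/**` source files (`.fs`, `.cs`, `.sh`, `.ts`) for high-signal history-lineage tokens inside doc-comment lines (leading `///`, `//`, `#`; shebangs are skipped).
- Three modes:
  - `--list`: print every violation, exit 0 (advisory)
  - default check: fail only on violations not in the baseline (CI-friendly; existing debt doesn't block commits)
  - `--fail-any`: strict mode for post-cleanup future
- Baseline: `tools/lint/doc-comment-history-audit.baseline` captures the 105 existing violations across 19 files so the lint lands non-blocking.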
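The doc-comment rule described above (a line whose leading characters are `///`, `//`, or `#`, with shebangs skipped) could be approximated as follows. This is a minimal sketch under assumptions: `comment_lines` is a hypothetical helper name, and the real script may treat indentation, file types, or trailing comments differently.

```shell
#!/bin/sh
# Sketch only: comment_lines is a hypothetical helper, not part of the real script.
# Prints "file:line:text" for each line of file $1 that starts with a comment marker.
comment_lines() {
  awk '
    # A shebang can only appear on the first line; it is not a doc comment.
    FNR == 1 && /^#!/ { next }
    # Keep lines whose first non-blank characters are // (which also covers ///) or #.
    /^[ \t]*(\/\/|#)/ { print FILENAME ":" FNR ":" $0 }
  ' "$1"
}
```

Only leading comment markers count in this sketch, so a trailing `// note` after code on the same line is not reported.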
Token list
`Otto-\d+` / `Amara` / `Aaron` / `ferry` / `courier` / `graduation` / `Provenance:` / `Attribution:`
Chosen for high signal + low false-positive. Each is factory-process vocabulary that belongs in PR descriptions, commit messages, `docs/hygiene-history/**`, or memory files, not in code doc-comments.
What it does NOT do
- Scan `docs/**`, `openspec/**`, `memory/**`: those legitimately carry history and are out of scope.
- Block existing debt: the baseline pins the current violations; subsequent cleanup PRs drain it row by row.
- Wire into CI yet: shipping the tool first makes the discipline measurable and lets the human maintainer decide when (and where: pre-commit hook vs. a CI job) to fail on new violations.
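The "baseline pins existing debt, only new records fail" behaviour can be sketched with `sort` and `comm`. The function names `new_violations` and `check_new_only` are hypothetical, and the success message is modelled on the output quoted in this description; the real script's implementation may differ.

```shell
#!/bin/sh
# Sketch only: new_violations and check_new_only are hypothetical names,
# not functions taken from doc-comment-history-audit.sh.

# Print records found in the current scan ($2) but absent from the baseline ($1).
new_violations() {
  baseline_sorted=$(mktemp) || return 2
  current_sorted=$(mktemp) || return 2
  # comm needs both inputs sorted the same way, so pin the locale.
  LC_ALL=C sort -u "$1" > "$baseline_sorted"
  LC_ALL=C sort -u "$2" > "$current_sorted"
  # -13 suppresses baseline-only and shared lines, leaving scan-only lines.
  LC_ALL=C comm -13 "$baseline_sorted" "$current_sorted"
  rm -f "$baseline_sorted" "$current_sorted"
}

# Default check: return 1 when any record is new, 0 otherwise.
check_new_only() {
  new=$(new_violations "$1" "$2")
  if [ -n "$new" ]; then
    printf '%s\n' "$new"
    return 1
  fi
  printf 'no new violations (%s entries in baseline)\n' "$(grep -c '' "$1")"
}
```

Because records are compared as whole lines, a baselined line that gains an extra token produces a different record and is reported as new, while entries that disappear from the scan are simply ignored until the baseline is regenerated.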
Baseline snapshot
Top files by violation count:
| File | Violations |
|---|---|
| src/Core/Graph.fs | 34 |
| src/Core/TemporalCoordinationDetection.fs | 25 |
| src/Core/Veridicality.fs | 14 |
| src/Core/RobustStats.fs | 10 |
| src/Core/PhaseExtraction.fs | |

Total: 105 violations across 19 files.
Verification
- `tools/lint/doc-comment-history-audit.sh --list`: prints 105 violations, exit 0
- `tools/lint/doc-comment-history-audit.sh`: prints "no new violations (105 entries in baseline)", exit 0
- `tools/lint/doc-comment-history-audit.sh --fail-any`: prints 105 violations, exit 1 (strict-mode working)
- Adding an `Aaron`-tagged comment to a fresh file triggers exit 1 with the new entry named
Test plan
- `--list` mode: 105 entries, exit 0
- `--fail-any` mode: exit 1 on any baseline entry
- `--regenerate-baseline` mode: overwrites baseline
- … (rule applies to its own source, not just downstream)
Rationale
Full background:
memory/feedback_code_comments_explain_code_not_history_otto_220_2026_04_24.md