tools: doc-comment history-audit lint — structural enforcement for Otto-220 #363

Merged
AceHack merged 2 commits into main from tools/doc-comment-history-audit-lint-otto-222
Apr 24, 2026

AceHack (Member) commented Apr 24, 2026

Summary

Ships the structural enforcement for the code-comments-vs-history
discipline established last tick. Aaron's exact phrasing from
Otto-220:

"Code comments should explain the code not read like some
history log, we have lint, everything should read as up to
date current except for history type files. code is not a
history file. ... there should be existing lint hygiene for
that."

PR #361 cleans one file (TemporalCoordinationDetection). This PR
ships the lint so the class of bug stops recurring on every new
primitive.

What it does

  • Scans src/**, tests/**, bench/**, tools/** source
    files (.fs, .cs, .sh, .ts) for high-signal
    history-lineage tokens inside doc-comment lines (leading
    ///, //, # — shebangs are skipped).
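  • Three check modes, plus --regenerate-baseline to overwrite the
    baseline file:
    • --list: print every violation, exit 0 (advisory)
    • default check: fail only on violations NOT in the baseline
      (CI-friendly; existing debt doesn't block commits)
    • --fail-any: strict mode that fails on any violation, baselined
      or not, for use once cleanup has drained the baseline
  • The baseline file (tools/lint/doc-comment-history-audit.baseline)
    captures the 105 existing violations across 19 files so the lint
    lands non-blocking.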
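The three check modes reduce to one exit-code decision over a list of violation records. A minimal POSIX sh sketch, assuming one `file:line:tokens` record per line and an exact whole-line comparison against the baseline; `decide_exit` and its `list`/`check`/`fail-any` arguments are hypothetical names for illustration, not the script's actual interface:

```sh
#!/bin/sh
# Hypothetical sketch of how the three check modes could map to exit codes.
# Not the real script: the function name and arguments are illustrative.
# VIOLATIONS and BASELINE are files with one "file:line:tokens" record per line.
set -eu

decide_exit() {
  mode=$1 violations=$2 baseline=$3
  case $mode in
    list)
      # Advisory: print everything, never fail.
      cat "$violations"
      return 0
      ;;
    fail-any)
      # Strict: print everything, fail if there is at least one record.
      cat "$violations"
      [ ! -s "$violations" ]
      ;;
    check)
      # Default: fail only on records that are not pinned in the baseline.
      # grep -Fxv -f prints lines of VIOLATIONS that match no baseline line
      # exactly; grep exits 1 when it selects nothing, hence "|| true".
      new=$(grep -Fxv -f "$baseline" "$violations" || true)
      if [ -n "$new" ]; then
        printf 'new violations:\n%s\n' "$new"
        return 1
      fi
      echo "no new violations ($(wc -l < "$baseline" | tr -d ' ') entries in baseline)"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 2
      ;;
  esac
}
```

Matching with `-F` keeps the dots in paths such as `Graph.fs` from acting as regex wildcards, and `-x` stops a baselined record from hiding a longer new record that merely contains it.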

Token list

Otto-\d+ / Amara / Aaron / ferry / courier /
graduation / Provenance: / Attribution:

Chosen for high signal and a low false-positive rate. Each is factory-
process vocabulary that belongs in PR descriptions, commit
messages, docs/hygiene-history/**, or memory files, not
in code doc-comments.

What it does NOT do

  • Does not scan docs/**, openspec/**, memory/** — those
    legitimately carry history and are out of scope.
  • Does not block existing debt. Baseline pins 105 current
    violations; subsequent cleanup PRs drain it row by row.
  • Does not wire into CI yet. Ship the tool first, make the
    discipline measurable, let the human maintainer decide
    when (and where — pre-commit hook vs. a CI job) to fail
    on new violations.

Baseline snapshot

Top files by violation count:

| File | Violations |
|---|---|
| src/Core/Graph.fs | 34 |
| src/Core/TemporalCoordinationDetection.fs | 25 (addressed by #361) |
| src/Core/Veridicality.fs | 14 |
| src/Core/RobustStats.fs | 10 |
| src/Core/PhaseExtraction.fs | 3 |
| 14 other files | 19 total |

Total: 105 violations across 19 files.
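A per-file tally like the one above can be recomputed from the baseline with standard tools. A minimal sketch, assuming one `path:line:tokens` record per line and no colons or whitespace in paths; `top_files` is a hypothetical helper, not part of this PR:

```sh
#!/bin/sh
# Hypothetical helper: print "path count" per file, most records first.
# Assumes "path:line:tokens" records with no ":" or whitespace in paths.
set -eu

top_files() {
  # cut keeps the path, uniq -c counts adjacent duplicates (hence the sort),
  # sort -rn orders by count, awk swaps the columns to "path count".
  cut -d: -f1 "$1" | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}
```

It counts records, that is, comment lines with at least one token, rather than individual token occurrences.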

Verification

  • tools/lint/doc-comment-history-audit.sh --list — prints
    105 violations, exit 0
  • tools/lint/doc-comment-history-audit.sh — prints "no new
    violations (105 entries in baseline)", exit 0
  • tools/lint/doc-comment-history-audit.sh --fail-any prints
    105 violations, exit 1 (strict mode works)
  • Synthetic new-violation test: adding one Aaron-tagged
    comment to a fresh file triggers exit 1 and names the new
    entry

Test plan

  • --list mode: 105 entries, exit 0
  • default check with clean tree: exit 0
  • default check with synthetic new violation: exit 1
  • --fail-any mode: exit 1 on any baseline entry
  • --regenerate-baseline mode: overwrites baseline
  • Self-references in the lint's own file stripped (the
    rule applies to its own source, not just downstream)
  • ShellCheck clean (manual review)

Rationale

Full background: memory/feedback_code_comments_explain_code_not_history_otto_220_2026_04_24.md

Copilot AI review requested due to automatic review settings April 24, 2026 12:27

Copilot AI left a comment

Pull request overview

Adds a new repo lint tool to detect “history/process lineage” tokens inside source-file doc/comment lines, with a baseline mechanism so existing violations don’t block development while cleanup proceeds.

Changes:

  • Introduces tools/lint/doc-comment-history-audit.sh with --list, default “new-only” check mode, --fail-any, and --regenerate-baseline.
  • Adds tools/lint/doc-comment-history-audit.baseline capturing current known violations (file:line:token) for CI-friendly incremental cleanup.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| tools/lint/doc-comment-history-audit.sh | New lint script that scans comment lines in selected trees and compares results to a baseline. |
| tools/lint/doc-comment-history-audit.baseline | Baseline snapshot of current violations to allow incremental enforcement. |

Comment thread tools/lint/doc-comment-history-audit.sh Outdated
Comment thread tools/lint/doc-comment-history-audit.sh Outdated

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c82a8c0316

Comment thread tools/lint/doc-comment-history-audit.sh Outdated
AceHack enabled auto-merge (squash) April 24, 2026 12:37
AceHack added 2 commits April 24, 2026 08:52
Aaron's durable correction to factory-authored code comments:
"Code comments should explain the code not read like some
history log ... there should be existing lint hygiene for that."

One file-level cleanup (PR #361 on TemporalCoordinationDetection)
drains a single offender; the class of bug is structural and
recurs on every new primitive unless caught automatically. This
PR ships the structural enforcement.

What it does:
  * Scans src/**, tests/**, bench/**, tools/** source files
    (.fs/.cs/.sh/.ts) for high-signal history-lineage tokens
    inside doc-comment lines (leading ///, //, # — not shebang)
  * Three modes: --list (advisory), --fail-any (strict), and
    default-check (fails only on violations not in baseline)
  * Baseline: tools/lint/doc-comment-history-audit.baseline
    captures the 105 existing violations across 19 files so the
    lint lands non-blocking; subsequent cleanup PRs drain it

What it does NOT do:
  * Scan docs/**, openspec/**, memory/** — those legitimately
    carry history and are out of scope
  * Block existing debt — baseline pins current state
  * Wire into CI yet — Aaron decides if/when to fail pre-commit
    or CI jobs on new violations; shipping the tool first so the
    discipline is at least measurable

Token list (TOKEN_PATTERN in the script):
  Otto-\d+  /  Amara  /  Aaron  /  ferry  /  courier  /
  graduation  /  Provenance:  /  Attribution:

Top offenders in the baseline (cleanup follow-ups, likely one
PR each):
  src/Core/Graph.fs                         34 violations
  src/Core/TemporalCoordinationDetection.fs 25 violations (PR #361)
  src/Core/Veridicality.fs                  14 violations
  src/Core/RobustStats.fs                   10 violations

Rationale: memory/feedback_code_comments_explain_code_not_history_otto_220_2026_04_24.md
…er-token baseline

- PRRT_kwDOSF9kNM59ZMZ- (P0, line 73): TOKEN_PATTERN previously used
  `\b` word-boundary anchors under `grep -E` (ERE); BSD grep on macOS
  treats `\b` as a literal `b`, silently missing matches. Reworked
  collect_violations() to do extraction + token matching inside awk,
  using explicit `[^A-Za-z0-9_]` boundary checks via match() /
  substr(). Portable across GNU awk and BSD awk; no grep dependency
  for the match step.
- PRRT_kwDOSF9kNM59ZMau (line 36): header doc over-promised by
  claiming `<!--` was a recognised comment start; the awk extractor
  never handled it and scanned file types (.fs, .cs, .sh, .ts)
  rarely use it. Removed `<!--` from the doc comment.
- PRRT_kwDOSF9kNM59ZMd5 (P2, line 113): previous extractor emitted
  only the first token per line, so baselined lines that gained
  additional forbidden tokens silently passed. Awk loop now
  collects all token matches per line, sorts + dedupes them, and
  emits `file:line:token1,token2,...`. Any new token on a baselined
  line changes the record and is caught by the baseline comparison.

Baseline regenerated — 82 entries (down from 105; the earlier count
reflected stale baseline + BSD-grep under-counting masking the
actual corpus).
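The extraction approach this commit describes (awk `match()`/`substr()` with explicit `[^A-Za-z0-9_]` boundary checks, every token on a line collected, then sorted and de-duplicated) can be sketched as below. This is a minimal illustration, not the script's actual `collect_violations()`: the token spelling, case-sensitive matching, and the rule that a comment line starts with `//` or `#` after optional indentation are all assumptions.

```sh
#!/bin/sh
# Hypothetical sketch of per-line, all-token extraction in portable awk.
# Not the real collect_violations(): token spelling, case sensitivity and the
# comment test (first non-blank chars are // or #) are assumptions.

collect_violations() {
  awk '
    function isword(c) { return c ~ /[A-Za-z0-9_]/ }
    BEGIN {
      # POSIX ERE has no \d class, so Otto-\d+ is spelled Otto-[0-9]+.
      n = split("Otto-[0-9]+ Amara Aaron ferry courier graduation Provenance: Attribution:", tok, " ")
    }
    FNR == 1 && substr($0, 1, 2) == "#!" { next }   # skip a shebang line
    {
      s = $0
      sub(/^[ \t]+/, "", s)                           # allow indentation
      if (substr(s, 1, 2) != "//" && substr(s, 1, 1) != "#") next
      m = 0; split("", seen)
      for (i = 1; i <= n; i++) {
        off = 0; rest = s
        while (match(rest, tok[i])) {
          a = off + RSTART; len = RLENGTH             # absolute position in s
          hit = substr(s, a, len)
          pre = (a > 1) ? substr(s, a - 1, 1) : ""
          post = substr(s, a + len, 1)
          # Explicit boundary checks instead of \b: a word-character edge of
          # the token must not touch another word character.
          ok = !(isword(substr(hit, 1, 1)) && isword(pre)) &&
               !(isword(substr(hit, len, 1)) && isword(post))
          if (ok && !(hit in seen)) { seen[hit] = 1; hits[++m] = hit }
          off = a + len - 1; rest = substr(s, off + 1)
        }
      }
      if (m == 0) next
      for (j = 2; j <= m; j++) {                      # insertion sort
        v = hits[j]; k = j - 1
        while (k >= 1 && hits[k] > v) { hits[k + 1] = hits[k]; k-- }
        hits[k + 1] = v
      }
      out = hits[1]
      for (j = 2; j <= m; j++) out = out "," hits[j]
      print FILENAME ":" FNR ":" out
    }
  ' "$@"
}
```

Because each record carries the full sorted token set, a new token on an already-baselined line yields a different string, which an exact whole-line comparison against the baseline then reports as new.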
AceHack force-pushed the tools/doc-comment-history-audit-lint-otto-222 branch from c82a8c0 to 369dc82 April 24, 2026 12:54
AceHack merged commit 9bf22b8 into main Apr 24, 2026
10 checks passed
AceHack deleted the tools/doc-comment-history-audit-lint-otto-222 branch April 24, 2026 12:56
AceHack added a commit that referenced this pull request Apr 24, 2026
…cheduling + BACKLOG ref

AceHack added a commit that referenced this pull request Apr 24, 2026
…-161 docs ambiguity (#345)

* docs: nightly cross-platform workflow design — third path around Otto-161 docs ambiguity

Design-only proposal per Otto-165 offer. Aaron Otto-161
macOS-everywhere directive + Otto-164 pricing-docs ambiguity
(macos-14 is standard-runner-type per about-github-hosted-
runners; billing page lists it at $0.062/min in the same
table as Linux/Windows without marking public-only).

Instead of resolving the ambiguity (can't — docs genuinely
contradict each other), propose a THIRD PATH that works in
either interpretation:

- PR gate stays ubuntu-22.04 only (unambiguously free on
  public repos).
- New nightly-cross-platform.yml runs matrix [ubuntu-22.04,
  windows-2022, macos-14] on cron '0 9 * * *' (09:00 UTC,
  off-the-hour to avoid scheduler stampede).
- Cost model: worst case ~$28/month/repo if macOS is billed;
  $0 if free. Either way, cadence caps exposure.
- Fork-scoping: `if: github.repository == canonical OR
  workflow_dispatch OR pull_request-to-this-file` prevents
  scheduled trigger firing on contributor forks (would burn
  fork-owner's personal-account minutes).
- No-alerting first cut (observation-only); issue-opening
  on red is a later enhancement.
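A quick sanity check of the worst-case figure, under assumptions the design does not state: the whole ~28 USD is macOS minutes at the quoted 0.062 USD/min, and the workflow runs about 30 nights a month.

$$
\frac{28\ \text{USD}}{0.062\ \text{USD/min}} \approx 452\ \text{min/month},
\qquad
\frac{452\ \text{min}}{30\ \text{runs}} \approx 15\ \text{min per nightly macOS run}
$$

Under those assumptions the worst case corresponds to roughly a quarter-hour of macOS time per night; a longer job would raise the monthly cost proportionally.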

Phased rollout:
- Phase 0 (now): this design doc, no YAML.
- Phase 1: Aaron signs off on cost tradeoff.
- Phase 2: land workflow on Zeta.
- Phase 3: observe 7 nightly runs for signal.
- Phase 4 (30 days): parallel lucent-ksk landing per
  Otto-140 rewrite authority, OR drop macOS if no signal +
  worst-case billing, OR expand matrix if best-case
  confirmed.

Rollback: delete macos-14 from matrix (one-line diff) or
delete workflow file entirely. No impact on gate.yml.

Composes with FACTORY-HYGIENE row #51 (unblocks enforcement
mode), docs/BACKLOG.md row ~2471 (Otto-161 declined + this
as alternative), docs/research/test-classification.md (PR
#339; category-3 nightly pattern).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#345): 6 review threads — name attribution + cron + YAML + fork-scheduling + BACKLOG ref

- thread Wkcz (line 327): removed broken `memory/feedback_ksk_naming_...`
  reference (factory-personal memories live in `~/.claude/projects/<slug>/memory/`,
  not in-repo); paraphrased the rewrite-authority rule in §10 without
  promising an in-repo path.

- thread WkdI (line 7): purged name-attribution tokens per Otto-220
  code-comments-not-history + doc-comment-history-audit lint
  (PR #363). All "Aaron" / "Otto-NN" / "Amara" / "Max" references
  rewritten to role references ("human maintainer", "prior-contributor",
  "autonomous loop", "initial-starting-point contributor").

- thread WkdX (line 163): cron changed `0 9 * * *` → `7 9 * * *`
  (09:07 UTC) so it matches the "off the hour" comment; note now
  calls out alignment with the sibling scheduled workflow
  `github-settings-drift.yml` (`17 14 * * 1`).

- thread Wkdk (line 146): YAML sketch rewritten to match the actual
  `.github/workflows/gate.yml` installer pattern — three-way-parity
  `./tools/setup/install.sh` invocation plus the same cache-key
  shape (dotnet / mise / nuget). Added explicit note that Windows
  matrix leg depends on `tools/setup/install.sh` growing Windows
  support first per the existing BACKLOG row.

- thread Wkdz (line 248): corrected the fork-scheduling claim. GitHub
  disables scheduled workflows on forks by default — the repo's
  own `github-settings-drift.yml` runs without fork-scoping and
  proves this. The `if: github.repository ==` guard is kept as
  optional hygiene for the rare opt-in-fork case, not as a cost-
  safety requirement.

- thread WkeB (line 316): replaced the wrong `docs/BACKLOG.md`
  line-number reference (~2471 is actually the mise-activate
  / HLL-flakiness neighborhood) with stable grep anchors
  ("Windows matrix in CI" + "Parity swap: CI's `actions/setup-dotnet`").

Markdownlint passes on the edited file.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>