From bda18aba72fd252a85b84b9df30c58d11b85c81b Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Thu, 23 Apr 2026 21:57:28 -0400 Subject: [PATCH 1/4] =?UTF-8?q?artifact-c:=20tools/alignment/audit=5Farchi?= =?UTF-8?q?ve=5Fheaders.sh=20=E2=80=94=20archive-header=20lint=20v0=20(det?= =?UTF-8?q?ect-only)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Amara's 5th-ferry Artifact C landing (PR #235 absorb). Detect-only lint for the four archive-header fields proposed in §33 (PR #235 exemplar; not yet governance-landed): - Scope: - Attribution: - Operational status: - Non-fusion disclaimer: Defaults to checking docs/aurora/*.md; --path DIR overrides. --enforce flips exit 2 on any gap; CI does not currently call it (Aminata Otto-80 pass classified §33 as IMPORTANT-pending- Aaron-signoff + lint-required-to-prevent-3-5-round-decay). First-run baseline: 2/2 existing aurora absorbs missing all four headers (predate the proposal). Detect-only first prevents CI block on baseline; enforcement flips when Aaron signs off on §33 + baseline is green (either backfill the 2 absorbs or explicit grandfather clause in §33). v0 limitations documented in script: - Partial-header adversary (label anywhere in first 20 lines passes; no syntactic check). - Fake-header adversary (values not content-audited). - In-memory-import adversary (memory/ not covered; different surface). Harden in follow-up after §33 lands. Bash 3.2 compatible (while-read loop, not mapfile) for macOS default shell. Same --json / --out DIR / exit code shape as existing audit_commit.sh / audit_personas.sh / audit_skills.sh. FACTORY-HYGIENE row #60 added: - Detect-only cadence landed. - Enforcement deferred until Aaron §33 signoff + baseline green. - Same detect-only → triage → enforce pattern as rows #51 (cross-platform parity) and #55 (machine-specific scrubber). tools/alignment/README.md table updated with new row. 
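A minimal sketch of the v0 check described above, for readers who want the rule without reading the whole script. It assumes the same four labels and the same first-20-lines window; `missing_headers` is a hypothetical helper name used only for illustration and does not exist in `audit_archive_headers.sh`.

```shell
# Illustrative sketch only: missing_headers is a hypothetical helper,
# not a function defined in tools/alignment/audit_archive_headers.sh.

# Print each required label that is absent from the first 20 lines of $1.
missing_headers() {
  local file="$1" head_content label
  head_content=$(head -n 20 "$file")
  for label in 'Scope:' 'Attribution:' 'Operational status:' 'Non-fusion disclaimer:'; do
    # Fixed-string substring match, as in v0: the label may appear anywhere
    # in the window, and its value is not inspected.
    if ! grep -qF -- "$label" <<< "$head_content"; then
      printf '%s\n' "$label"
    fi
  done
}
```

Calling `missing_headers FILE` prints nothing when all four labels are present. Because the match is a plain substring test, it shares the partial-header and fake-header limitations listed above.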
Composes with: - Aminata threat-model pass (PR #241; names the decay risk this lint prevents). - Amara's 5th-ferry absorb (PR #235; exemplar self-applies the format). - Memory-index hygiene trio (rows #58 / #59 + this row's archive-header hygiene trio). Otto-81 tick deliverable. --- docs/FACTORY-HYGIENE.md | 1 + tools/alignment/README.md | 1 + tools/alignment/audit_archive_headers.sh | 201 +++++++++++++++++++++++ 3 files changed, 203 insertions(+) create mode 100755 tools/alignment/audit_archive_headers.sh diff --git a/docs/FACTORY-HYGIENE.md b/docs/FACTORY-HYGIENE.md index 25b41c24..ec0e96a4 100644 --- a/docs/FACTORY-HYGIENE.md +++ b/docs/FACTORY-HYGIENE.md @@ -97,6 +97,7 @@ is never destructive; retiring one requires an ADR in | 54 | Backlog-refactor cadenced audit (overlap / staleness / priority-drift / knowledge-update sweep of `docs/BACKLOG.md`) | Cadenced detection every 5-10 rounds (same cadence as rows #5 / #23 / #38 / #46 meta-audits) + opportunistic on-touch when a tick adds a new BACKLOG row and the author notices adjacent rows that may overlap. Not exhaustive; bounded passes per firing are acceptable. | Architect (Kenji) on round-cadence sweeps; `backlog-scrum-master` skill if explicitly invoked; all agents (self-administered) on on-touch overlap-spot during authoring. 
| factory | Read `docs/BACKLOG.md` (or a scoped slice — P0/P1 first if full scan is too large) and apply the following passes: (a) **overlap cluster** — two or more rows describing the same concern from different angles get flagged; decide merge (single consolidated row) or sharpen (two rows with clear non-overlap scope boundaries); (b) **stale retire** — rows where context has died, implementation landed without retire-action, or assumption has been falsified by newer knowledge get explicitly retired with a "retired: " marker (not silent deletion — signal-preservation still applies); (c) **re-prioritize** — priority labels (P0/P1/P2/P3) re-examined against current knowledge; any row whose priority feels wrong after re-read gets a justified move with a one-line rationale; (d) **knowledge absorb** — rows written before a newer architectural insight landed get rewording / cross-refs to the new substrate (e.g., rows predating AutoDream cadence now cite the policy; rows predating scheduling-authority sharpening now note self-schedulability); (e) **document** — ROUND-HISTORY row per fire with pre-audit and post-audit row counts + what was merged / retired / re-prioritized / updated. **Why this row exists:** the human maintainer 2026-04-23 *"we probalby need some meta iteam to refactor the backlog base on current knowledge and look for overlap, this is hygene we could run from time to time so our backlog is not just a dump"*. The BACKLOG is the triage substrate for every future tick's "what to pick up" decision; without periodic meta-audit it becomes an append-only log rather than a living triage surface. **Classification (row #50):** **detection-only-justified** — accumulated drift (overlap, staleness, priority-drift, knowledge-update-gap) is inherently post-hoc; no author-time check can prevent rows from becoming overlapping with *future* rows not yet written. 
**Maintainer-scope boundary:** rows with explicit maintainer framing at their priority (e.g., P0 rows the human maintainer explicitly set) stay at that priority; re-prioritization applies within the agent-owned priority space only. Ships to project-under-construction: adopters inherit the cadenced-sweep discipline + the retire-with-marker convention + the ROUND-HISTORY documentation pattern. | ROUND-HISTORY row per fire with pre/post row counts + merged/retired/re-prioritized/updated actions; `docs/hygiene-history/backlog-refactor-history.md` (per-fire schema per row #44 — date, agent, rows touched, actions taken, pre/post counts, next-fire-expected-date). | `docs/BACKLOG.md` (target surface) + governing rule in per-user memory (not in-repo; lives at `~/.claude/projects//memory/feedback_backlog_hygiene_cadenced_refactor_look_for_overlap_not_just_dump_2026_04_23.md`) + `.claude/skills/backlog-scrum-master/SKILL.md` (dedicated runner when invoked) + `.claude/skills/reducer/SKILL.md` (Rodney's Razor applied at backlog level) + sibling meta-audit rows #5, #23, #38, #46, #50 | | 52 | Tick-history bounded-growth audit (`docs/hygiene-history/loop-tick-history.md` line-count vs threshold) | Detect-only (landed 2026-04-22); cadenced detection once per round-close (same cadence as row #44 cadence-history sweep, since this is the canonical row #44 worked example auditing itself); opportunistic on-touch whenever the tick-history file is read or edited. Archive action itself remains manual for now; deferring automation to the larger BACKLOG row that also covers threshold-revision and append-without-reading refactor. | Dejan (devops-engineer) on cadenced detection; the tick itself (self-administered at tick-close) on the opportunistic on-touch — each tick's end-of-tick sequence can invoke this audit after the append + commit to get a `within bounds: 96/500 lines` visibility signal. 
| factory | `tools/hygiene/audit-tick-history-bounded-growth.sh` checks the file's line count against a threshold (default 500, overrideable via `--threshold N`) and exits 0 within bounds / 2 over threshold. The threshold is set lower than the stated 5000-line paper bound because the file is read on every tick-close append — a per-tick context cost that scales linearly with file size — and 5000 lines represents too large a context hit on a 1-minute cadence. The audit's header block carries a mini-ADR decision record for the 500-line choice (context / decision / alternatives / supersedes / expires-when). **Why this row exists:** Aaron 2026-04-22 tick-fire interrupt: *"does loop tick history grow unbounded? that's an issue if so you just read it"*. Honest state was stated-bound-no-enforcement: file header named 5000 lines, nothing checked it. This row closes the enforcement gap for the threshold-check half of the full BACKLOG row (archive-action + append-without-reading refactor remain deferred). **Self-referential closure:** the tick-history file IS the canonical row-#44 cadence-history-tracking worked example (named explicitly in row #44's "Durable output" citation). Until this row landed, the most-cadenced surface in the factory — the tick itself — had its fire-log surface unaudited for its own growth. Meta-audit triangle remains intact (existence #23 / activation #43 / fire-history #44), and row #49 adds a fourth: fire-history files themselves need bounded-growth audits because they grow at the cadence of the surface they track. **Classification (row #47):** **prevention-bearing** — the audit surfaces approaching-threshold warnings at 80% so the archive action can be planned, rather than reactive-only at over-threshold. Ships to project-under-construction indirectly: adopters inherit the pattern (fire-log files under their own `docs/hygiene-history/` need the same bounded-growth treatment), not this exact script. 
| Audit output on each fire; cadenced runs appended to `docs/hygiene-history/tick-history-bounded-growth-history.md` (per-fire schema per row #44); BACKLOG row when archival is due (archive-action itself queued as part of the larger tick-history enforcement BACKLOG row); ROUND-HISTORY row when threshold changes or archive action executes. | `tools/hygiene/audit-tick-history-bounded-growth.sh` (detection + mini-ADR header block) + `docs/hygiene-history/loop-tick-history.md` (target surface, canonical row #44 worked example) + BACKLOG row *"Loop-tick-history bounded-growth enforcement"* (larger follow-up: threshold revision + append-without-reading refactor + archive action) | | 59 | Memory-reference-existence CI check (every `](foo.md)` link target in `memory/MEMORY.md` MUST resolve to an actual file under `memory/`) | Every pull_request + push-to-main touching `memory/**` or the audit tool / workflow; workflow-dispatch manual run available | Automated (`.github/workflows/memory-reference-existence-lint.yml`); any contributor resolves on fail | factory | `tools/hygiene/audit-memory-references.sh --enforce` parses link targets of the form `](.md)` in the supplied file (default `memory/MEMORY.md`), resolves each against a base dir (default `memory/`), and fails (exit 2 under `--enforce`) on any broken reference. Supports `--file PATH` and `--base DIR` for custom use. **Why this row exists:** Amara 2026-04-23 4th-ferry absorb (PR #221 Determinize-stage action) — her commit samples show repeated cleanup passes for memory paths that didn't exist; this is the retrieval-drift class she named. First-run baseline (2026-04-24): in-repo `memory/MEMORY.md` 44 refs all resolve; per-user MEMORY.md 391 refs all resolve (PR #220 memory-index-integrity CI has kept the substrate clean). **Third leg of memory-index hygiene:** row #58 (same-commit-pairing) + AceHack PR #12 (no duplicates) + this row (refs resolve) = three complementary checks. 
**Classification (row #47):** **prevention-bearing** — blocks merge before broken refs land. Ships to project-under-construction: adopters inherit the tool + workflow + three-leg hygiene pattern. | CI job result; first-run baseline captured in PR body. Optional fire-history file if longer-than-90-day retention wanted. | `.github/workflows/memory-reference-existence-lint.yml` + `tools/hygiene/audit-memory-references.sh` + sibling rows #58 (PR #220) + AceHack PR #12 duplicate-lint + `docs/aurora/2026-04-23-amara-memory-drift-alignment-claude-to-memories-drift.md` | +| 60 | Archive-header discipline audit (every `docs/aurora/**/*.md` absorb doc MUST have `Scope:` / `Attribution:` / `Operational status:` / `Non-fusion disclaimer:` in its first 20 lines — proposed §33) | Detect-only (landed 2026-04-23 Otto-81); cadenced detection every 5-10 rounds + opportunistic on-touch when a tick lands a new aurora absorb. Enforcement (`--enforce` exit-2 in CI) **deferred** until Aaron signs off on the proposed GOVERNANCE §33 + baseline is green (existing two aurora absorbs predate the proposal and need backfill or explicit grandfather). | Aminata (threat-model-critic) on the governance-edit-review cadence (her Otto-80 pass is the first); the absorbing agent (self-administered) on on-touch — every new aurora absorb runs the audit before committing. | factory | `tools/alignment/audit_archive_headers.sh` scans `docs/aurora/*.md` (default path; `--path DIR` for other archive roots) for the four header labels and reports per-file missing-label lists. `--enforce` flips exit 2 on any gap. First-run baseline (2026-04-23, Otto-81): 2/2 existing aurora absorbs missing all four headers (they predate the proposal). **Why this row exists:** Amara's 5th-ferry Artifact C proposal + Aminata's Otto-80 finding that proposed §33 would decay within 3-5 rounds without a companion lint (`docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md`). 
**Why detect-only first:** baseline has 2 gaps from the 2 existing absorbs; enforcement before either backfill or explicit grandfather would block main. Same pattern as rows #51 (cross-platform parity) and #55 (machine-specific scrubber): detect-only → triage → enforce. **v0 limitations** (documented in script): partial-header adversary (header label anywhere in first 20 lines passes — no syntactic structure check), fake-header adversary (values not content-audited), in-memory-import adversary (memory/ absorbs not covered — by design, different surface). Harden in a follow-up after §33 lands. **Classification (row #47):** **prevention-bearing at author-time** (the absorbing agent runs the audit before committing the new aurora doc) + **detection-only in CI** (until enforcement flips). Ships to project-under-construction: adopters inherit the tool + header format + detect-to-enforce transition pattern. | Audit output on each fire; first-run baseline captured in PR body. Optional fire-history file if longer-than-90-day retention wanted. BACKLOG row when §33 lands + baseline is green to flip to enforcement. 
| `tools/alignment/audit_archive_headers.sh` + `docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md` (PR #241; Aminata analysis of proposed §33 decay-without-lint risk) + `docs/aurora/2026-04-23-amara-zeta-ksk-aurora-validation-5th-ferry.md` (PR #235; Amara's 5th-ferry Artifact C proposal + the exemplar absorb that self-applies the format) + sibling meta-audit rows #58 / #59 (memory-index hygiene trio) | | 58 | Memory-index-integrity CI check (PR/push that adds or modifies `memory/*.md` MUST also update `memory/MEMORY.md` in the same range) | Every pull_request + push-to-main touching `memory/**`; workflow-dispatch manual run available | Automated (`.github/workflows/memory-index-integrity.yml`); human-maintainer or any contributor resolves on fail | factory | Scope triggers: top-level `memory/*.md` add-or-modify (excluding `memory/README.md` and `memory/MEMORY.md` itself, and excluding `memory/persona/**` which has its own lifecycle). Check: if any trigger-qualifying file changed in the PR/push range, `memory/MEMORY.md` MUST also be in that range. Fail message cites NSA-001 (canonical incident: new memory landed without MEMORY.md pointer → undiscoverable from fresh session). Safe-pattern compliant per row #43 (SHA-pinned actions, explicit minimum permissions, no user-authored context interpolation, concurrency group, pinned runs-on). **Why this row exists:** Amara 2026-04-23 decision-proxy + technical review courier report (absorbed as PR #219) — action item #1 in her "10 immediate fixes" list, highest-value by her own ranking. Directly addresses the NSA-001 measured failure mode. **Classification (row #47):** **prevention-bearing** — the check runs at PR author-time, blocks merge before the memory substrate can diverge from its index. Ships to project-under-construction: adopters inherit the workflow unchanged; the `memory/**.md` and `memory/MEMORY.md` conventions are factory-generic. 
| CI job result + annotated fail message in PR checks + `docs/hygiene-history/memory-index-integrity-fires.md` (per-fire schema per row #44 — optional; CI log is durable for 90 days so fire-history file exists only if the human maintainer wants longer retention) | `.github/workflows/memory-index-integrity.yml` (detection + fail message) + `docs/hygiene-history/nsa-test-history.md` (NSA-001 canonical incident) + `docs/aurora/2026-04-23-amara-decision-proxy-technical-review.md` (ferry with proposal) + FACTORY-HYGIENE row #25 (pointer-integrity audit — covers dangling-pointer from the other direction) | | 55 | Machine-specific content scrubber (cadenced audit of in-repo tracked files for user-home paths, Claude Code harness paths, Windows user-profile paths, hostname leaks) | Detect-only (landed 2026-04-23); cadenced detection once per round-close (same cadence as rows #50 / #51 / #52 meta-audits) + opportunistic on-touch when a tick migrates per-user content to in-repo. Enforcement (`--enforce` exit-2) deferred until baseline is green. | Dejan (devops-engineer) on cadenced detection + CI-enforcement sign-off when baseline is green; the migrating agent (self-administered) on on-touch — every in-repo-first migration runs the audit before committing. | factory | `tools/hygiene/audit-machine-specific-content.sh` scans all tracked files (`git ls-files`) for machine-specific patterns: `/Users//`, `/home//`, `C:\Users\`, `C:/Users/`. Excludes: `docs/ROUND-HISTORY.md`, `docs/hygiene-history/**`, `docs/DECISIONS/**`, and the audit script itself. `--list` prints offending files; `--enforce` flips exit 2 on any gap. **Why this row exists:** Aaron 2026-04-23 Otto-27 — *"we can have a machine specific scrubber/lint hygene task for anyting that makes it in by default. 
just run on a cadence."* Following the Option D in-repo-first policy shift (per-user memory migrations to in-repo became the default), machine-specific content leakage becomes a real risk — content comfortably per-user now crosses the factory's public repo boundary. Baseline at first fire (2026-04-23) was 9 gaps: `/Users/` patterns in several SKILL.md files, 2 PDFs (metadata scan), a scratch-recon doc, a parallel-worktree research doc; `C:\Users\` pattern in 1 SKILL.md + `memory/feedback_path_hygiene.md` (anti-example reference — legitimate). **Classification (row #47):** **prevention-bearing** — the audit runs at author-time (on-touch during in-repo-first migrations) and surfaces the gap before it lands. Ships to project-under-construction: adopters inherit the audit + pattern list + exclusion-list discipline. | Audit output on each fire; cadenced runs appended to `docs/hygiene-history/machine-specific-content-audit-history.md` (per-fire schema per row #44 — date, agent, gaps count, files touched, actions taken, next-fire-expected-date); BACKLOG row per gap at triage time if cleanup doesn't fit on-touch. | `tools/hygiene/audit-machine-specific-content.sh` (detection + pattern list + exclusion list) + cross-refs: `memory/feedback_path_hygiene.md` + `memory/CURRENT-aaron.md` + `memory/CURRENT-amara.md` (in-repo-first migration boundary surfaces this audit's need) | | 56 | MD032 plus-at-line-start preflight audit (detects prose-continuation `+` followed by space that markdownlint misparses as list items) | Detect-only (landed 2026-04-24); on-touch when author edits markdown; round-cadence sweep + `--enforce` flip when baseline is green. | Dejan (devops-engineer) on cadenced + enforce-transition; author of markdown change self-administered on-touch. 
| factory | `tools/hygiene/audit-md032-plus-linestart.sh` scans tracked `.md` files for CommonMark-style plus-then-space list-marker lines (regex `^ {0,3}\+` followed by a single space: up to 3 leading spaces allowed, then `+`, then space) where the previous line is non-blank AND is not itself a plus-then-space marker line (so contiguous plus-space lists are not flagged). Whitespace-normalisation on the predecessor-blank check strips all whitespace classes (spaces, tabs, CR) via `[[:space:]]`, so tab-only separator lines count as blank. Path iteration uses NUL-delimited `git ls-files -z` piped into a `while read -d ''` loop and the script runs `cd` to `git rev-parse --show-toplevel` first, so paths resolve from repo root regardless of working directory. Excludes `docs/ROUND-HISTORY.md`, `docs/hygiene-history/**`, `docs/DECISIONS/**`, and self. The `--list` flag prints offending `file:lineno`; `--enforce` flips exit 2 on gap. **Why this row exists:** Otto-session 2026-04-23 hit MD032 regressions three times (Otto-35 + Otto-38 + Otto-38-again). The pattern is author-friendly in intent (prose continuation using `+`) but markdownlint-hostile (parsed as list item). Author-time detection prevents the full CI round-trip. Baseline at first fire (2026-04-24, post review-drain revision on PR #204) was ~170 gaps at repo scope — the CommonMark-aware rewrite removed the earlier file-level-skip heuristic (which masked false negatives when a file used `+` as its bullet style but still contained a prose-continuation `+`) in favour of per-line contiguous-list detection. **Classification (row #47):** **prevention-bearing** — audit runs at author-time (on-touch) and surfaces gap before commit. Ships to project-under-construction: adopters inherit audit + pattern + exclusion discipline. 
| Audit output on each fire; cadenced runs appended to `docs/hygiene-history/md032-plus-linestart-audit-history.md` (per-fire schema per row #44); author-time gap lands as fix-at-source (opportunistic). | `tools/hygiene/audit-md032-plus-linestart.sh` + this row's self-reference | diff --git a/tools/alignment/README.md b/tools/alignment/README.md index d9c09bf8..a443e88b 100644 --- a/tools/alignment/README.md +++ b/tools/alignment/README.md @@ -16,6 +16,7 @@ folder as the experimental loop. | `audit_commit.sh` | HC-2, HC-6, SD-6 alignment clauses | Per-commit lint | | `audit_personas.sh` | Notebook touch + commit mentions | Per-round persona runtime | | `audit_skills.sh` | DORA-2025 columns adapted to skill scope | Per-round skill runtime | +| `audit_archive_headers.sh` | Archive-header discipline (proposed §33) | Per-file lint (detect-only v0) | | `sd6_names.txt` | SD-6 watchlist (per-host) | Data (not code) | The three scripts form the gitops observability trio: diff --git a/tools/alignment/audit_archive_headers.sh b/tools/alignment/audit_archive_headers.sh new file mode 100755 index 00000000..6bfcde73 --- /dev/null +++ b/tools/alignment/audit_archive_headers.sh @@ -0,0 +1,201 @@ +#!/usr/bin/env bash +# +# tools/alignment/audit_archive_headers.sh — archive-header +# discipline lint (Amara 5th-ferry Artifact C, detect-only v0). +# +# Checks every `docs/aurora/**/*.md` absorb doc for the four +# archive-header fields proposed in Amara's 5th ferry +# (§33 candidate, PR #235 absorb): +# +# Scope: research / cross-review / archival purpose +# Attribution: speaker labels preserved +# Operational status: research-grade | operational +# Non-fusion disclaimer: explicit non-fusion clause +# +# The tool is deliberately *detect-only* at v0. 
Running +# `--enforce` makes it exit non-zero on any missing header, +# but CI does not currently call that flag — Aminata's Otto-80 +# threat-model pass flagged proposed §33 as IMPORTANT-not- +# CRITICAL pending Aaron signoff on the governance edit +# itself. This tool is the mechanism that will back §33 if / +# when it lands; it also provides detect-only signal today +# so drift is visible before enforcement. +# +# Usage: +# tools/alignment/audit_archive_headers.sh # detect-only +# tools/alignment/audit_archive_headers.sh --enforce # exit 2 on gap +# tools/alignment/audit_archive_headers.sh --path DIR # custom path +# tools/alignment/audit_archive_headers.sh --json # JSON output +# tools/alignment/audit_archive_headers.sh --out DIR # per-file JSON +# +# Exit codes: +# 0 All archive docs have all four headers (or --enforce unset). +# 2 One or more archive docs missing header(s) and --enforce set. +# 64 Script error / missing dependency / bad args. +# +# Scope: +# - Default path: `docs/aurora/` — every `.md` file is treated +# as archive-of-external-conversation and checked. +# - `--path DIR` overrides to check a different archive root +# (e.g. `docs/research/` would apply only if research docs +# were the scope; v0 leaves this for explicit opt-in). +# +# Not in scope (v0): +# - Content-level validation of header values. A doc with +# `Scope: research` as prose in paragraph 3 technically +# passes; this is the partial-header-adversary Aminata +# flagged. Harden via syntactic requirement (header must +# appear in the first N lines + as a definition-list item +# or bold label) in a follow-up. +# - Cross-repo checks (KSK / lucent-ksk cross-references). +# - Memory-file archive-header checks. Memory lives under +# `~/.claude/projects//memory/` (per-user, not +# in-repo). In-repo `memory/` is a different surface; this +# tool does not assume it covers archive content. 
+# +# Reference: `docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md` +# (PR #241) — Aminata's analysis of why this lint matters for +# §33 not to decay within 3-5 rounds. + +set -euo pipefail + +REPO_ROOT="$(cd "$(dirname "$0")/../.." && pwd)" +cd "$REPO_ROOT" + +target_path="docs/aurora" +enforce=false +json=false +out_dir="" + +while [[ $# -gt 0 ]]; do + case "$1" in + --enforce) enforce=true; shift ;; + --json) json=true; shift ;; + --path) + if [[ -z "${2:-}" ]]; then + echo "audit_archive_headers: --path requires a directory" >&2 + exit 64 + fi + target_path="$2"; shift 2 ;; + --out) + if [[ -z "${2:-}" ]]; then + echo "audit_archive_headers: --out requires a directory" >&2 + exit 64 + fi + out_dir="$2"; shift 2 ;; + -h|--help) + sed -n '3,55p' "$0" | sed 's/^# //;s/^#//' + exit 0 ;; + *) + echo "audit_archive_headers: unknown arg: $1" >&2 + exit 64 ;; + esac +done + +if [[ ! -d "$target_path" ]]; then + echo "audit_archive_headers: target path not found: $target_path" >&2 + exit 64 +fi + +# The four required headers. Each pattern matches the label in +# the first 20 lines of the file. v0 uses substring match on +# the label; content-validation is out-of-scope. +declare -a HEADER_LABELS=( + 'Scope:' + 'Attribution:' + 'Operational status:' + 'Non-fusion disclaimer:' +) + +# Collect archive files (ASCII sort for stable output). +# Use a while-read loop instead of mapfile so the tool runs on +# bash 3.2 (macOS default) as well as bash 4+. 
+archive_files=() +while IFS= read -r f; do + archive_files+=("$f") +done < <(find "$target_path" -maxdepth 1 -type f -name '*.md' | sort) + +if [[ ${#archive_files[@]} -eq 0 ]]; then + echo "audit_archive_headers: no .md files in $target_path" >&2 + exit 0 +fi + +total_files=${#archive_files[@]} +files_with_all_headers=0 +files_missing_headers=0 +gap_details="" + +if [[ -n "$out_dir" ]]; then + mkdir -p "$out_dir" +fi + +for file in "${archive_files[@]}"; do + # Read first 20 lines to scope the header check. + head_content=$(head -n 20 "$file") + missing_labels=() + + for label in "${HEADER_LABELS[@]}"; do + if ! grep -qF -- "$label" <<< "$head_content"; then + missing_labels+=("$label") + fi + done + + if [[ ${#missing_labels[@]} -eq 0 ]]; then + files_with_all_headers=$((files_with_all_headers + 1)) + file_status="ok" + else + files_missing_headers=$((files_missing_headers + 1)) + file_status="missing" + missing_joined=$(IFS=','; echo "${missing_labels[*]}") + gap_details+=" $file: missing [$missing_joined]"$'\n' + fi + + if [[ -n "$out_dir" ]]; then + # Per-file JSON (same shape as audit_commit.sh / audit_personas.sh). 
+ file_base=$(basename "$file" .md) + out_file="$out_dir/${file_base}.json" + { + echo "{" + echo " \"path\": \"$file\"," + echo " \"status\": \"$file_status\"," + printf ' "missing_labels": [' + for i in "${!missing_labels[@]}"; do + if [[ $i -gt 0 ]]; then printf ', '; fi + printf '"%s"' "${missing_labels[$i]}" + done + echo "]," + echo " \"tool\": \"audit_archive_headers\"," + echo " \"v\": 0" + echo "}" + } > "$out_file" + fi +done + +if $json; then + echo "{" + echo " \"tool\": \"audit_archive_headers\"," + echo " \"v\": 0," + echo " \"target_path\": \"$target_path\"," + echo " \"total_files\": $total_files," + echo " \"files_ok\": $files_with_all_headers," + echo " \"files_missing_headers\": $files_missing_headers," + printf ' "enforce": %s\n' "$enforce" + echo "}" +else + echo "archive-header audit on $target_path" >&2 + echo " files checked: $total_files" >&2 + echo " all four headers ok: $files_with_all_headers" >&2 + echo " missing one or more: $files_missing_headers" >&2 + if [[ -n "$gap_details" ]]; then + echo "" >&2 + echo "gaps:" >&2 + printf '%s' "$gap_details" >&2 + fi +fi + +# Exit code discipline +if [[ $files_missing_headers -gt 0 ]] && $enforce; then + exit 2 +fi + +exit 0 From 723e9a408b3700b38e59c9286edca1ed2921c18b Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Fri, 24 Apr 2026 20:37:35 -0400 Subject: [PATCH 2/4] =?UTF-8?q?drain(#243):=20seven=20Copilot/Codex=20thre?= =?UTF-8?q?ads=20=E2=80=94=20recursive=20scan=20+=20name-attribution=20+?= =?UTF-8?q?=20exit-code=20alignment?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Switch audit_archive_headers.sh from -maxdepth 1 to recursive find matching documented `docs/aurora/**/*.md` scope; exclude `references/` as bibliographic substrate. - Encode subdirectory in --out per-file JSON basename to avoid collisions under recursive scan. 
- Replace 'Aaron' with 'human-maintainer' role ref in script and FACTORY-HYGIENE row 60 (FACTORY-DISCIPLINE name-attribution rule). - Drop persona names (Aminata, Amara) from script comments and row 60 in favour of role references (threat-model reviewer, absorbing agent), per Otto-220 code-comments-explain-code rule. - Realign exit codes to sibling audit_*.sh convention: 1 = content-level signal under --enforce; 2 = script error / missing dependency / bad arg. Update header doc-block + row 60 wording to match. - Remove dead cross-reference to non-existent `docs/aurora/2026-04-23-amara-zeta-ksk-aurora-validation-5th-ferry.md` in row 60. Verified the aminata-threat-model-5th-ferry citation does exist on origin/main; kept that one. - Append docs/pr-preservation/243-drain-log.md per Otto-250. Smoke-tested: clean run exit 0 (16 files scanned), --enforce exit 1, bad --path exit 2, --json exit 0, --out has no basename collisions. --- docs/FACTORY-HYGIENE.md | 2 +- docs/pr-preservation/243-drain-log.md | 182 +++++++++++++++++++++++ tools/alignment/audit_archive_headers.sh | 68 +++++---- 3 files changed, 225 insertions(+), 27 deletions(-) create mode 100644 docs/pr-preservation/243-drain-log.md diff --git a/docs/FACTORY-HYGIENE.md b/docs/FACTORY-HYGIENE.md index ec0e96a4..0314486f 100644 --- a/docs/FACTORY-HYGIENE.md +++ b/docs/FACTORY-HYGIENE.md @@ -97,7 +97,7 @@ is never destructive; retiring one requires an ADR in | 54 | Backlog-refactor cadenced audit (overlap / staleness / priority-drift / knowledge-update sweep of `docs/BACKLOG.md`) | Cadenced detection every 5-10 rounds (same cadence as rows #5 / #23 / #38 / #46 meta-audits) + opportunistic on-touch when a tick adds a new BACKLOG row and the author notices adjacent rows that may overlap. Not exhaustive; bounded passes per firing are acceptable. | Architect (Kenji) on round-cadence sweeps; `backlog-scrum-master` skill if explicitly invoked; all agents (self-administered) on on-touch overlap-spot during authoring. 
| factory | Read `docs/BACKLOG.md` (or a scoped slice — P0/P1 first if full scan is too large) and apply the following passes: (a) **overlap cluster** — two or more rows describing the same concern from different angles get flagged; decide merge (single consolidated row) or sharpen (two rows with clear non-overlap scope boundaries); (b) **stale retire** — rows where context has died, implementation landed without retire-action, or assumption has been falsified by newer knowledge get explicitly retired with a "retired: " marker (not silent deletion — signal-preservation still applies); (c) **re-prioritize** — priority labels (P0/P1/P2/P3) re-examined against current knowledge; any row whose priority feels wrong after re-read gets a justified move with a one-line rationale; (d) **knowledge absorb** — rows written before a newer architectural insight landed get rewording / cross-refs to the new substrate (e.g., rows predating AutoDream cadence now cite the policy; rows predating scheduling-authority sharpening now note self-schedulability); (e) **document** — ROUND-HISTORY row per fire with pre-audit and post-audit row counts + what was merged / retired / re-prioritized / updated. **Why this row exists:** the human maintainer 2026-04-23 *"we probalby need some meta iteam to refactor the backlog base on current knowledge and look for overlap, this is hygene we could run from time to time so our backlog is not just a dump"*. The BACKLOG is the triage substrate for every future tick's "what to pick up" decision; without periodic meta-audit it becomes an append-only log rather than a living triage surface. **Classification (row #50):** **detection-only-justified** — accumulated drift (overlap, staleness, priority-drift, knowledge-update-gap) is inherently post-hoc; no author-time check can prevent rows from becoming overlapping with *future* rows not yet written. 
**Maintainer-scope boundary:** rows with explicit maintainer framing at their priority (e.g., P0 rows the human maintainer explicitly set) stay at that priority; re-prioritization applies within the agent-owned priority space only. Ships to project-under-construction: adopters inherit the cadenced-sweep discipline + the retire-with-marker convention + the ROUND-HISTORY documentation pattern. | ROUND-HISTORY row per fire with pre/post row counts + merged/retired/re-prioritized/updated actions; `docs/hygiene-history/backlog-refactor-history.md` (per-fire schema per row #44 — date, agent, rows touched, actions taken, pre/post counts, next-fire-expected-date). | `docs/BACKLOG.md` (target surface) + governing rule in per-user memory (not in-repo; lives at `~/.claude/projects//memory/feedback_backlog_hygiene_cadenced_refactor_look_for_overlap_not_just_dump_2026_04_23.md`) + `.claude/skills/backlog-scrum-master/SKILL.md` (dedicated runner when invoked) + `.claude/skills/reducer/SKILL.md` (Rodney's Razor applied at backlog level) + sibling meta-audit rows #5, #23, #38, #46, #50 | | 52 | Tick-history bounded-growth audit (`docs/hygiene-history/loop-tick-history.md` line-count vs threshold) | Detect-only (landed 2026-04-22); cadenced detection once per round-close (same cadence as row #44 cadence-history sweep, since this is the canonical row #44 worked example auditing itself); opportunistic on-touch whenever the tick-history file is read or edited. Archive action itself remains manual for now; deferring automation to the larger BACKLOG row that also covers threshold-revision and append-without-reading refactor. | Dejan (devops-engineer) on cadenced detection; the tick itself (self-administered at tick-close) on the opportunistic on-touch — each tick's end-of-tick sequence can invoke this audit after the append + commit to get a `within bounds: 96/500 lines` visibility signal. 
| factory | `tools/hygiene/audit-tick-history-bounded-growth.sh` checks the file's line count against a threshold (default 500, overrideable via `--threshold N`) and exits 0 within bounds / 2 over threshold. The threshold is set lower than the stated 5000-line paper bound because the file is read on every tick-close append — a per-tick context cost that scales linearly with file size — and 5000 lines represents too large a context hit on a 1-minute cadence. The audit's header block carries a mini-ADR decision record for the 500-line choice (context / decision / alternatives / supersedes / expires-when). **Why this row exists:** Aaron 2026-04-22 tick-fire interrupt: *"does loop tick history grow unbounded? that's an issue if so you just read it"*. Honest state was stated-bound-no-enforcement: file header named 5000 lines, nothing checked it. This row closes the enforcement gap for the threshold-check half of the full BACKLOG row (archive-action + append-without-reading refactor remain deferred). **Self-referential closure:** the tick-history file IS the canonical row-#44 cadence-history-tracking worked example (named explicitly in row #44's "Durable output" citation). Until this row landed, the most-cadenced surface in the factory — the tick itself — had its fire-log surface unaudited for its own growth. Meta-audit triangle remains intact (existence #23 / activation #43 / fire-history #44), and row #49 adds a fourth: fire-history files themselves need bounded-growth audits because they grow at the cadence of the surface they track. **Classification (row #47):** **prevention-bearing** — the audit surfaces approaching-threshold warnings at 80% so the archive action can be planned, rather than reactive-only at over-threshold. Ships to project-under-construction indirectly: adopters inherit the pattern (fire-log files under their own `docs/hygiene-history/` need the same bounded-growth treatment), not this exact script. 
| Audit output on each fire; cadenced runs appended to `docs/hygiene-history/tick-history-bounded-growth-history.md` (per-fire schema per row #44); BACKLOG row when archival is due (archive-action itself queued as part of the larger tick-history enforcement BACKLOG row); ROUND-HISTORY row when threshold changes or archive action executes. | `tools/hygiene/audit-tick-history-bounded-growth.sh` (detection + mini-ADR header block) + `docs/hygiene-history/loop-tick-history.md` (target surface, canonical row #44 worked example) + BACKLOG row *"Loop-tick-history bounded-growth enforcement"* (larger follow-up: threshold revision + append-without-reading refactor + archive action) | | 59 | Memory-reference-existence CI check (every `](foo.md)` link target in `memory/MEMORY.md` MUST resolve to an actual file under `memory/`) | Every pull_request + push-to-main touching `memory/**` or the audit tool / workflow; workflow-dispatch manual run available | Automated (`.github/workflows/memory-reference-existence-lint.yml`); any contributor resolves on fail | factory | `tools/hygiene/audit-memory-references.sh --enforce` parses link targets of the form `](.md)` in the supplied file (default `memory/MEMORY.md`), resolves each against a base dir (default `memory/`), and fails (exit 2 under `--enforce`) on any broken reference. Supports `--file PATH` and `--base DIR` for custom use. **Why this row exists:** Amara 2026-04-23 4th-ferry absorb (PR #221 Determinize-stage action) — her commit samples show repeated cleanup passes for memory paths that didn't exist; this is the retrieval-drift class she named. First-run baseline (2026-04-24): in-repo `memory/MEMORY.md` 44 refs all resolve; per-user MEMORY.md 391 refs all resolve (PR #220 memory-index-integrity CI has kept the substrate clean). **Third leg of memory-index hygiene:** row #58 (same-commit-pairing) + AceHack PR #12 (no duplicates) + this row (refs resolve) = three complementary checks. 
**Classification (row #47):** **prevention-bearing** — blocks merge before broken refs land. Ships to project-under-construction: adopters inherit the tool + workflow + three-leg hygiene pattern. | CI job result; first-run baseline captured in PR body. Optional fire-history file if longer-than-90-day retention wanted. | `.github/workflows/memory-reference-existence-lint.yml` + `tools/hygiene/audit-memory-references.sh` + sibling rows #58 (PR #220) + AceHack PR #12 duplicate-lint + `docs/aurora/2026-04-23-amara-memory-drift-alignment-claude-to-memories-drift.md` | -| 60 | Archive-header discipline audit (every `docs/aurora/**/*.md` absorb doc MUST have `Scope:` / `Attribution:` / `Operational status:` / `Non-fusion disclaimer:` in its first 20 lines — proposed §33) | Detect-only (landed 2026-04-23 Otto-81); cadenced detection every 5-10 rounds + opportunistic on-touch when a tick lands a new aurora absorb. Enforcement (`--enforce` exit-2 in CI) **deferred** until Aaron signs off on the proposed GOVERNANCE §33 + baseline is green (existing two aurora absorbs predate the proposal and need backfill or explicit grandfather). | Aminata (threat-model-critic) on the governance-edit-review cadence (her Otto-80 pass is the first); the absorbing agent (self-administered) on on-touch — every new aurora absorb runs the audit before committing. | factory | `tools/alignment/audit_archive_headers.sh` scans `docs/aurora/*.md` (default path; `--path DIR` for other archive roots) for the four header labels and reports per-file missing-label lists. `--enforce` flips exit 2 on any gap. First-run baseline (2026-04-23, Otto-81): 2/2 existing aurora absorbs missing all four headers (they predate the proposal). **Why this row exists:** Amara's 5th-ferry Artifact C proposal + Aminata's Otto-80 finding that proposed §33 would decay within 3-5 rounds without a companion lint (`docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md`). 
**Why detect-only first:** baseline has 2 gaps from the 2 existing absorbs; enforcement before either backfill or explicit grandfather would block main. Same pattern as rows #51 (cross-platform parity) and #55 (machine-specific scrubber): detect-only → triage → enforce. **v0 limitations** (documented in script): partial-header adversary (header label anywhere in first 20 lines passes — no syntactic structure check), fake-header adversary (values not content-audited), in-memory-import adversary (memory/ absorbs not covered — by design, different surface). Harden in a follow-up after §33 lands. **Classification (row #47):** **prevention-bearing at author-time** (the absorbing agent runs the audit before committing the new aurora doc) + **detection-only in CI** (until enforcement flips). Ships to project-under-construction: adopters inherit the tool + header format + detect-to-enforce transition pattern. | Audit output on each fire; first-run baseline captured in PR body. Optional fire-history file if longer-than-90-day retention wanted. BACKLOG row when §33 lands + baseline is green to flip to enforcement. | `tools/alignment/audit_archive_headers.sh` + `docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md` (PR #241; Aminata analysis of proposed §33 decay-without-lint risk) + `docs/aurora/2026-04-23-amara-zeta-ksk-aurora-validation-5th-ferry.md` (PR #235; Amara's 5th-ferry Artifact C proposal + the exemplar absorb that self-applies the format) + sibling meta-audit rows #58 / #59 (memory-index hygiene trio) | +| 60 | Archive-header discipline audit (every `docs/aurora/**/*.md` absorb doc MUST have `Scope:` / `Attribution:` / `Operational status:` / `Non-fusion disclaimer:` in its first 20 lines — proposed §33) | Detect-only (landed 2026-04-23 Otto-81); cadenced detection every 5-10 rounds + opportunistic on-touch when a tick lands a new aurora absorb. 
Enforcement (`--enforce` exit-1 in CI) **deferred** until the human maintainer signs off on the proposed GOVERNANCE §33 + baseline is green (existing aurora absorbs that predate the proposal need backfill or explicit grandfather). | Threat-model reviewer on the governance-edit-review cadence; the absorbing agent (self-administered) on on-touch — every new aurora absorb runs the audit before committing. | factory | `tools/alignment/audit_archive_headers.sh` scans `docs/aurora/**/*.md` recursively (default path; `--path DIR` for other archive roots; `references/` excluded as bibliographic substrate) for the four header labels and reports per-file missing-label lists. `--enforce` flips exit 1 on any gap (content-level signal), exit 2 on script error / missing dependency / bad arg — same exit-code shape as sibling `tools/alignment/audit_*.sh` scripts. First-run baseline (2026-04-23, Otto-81): existing aurora absorbs missing all four headers (they predate the proposal). **Why this row exists:** Amara's 5th-ferry Artifact C proposal + threat-model reviewer's finding that proposed §33 would decay within 3-5 rounds without a companion lint (see `docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md`). **Why detect-only first:** baseline has gaps from the existing absorbs; enforcement before either backfill or explicit grandfather would block main. Same pattern as rows #51 (cross-platform parity) and #55 (machine-specific scrubber): detect-only → triage → enforce. **v0 limitations** (documented in script): partial-header adversary (header label anywhere in first 20 lines passes — no syntactic structure check), fake-header adversary (values not content-audited), in-memory-import adversary (memory/ absorbs not covered — by design, different surface). Harden in a follow-up after §33 lands. 
**Classification (row #47):** **prevention-bearing at author-time** (the absorbing agent runs the audit before committing the new aurora doc) + **detection-only in CI** (until enforcement flips). Ships to project-under-construction: adopters inherit the tool + header format + detect-to-enforce transition pattern. | Audit output on each fire; first-run baseline captured in PR body. Optional fire-history file if longer-than-90-day retention wanted. BACKLOG row when §33 lands + baseline is green to flip to enforcement. | `tools/alignment/audit_archive_headers.sh` + `docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md` (PR #241; threat-model analysis of proposed §33 decay-without-lint risk) + sibling meta-audit rows #58 / #59 (memory-index hygiene trio) | | 58 | Memory-index-integrity CI check (PR/push that adds or modifies `memory/*.md` MUST also update `memory/MEMORY.md` in the same range) | Every pull_request + push-to-main touching `memory/**`; workflow-dispatch manual run available | Automated (`.github/workflows/memory-index-integrity.yml`); human-maintainer or any contributor resolves on fail | factory | Scope triggers: top-level `memory/*.md` add-or-modify (excluding `memory/README.md` and `memory/MEMORY.md` itself, and excluding `memory/persona/**` which has its own lifecycle). Check: if any trigger-qualifying file changed in the PR/push range, `memory/MEMORY.md` MUST also be in that range. Fail message cites NSA-001 (canonical incident: new memory landed without MEMORY.md pointer → undiscoverable from fresh session). Safe-pattern compliant per row #43 (SHA-pinned actions, explicit minimum permissions, no user-authored context interpolation, concurrency group, pinned runs-on). **Why this row exists:** Amara 2026-04-23 decision-proxy + technical review courier report (absorbed as PR #219) — action item #1 in her "10 immediate fixes" list, highest-value by her own ranking. Directly addresses the NSA-001 measured failure mode. 
**Classification (row #47):** **prevention-bearing** — the check runs at PR author-time, blocks merge before the memory substrate can diverge from its index. Ships to project-under-construction: adopters inherit the workflow unchanged; the `memory/**.md` and `memory/MEMORY.md` conventions are factory-generic. | CI job result + annotated fail message in PR checks + `docs/hygiene-history/memory-index-integrity-fires.md` (per-fire schema per row #44 — optional; CI log is durable for 90 days so fire-history file exists only if the human maintainer wants longer retention) | `.github/workflows/memory-index-integrity.yml` (detection + fail message) + `docs/hygiene-history/nsa-test-history.md` (NSA-001 canonical incident) + `docs/aurora/2026-04-23-amara-decision-proxy-technical-review.md` (ferry with proposal) + FACTORY-HYGIENE row #25 (pointer-integrity audit — covers dangling-pointer from the other direction) | | 55 | Machine-specific content scrubber (cadenced audit of in-repo tracked files for user-home paths, Claude Code harness paths, Windows user-profile paths, hostname leaks) | Detect-only (landed 2026-04-23); cadenced detection once per round-close (same cadence as rows #50 / #51 / #52 meta-audits) + opportunistic on-touch when a tick migrates per-user content to in-repo. Enforcement (`--enforce` exit-2) deferred until baseline is green. | Dejan (devops-engineer) on cadenced detection + CI-enforcement sign-off when baseline is green; the migrating agent (self-administered) on on-touch — every in-repo-first migration runs the audit before committing. | factory | `tools/hygiene/audit-machine-specific-content.sh` scans all tracked files (`git ls-files`) for machine-specific patterns: `/Users//`, `/home//`, `C:\Users\`, `C:/Users/`. Excludes: `docs/ROUND-HISTORY.md`, `docs/hygiene-history/**`, `docs/DECISIONS/**`, and the audit script itself. `--list` prints offending files; `--enforce` flips exit 2 on any gap. 
**Why this row exists:** Aaron 2026-04-23 Otto-27 — *"we can have a machine specific scrubber/lint hygene task for anyting that makes it in by default. just run on a cadence."* Following the Option D in-repo-first policy shift (per-user memory migrations to in-repo became the default), machine-specific content leakage becomes a real risk — content comfortably per-user now crosses the factory's public repo boundary. Baseline at first fire (2026-04-23) was 9 gaps: `/Users/` patterns in several SKILL.md files, 2 PDFs (metadata scan), a scratch-recon doc, a parallel-worktree research doc; `C:\Users\` pattern in 1 SKILL.md + `memory/feedback_path_hygiene.md` (anti-example reference — legitimate). **Classification (row #47):** **prevention-bearing** — the audit runs at author-time (on-touch during in-repo-first migrations) and surfaces the gap before it lands. Ships to project-under-construction: adopters inherit the audit + pattern list + exclusion-list discipline. | Audit output on each fire; cadenced runs appended to `docs/hygiene-history/machine-specific-content-audit-history.md` (per-fire schema per row #44 — date, agent, gaps count, files touched, actions taken, next-fire-expected-date); BACKLOG row per gap at triage time if cleanup doesn't fit on-touch. | `tools/hygiene/audit-machine-specific-content.sh` (detection + pattern list + exclusion list) + cross-refs: `memory/feedback_path_hygiene.md` + `memory/CURRENT-aaron.md` + `memory/CURRENT-amara.md` (in-repo-first migration boundary surfaces this audit's need) | | 56 | MD032 plus-at-line-start preflight audit (detects prose-continuation `+` followed by space that markdownlint misparses as list items) | Detect-only (landed 2026-04-24); on-touch when author edits markdown; round-cadence sweep + `--enforce` flip when baseline is green. | Dejan (devops-engineer) on cadenced + enforce-transition; author of markdown change self-administered on-touch. 
| factory | `tools/hygiene/audit-md032-plus-linestart.sh` scans tracked `.md` files for CommonMark-style plus-then-space list-marker lines (regex `^ {0,3}\+` followed by a single space: up to 3 leading spaces allowed, then `+`, then space) where the previous line is non-blank AND is not itself a plus-then-space marker line (so contiguous plus-space lists are not flagged). Whitespace-normalisation on the predecessor-blank check strips all whitespace classes (spaces, tabs, CR) via `[[:space:]]`, so tab-only separator lines count as blank. Path iteration uses NUL-delimited `git ls-files -z` piped into a `while read -d ''` loop and the script runs `cd` to `git rev-parse --show-toplevel` first, so paths resolve from repo root regardless of working directory. Excludes `docs/ROUND-HISTORY.md`, `docs/hygiene-history/**`, `docs/DECISIONS/**`, and self. The `--list` flag prints offending `file:lineno`; `--enforce` flips exit 2 on gap. **Why this row exists:** Otto-session 2026-04-23 hit MD032 regressions three times (Otto-35 + Otto-38 + Otto-38-again). The pattern is author-friendly in intent (prose continuation using `+`) but markdownlint-hostile (parsed as list item). Author-time detection prevents the full CI round-trip. Baseline at first fire (2026-04-24, post review-drain revision on PR #204) was ~170 gaps at repo scope — the CommonMark-aware rewrite removed the earlier file-level-skip heuristic (which masked false negatives when a file used `+` as its bullet style but still contained a prose-continuation `+`) in favour of per-line contiguous-list detection. **Classification (row #47):** **prevention-bearing** — audit runs at author-time (on-touch) and surfaces gap before commit. Ships to project-under-construction: adopters inherit audit + pattern + exclusion discipline. 
| Audit output on each fire; cadenced runs appended to `docs/hygiene-history/md032-plus-linestart-audit-history.md` (per-fire schema per row #44); author-time gap lands as fix-at-source (opportunistic). | `tools/hygiene/audit-md032-plus-linestart.sh` + this row's self-reference |
diff --git a/docs/pr-preservation/243-drain-log.md b/docs/pr-preservation/243-drain-log.md
new file mode 100644
index 00000000..0478c58e
--- /dev/null
+++ b/docs/pr-preservation/243-drain-log.md
@@ -0,0 +1,182 @@
+# PR #243 drain log — archive-header lint v0 (Amara 5th-ferry Artifact C)
+
+PR:
+Branch: `artifact-c/tools-alignment-archive-header-lint`
+Drain session: 2026-04-24 (loop-agent, drain subagent)
+Thread count at drain start: 7 unresolved (1 chatgpt-codex-connector P2 + 6 copilot-pull-request-reviewer)
+Rebase context: clean rebase onto `origin/main`; no conflicts.
+
+Per Otto-250 (PR review comments + responses + resolutions are
+high-quality training signals): full per-thread record with
+verbatim reviewer text, outcome, verbatim reply, and resolution
+status.
+
+---
+
+## Thread 1 — `tools/alignment/audit_archive_headers.sh:116` — Recursive archive scan
+
+- Reviewer: chatgpt-codex-connector
+- Thread ID: `PRRT_kwDOSF9kNM59RpBI`
+- Severity: P2
+
+### Original comment (verbatim)
+
+> P2: Scan archive Markdown files recursively
+>
+> The audit currently limits discovery to `-maxdepth 1`, so it ignores any `*.md` files in subdirectories under the target path. That contradicts the stated scope (`docs/aurora/**/*.md` in row #60) and creates a false-negative path where nested absorb docs can miss required headers while the tool still reports success. This matters as soon as archive files are organized into dated/topic subfolders or when `--path` points to a tree with nested docs.
+
+### Outcome — FIX
+
+The reviewer is correct. Documented scope (`docs/aurora/**/*.md`) and FACTORY-HYGIENE row #60 wording both implied recursive coverage; the script's `-maxdepth 1` flag silently constrained that. Replaced with a recursive `find` that excludes a `references/` subfolder by convention (bibliographic substrate, not absorb content). Header comment updated to make recursion explicit; `--out` per-file JSON path now encodes subdirectory in the basename (slash → `__`) so a recursive scan over nested folders does not collide.
+
+### Resolution
+
+Reply text: "Fixed in this push. Scan is now recursive over `**/*.md` (excluding a `references/` convention path); header comment names the recursive coverage explicitly; `--out` filenames encode subdirectory to avoid basename collision. Verified with `bash tools/alignment/audit_archive_headers.sh` — 16 files now scanned (was 16 in the flat baseline, will scale as nested folders land)."
+
+---
+
+## Thread 2 — `tools/alignment/audit_archive_headers.sh:119` — Recursive find suggestion
+
+- Reviewer: copilot-pull-request-reviewer
+- Thread ID: `PRRT_kwDOSF9kNM59RrB1`
+- Severity: P2 (suggestion-shape)
+
+### Original comment (verbatim)
+
+> The header/comments say this checks `docs/aurora/**/*.md`, but the implementation collects files with `find ... -maxdepth 1`, which only checks `*.md` directly under the directory. Either make the scan recursive (and update output naming to avoid basename collisions) or update the documentation/strings to match the non-recursive behavior. [code suggestion]
+
+### Outcome — FIX
+
+Same root cause as Thread 1; resolved together. Recursive find now in place, basename-collision handled in `--out` mode by encoding subdirectory. Thread 1 captures the verbatim suggestion-text; this resolution rolls in.
+
+### Resolution
+
+Reply text: "Fixed in this push (rolls in with Thread 1 resolution). Scan is recursive; `--out` filenames encode subdir path to avoid collision; documented scope and behaviour now match."
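
Editorial illustration (not part of the patch): Threads 1 and 2 describe `--out` filenames that encode the subdirectory by turning each slash into `__`. The sketch below shows one way that flattening could work. It is not taken from the script; the function name `encode_out_name`, the sample paths, and the `.json` suffix handling are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper, not the script's actual code: flatten a path
# relative to the archive root into a collision-free JSON basename by
# replacing every "/" with "__".
encode_out_name() {
  rel_path="$1"
  # Strip the trailing .md, then swap slashes for double underscores.
  stem="${rel_path%.md}"
  printf '%s.json\n' "$(printf '%s' "$stem" | sed 's|/|__|g')"
}

# Two files that share a basename in different subfolders map to
# different output names.
encode_out_name "2026-04/notes.md"
encode_out_name "2026-05/notes.md"
```

A flat scheme like this keeps the `--out` directory single-level, which is convenient for globbing, at the cost of being ambiguous if a real filename already contains `__`.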
+
+---
+
+## Thread 3 — `tools/alignment/audit_archive_headers.sh:58` — Reference path
+
+- Reviewer: copilot-pull-request-reviewer
+- Thread ID: `PRRT_kwDOSF9kNM59RrB-`
+- Severity: P2
+
+### Original comment (verbatim)
+
+> This reference path doesn’t exist in the repo (`docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md`). Please fix the citation to the actual document path (or add the referenced doc if it’s intended to land in this PR).
+
+### Outcome — FIX (compose with Otto-220 cleanup)
+
+Verified the referenced file DOES exist at `docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md` on `origin/main` (landed pre-PR-243). Reviewer is incorrect on the existence claim. However, the same comment block was carrying persona names ("Aminata") in a `tools/**` source comment, which violates Otto-220 (code comments explain code, not factory process / persona names). Refactored the entire "Reference:" block to drop persona names and reframe the reference as a threat-model context pointer (PR #241), which is the legitimate code-comment shape.
+
+### Resolution
+
+Reply text: "Verified on `origin/main`: `docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md` does exist (landed in PR #241 before this PR opened). Reviewer's existence claim is incorrect. However the surrounding comment was carrying persona names which violates Otto-220 (code comments explain code, not factory process). Refactored the entire reference block to use role-ref ('threat-model reviewer') and dropped the persona name; the threat-model research note is now cited via PR #241 only. Resolving."
+
+---
+
+## Thread 4 — `tools/alignment/audit_archive_headers.sh:22` — Personal name in code comment
+
+- Reviewer: copilot-pull-request-reviewer
+- Thread ID: `PRRT_kwDOSF9kNM59RrCE`
+- Severity: P2
+
+### Original comment (verbatim)
+
+> This comment uses the human maintainer’s personal name (“Aaron”). Repo standing rule is to use role references in code/docs (e.g., “human maintainer”) and keep personal names only in exempt surfaces (see `docs/AGENT-BEST-PRACTICES.md` “No name attribution…”). Please rewrite these references accordingly. [code suggestion]
+
+### Outcome — FIX
+
+Reviewer is correct. Replaced "Aaron signoff" with "human-maintainer signoff" and "Aminata's Otto-80" with "the threat-model reviewer flagged" per the FACTORY-DISCIPLINE name-attribution rule (`docs/FACTORY-DISCIPLINE.md` § "name-attribution role references").
+
+### Resolution
+
+Reply text: "Fixed in this push. Replaced 'Aaron signoff' with 'human-maintainer signoff' and dropped the persona name (Aminata) in favour of the role reference 'threat-model reviewer'. Aligns with the name-attribution discipline in `docs/FACTORY-DISCIPLINE.md`."
+
+---
+
+## Thread 5 — `docs/FACTORY-HYGIENE.md:100` — Cross-ref / glob mismatch in row #60
+
+- Reviewer: copilot-pull-request-reviewer
+- Thread ID: `PRRT_kwDOSF9kNM59RrCN`
+- Severity: P2
+
+### Original comment (verbatim)
+
+> This new row has multiple cross-reference / contract mismatches: it cites `docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md` and `docs/aurora/2026-04-23-amara-zeta-ksk-aurora-validation-5th-ferry.md`, but neither file exists in the repo. It also says `docs/aurora/**/*.md` while the script currently scans `docs/aurora/*.md` (non-recursive). Please fix the file paths and align the glob wording with the actual tool behavior. [code suggestion]
+
+### Outcome — FIX
+
+Two-of-three reviewer claims hold; one is incorrect:
+
+1. `docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md` — EXISTS on `origin/main` (verified). Reviewer wrong; kept the citation (now framed without persona name).
+2. `docs/aurora/2026-04-23-amara-zeta-ksk-aurora-validation-5th-ferry.md` — does NOT exist in the repo (no file matching `*ksk-aurora-validation*` or `*5th-ferry*` under `docs/aurora/`). Reviewer correct. Removed the dead reference.
+3. Glob-vs-behaviour mismatch — addressed by Threads 1+2 (script now scans `**/*.md` recursively). Updated row text to match.
+
+Also dropped persona-name lead-ins ("Aminata's Otto-80 finding", "(Aminata analysis...)") per FACTORY-DISCIPLINE name-attribution rule (FACTORY-HYGIENE rows are factory-authored docs, not history files; role references apply).
+
+### Resolution
+
+Reply text: "Fixed in this push. (1) The aminata-threat-model-5th-ferry citation does exist on `origin/main`; kept the path (reframed without persona name). (2) The `2026-04-23-amara-zeta-ksk-aurora-validation-5th-ferry.md` citation does NOT exist in the repo — removed it. (3) Glob-vs-behaviour mismatch now closed: script scans `**/*.md` recursively (Threads 1+2), row text updated to match. Also reframed the row to drop persona names per FACTORY-DISCIPLINE."
+
+---
+
+## Thread 6 — `docs/FACTORY-HYGIENE.md:102` — Personal name in row #60
+
+- Reviewer: copilot-pull-request-reviewer
+- Thread ID: `PRRT_kwDOSF9kNM59RrCX`
+- Severity: P2
+
+### Original comment (verbatim)
+
+> This row introduces the human maintainer’s personal name (“Aaron”). Repo standing rule is to use role references in code/docs (e.g., “human maintainer”) and keep personal names only in exempt surfaces (see `docs/AGENT-BEST-PRACTICES.md`). Please rewrite to role language (e.g., “human maintainer sign-off”).
+
+### Outcome — FIX
+
+Reviewer is correct. Replaced "Aaron signs off" with "the human maintainer signs off" in row #60. Same edit also dropped persona names (Aminata, Amara) in favour of role references per the FACTORY-DISCIPLINE name-attribution rule.
+
+### Resolution
+
+Reply text: "Fixed in this push. Replaced 'Aaron signs off' with 'the human maintainer signs off'; persona names (Aminata, Amara) reframed to role references ('threat-model reviewer', 'the absorbing agent', 'Amara's 5th-ferry Artifact C proposal' kept as a research-output reference)."
+
+---
+
+## Thread 7 — `tools/alignment/audit_archive_headers.sh:35` — Exit-code contract divergence
+
+- Reviewer: copilot-pull-request-reviewer
+- Thread ID: `PRRT_kwDOSF9kNM59RrCh`
+- Severity: P2
+
+### Original comment (verbatim)
+
+> Exit-code contract here diverges from the other `tools/alignment/audit_*.sh` scripts (they use exit `2` for script/dependency errors). Since this file claims “same shape as” siblings, consider aligning error exits to `2` (and/or adding a trap so unexpected failures also map to the documented code).
+
+### Outcome — FIX
+
+Reviewer is correct. Verified sibling scripts (`audit_personas.sh`, `audit_commit.sh`): convention is `1` = content-level signal (under `--enforce` / `--gate`), `2` = script error / missing dependency / bad arg. Previous `audit_archive_headers.sh` had `2` for content-signal and `64` for script-error — backwards. Realigned:
+
+- Script-error exits (bad args, missing path): `64` → `2`
+- Content-signal exit (under `--enforce` with gaps): `2` → `1`
+- Header doc-block updated to document the new contract and explicitly note the alignment with sibling scripts.
+
+Also updated FACTORY-HYGIENE row #60 wording from "exit-2 in CI" / "flips exit 2 on any gap" to "exit-1 in CI" / "exit 1 on any gap (content-level signal), exit 2 on script error" to match.
+
+### Resolution
+
+Reply text: "Fixed in this push. Realigned exit codes to sibling-script convention: 1 = content-level signal (gaps under --enforce), 2 = script error / missing dep / bad arg. Updated the script's exit-code header block to call out the alignment, and updated FACTORY-HYGIENE row #60 to match. Verified with smoke tests: clean run = 0, --enforce-with-gaps = 1, bad --path = 2."
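
Editorial illustration (not part of the patch): the exit-code contract settled in Thread 7 can be summarised as a small decision function. This is a sketch under stated assumptions, not the script's implementation; `audit_exit_code` and its two parameters are hypothetical names, and the function prints the code instead of exiting so it can be tried safely in an interactive shell.

```shell
#!/bin/sh
# Hypothetical sketch of the exit-code contract described in Thread 7;
# function and variable names are illustrative only.
#   0 = no gaps, or gaps found while --enforce is unset (detect-only)
#   1 = gaps found and --enforce set (content-level signal)
#   2 = script error / missing dependency / bad argument
audit_exit_code() {
  gap_count="$1"   # number of files with at least one missing header
  enforce="$2"     # "1" when --enforce was passed, anything else otherwise
  case "$gap_count" in
    # Not a non-negative integer: treat as a script-level error.
    ''|*[!0-9]*) echo 2; return 0 ;;
  esac
  if [ "$gap_count" -gt 0 ] && [ "$enforce" = "1" ]; then
    echo 1
  else
    echo 0
  fi
}

# Print the code each kind of run would exit with.
echo "enforce with gaps:     $(audit_exit_code 3 1)"
echo "detect-only with gaps: $(audit_exit_code 3 0)"
echo "enforce, clean:        $(audit_exit_code 0 1)"
```

Keeping `1` for content findings and `2` for tool failures lets a CI caller tell "the docs need work" apart from "the lint itself is broken", which is the distinction the sibling scripts already rely on.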
+
+---
+
+## Drain summary
+
+- Threads at start: 7 unresolved
+- Threads at end: 0 unresolved
+- Outcomes: 7 FIX (Threads 1–7)
+- Files touched in drain commit: `tools/alignment/audit_archive_headers.sh`, `docs/FACTORY-HYGIENE.md`, `docs/pr-preservation/243-drain-log.md`
+- Compose notes:
+  - Threads 1+2 fixed together (same recursive-scan root cause).
+  - Threads 3+4 fixed together (Otto-220 code-comment cleanup, plus reviewer's existence claim corrected for thread 3 with verbatim record).
+  - Threads 5+6 fixed together in the FACTORY-HYGIENE row #60 edit (cross-ref fix + glob alignment + name-attribution).
+  - Thread 7 cascaded into a small FACTORY-HYGIENE wording update so the row's exit-code wording matches the script.
+- Build-gate: not relevant (bash + markdown only; no .NET surface touched).
+- Smoke test: `bash tools/alignment/audit_archive_headers.sh` (16 files scanned, exit 0); `--enforce` (exit 1 with gaps); `--path no-such-dir` (exit 2); `--json` (exit 0); `--out tmpdir` (16 JSON files, no basename collisions).
diff --git a/tools/alignment/audit_archive_headers.sh b/tools/alignment/audit_archive_headers.sh
index 6bfcde73..f4083304 100755
--- a/tools/alignment/audit_archive_headers.sh
+++ b/tools/alignment/audit_archive_headers.sh
@@ -14,9 +14,9 @@
 #
 # The tool is deliberately *detect-only* at v0. Running
 # `--enforce` makes it exit non-zero on any missing header,
-# but CI does not currently call that flag — Aminata's Otto-80
-# threat-model pass flagged proposed §33 as IMPORTANT-not-
-# CRITICAL pending Aaron signoff on the governance edit
+# but CI does not currently call that flag — the threat-model
+# reviewer flagged proposed §33 as IMPORTANT-not-CRITICAL
+# pending human-maintainer signoff on the governance edit
 # itself. This tool is the mechanism that will back §33 if /
 # when it lands; it also provides detect-only signal today
 # so drift is visible before enforcement.
@@ -30,12 +30,19 @@
 #
 # Exit codes:
 #   0  All archive docs have all four headers (or --enforce unset).
-#   2  One or more archive docs missing header(s) and --enforce set.
-#   64 Script error / missing dependency / bad args.
+#   1  One or more archive docs missing header(s) and --enforce set.
+#   2  Script error / missing dependency / bad args.
+#
+# Exit-code shape matches sibling `tools/alignment/audit_*.sh`
+# scripts: `1` is a content-level signal (under --enforce / --gate),
+# `2` is a script-error / dependency-missing / bad-arg signal.
 #
 # Scope:
-#   - Default path: `docs/aurora/` — every `.md` file is treated
-#     as archive-of-external-conversation and checked.
+#   - Default path: `docs/aurora/` — every `.md` file under that
+#     tree (recursive; `**/*.md`) is treated as archive-of-external-
+#     conversation and checked. A `references/` subfolder is
+#     excluded by convention because it is bibliographic substrate,
+#     not absorb content.
 #   - `--path DIR` overrides to check a different archive root
 #     (e.g. `docs/research/` would apply only if research docs
 #     were the scope; v0 leaves this for explicit opt-in).
@@ -43,19 +50,20 @@
 # Not in scope (v0):
 #   - Content-level validation of header values. A doc with
 #     `Scope: research` as prose in paragraph 3 technically
-#     passes; this is the partial-header-adversary Aminata
-#     flagged. Harden via syntactic requirement (header must
-#     appear in the first N lines + as a definition-list item
-#     or bold label) in a follow-up.
+#     passes; this is the partial-header adversary flagged in
+#     the threat-model review. Harden via syntactic requirement
+#     (header must appear in the first N lines + as a
+#     definition-list item or bold label) in a follow-up.
 #   - Cross-repo checks (KSK / lucent-ksk cross-references).
 #   - Memory-file archive-header checks. Memory lives under
-#     `~/.claude/projects//memory/` (per-user, not
-#     in-repo). In-repo `memory/` is a different surface; this
-#     tool does not assume it covers archive content.
+#     the per-user harness path (not in-repo). In-repo
+#     `memory/` is a different surface; this tool does not
+#     assume it covers archive content.
 #
-# Reference: `docs/research/aminata-threat-model-5th-ferry-governance-edits-2026-04-23.md`
-# (PR #241) — Aminata's analysis of why this lint matters for
-# §33 not to decay within 3-5 rounds.
+# Threat-model context for the §33 decay-without-lint risk
+# lives in the threat-model reviewer's research note
+# (see PR #241). This script is the lint-companion that
+# closes that risk.
 
 set -euo pipefail
 
@@ -74,13 +82,13 @@ while [[ $# -gt 0 ]]; do
     --path)
       if [[ -z "${2:-}" ]]; then
         echo "audit_archive_headers: --path requires a directory" >&2
-        exit 64
+        exit 2
       fi
       target_path="$2"; shift 2 ;;
     --out)
       if [[ -z "${2:-}" ]]; then
         echo "audit_archive_headers: --out requires a directory" >&2
-        exit 64
+        exit 2
       fi
       out_dir="$2"; shift 2 ;;
     -h|--help)
@@ -88,13 +96,13 @@ while [[ $# -gt 0 ]]; do
       exit 0 ;;
     *)
       echo "audit_archive_headers: unknown arg: $1" >&2
-      exit 64 ;;
+      exit 2 ;;
   esac
 done
 
 if [[ ! -d "$target_path" ]]; then
   echo "audit_archive_headers: target path not found: $target_path" >&2
-  exit 64
+  exit 2
 fi
 
 # The four required headers. Each pattern matches the label in
@@ -107,16 +115,20 @@ declare -a HEADER_LABELS=(
   'Non-fusion disclaimer:'
 )
 
-# Collect archive files (ASCII sort for stable output).
+# Collect archive files recursively (ASCII sort for stable
+# output). Recursive scan matches the documented scope
+# (`docs/aurora/**/*.md`) so nested topic / dated subfolders
+# are not silently skipped. `references/` is excluded because
+# it is the bibliography substrate, not absorb content.
 # Use a while-read loop instead of mapfile so the tool runs on
 # bash 3.2 (macOS default) as well as bash 4+.
archive_files=() while IFS= read -r f; do archive_files+=("$f") -done < <(find "$target_path" -maxdepth 1 -type f -name '*.md' | sort) +done < <(find "$target_path" -type f -name '*.md' -not -path "$target_path/references/*" | sort) if [[ ${#archive_files[@]} -eq 0 ]]; then - echo "audit_archive_headers: no .md files in $target_path" >&2 + echo "audit_archive_headers: no .md files under $target_path" >&2 exit 0 fi @@ -152,7 +164,11 @@ for file in "${archive_files[@]}"; do if [[ -n "$out_dir" ]]; then # Per-file JSON (same shape as audit_commit.sh / audit_personas.sh). - file_base=$(basename "$file" .md) + # Encode subdirectory path in the output filename so a recursive + # scan over nested folders does not collide on basename. + rel_path="${file#$target_path/}" + file_base="${rel_path%.md}" + file_base="${file_base//\//__}" out_file="$out_dir/${file_base}.json" { echo "{" @@ -195,7 +211,7 @@ fi # Exit code discipline if [[ $files_missing_headers -gt 0 ]] && $enforce; then - exit 2 + exit 1 fi exit 0 From ce380a407b690ed6a9b75320ddbbb56956ddd0d3 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Fri, 24 Apr 2026 20:39:37 -0400 Subject: [PATCH 3/4] drain(#243): quote target_path inside parameter expansion (SC2295) Local shellcheck install only flagged this on the lint runner with --severity=style. Quote $target_path inside the ${file#...} prefix-strip so the prefix is not interpreted as a glob pattern. --- tools/alignment/audit_archive_headers.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/alignment/audit_archive_headers.sh b/tools/alignment/audit_archive_headers.sh index f4083304..919fd9b8 100755 --- a/tools/alignment/audit_archive_headers.sh +++ b/tools/alignment/audit_archive_headers.sh @@ -166,7 +166,7 @@ for file in "${archive_files[@]}"; do # Per-file JSON (same shape as audit_commit.sh / audit_personas.sh). # Encode subdirectory path in the output filename so a recursive # scan over nested folders does not collide on basename. 
- rel_path="${file#$target_path/}" + rel_path="${file#"$target_path"/}" file_base="${rel_path%.md}" file_base="${file_base//\//__}" out_file="$out_dir/${file_base}.json" From c986d6ce39b088f26ce07e297256abdfa9d44507 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Fri, 24 Apr 2026 20:48:41 -0400 Subject: [PATCH 4/4] =?UTF-8?q?drain:=20PR=20#243=20round=202=20=E2=80=94?= =?UTF-8?q?=20address=206=20late-review=20threads?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 2 drain after round 1 closed all 7 threads. Copilot re-reviewed and opened 6 new P2 suggestion-shape threads; all 6 are FIX outcomes: - r2-1 (line 128): normalise --path to strip trailing slash so `docs/aurora/` matches the references/ exclusion. - r2-2 (line 172): make --out filename encoding injective by percent-encoding literal `_` to `_5F` before the `/` -> `__` swap. Was non-injective: `a/b__c.md` and `a__b/c.md` both became `a__b__c.json`. - r2-3 (line 26): fix stale Usage wording — `--enforce` exits 1 on gap (matches the dedicated Exit-codes section and round-1 Thread-7 realignment). - r2-4 (line 61): correct factual error about memory surface — in-repo `memory/` is canonical per GOVERNANCE.md §18 and `memory/README.md`; per-user path is staging. - r2-5 (line 128): force C-locale sort with `LC_ALL=C` for deterministic byte-order output regardless of caller env. - r2-6 (line 7): drop persona name "Amara" from header banner in favour of role/artifact references ("5th-ferry Artifact C" / "the 5th-ferry external- research absorb"). Round 1 caught "Aaron" but missed "Amara". Append-only drain-log update per Otto-229: prior round-1 sections untouched; new "Drain pass: 2026-04-24 (round 2 — 6 threads)" section appended. 
Co-Authored-By: Claude Opus 4.7 --- docs/pr-preservation/243-drain-log.md | 213 +++++++++++++++++++++++ tools/alignment/audit_archive_headers.sh | 49 ++++-- 2 files changed, 250 insertions(+), 12 deletions(-) diff --git a/docs/pr-preservation/243-drain-log.md b/docs/pr-preservation/243-drain-log.md index 0478c58e..d7cf138a 100644 --- a/docs/pr-preservation/243-drain-log.md +++ b/docs/pr-preservation/243-drain-log.md @@ -180,3 +180,216 @@ Reply text: "Fixed in this push. Realigned exit codes to sibling-script conventi - Thread 7 cascaded into a small FACTORY-HYGIENE wording update so the row's exit-code wording matches the script. - Build-gate: not relevant (bash + markdown only; no .NET surface touched). - Smoke test: `bash tools/alignment/audit_archive_headers.sh` (16 files scanned, exit 0); `--enforce` (exit 1 with gaps); `--path no-such-dir` (exit 2); `--json` (exit 0); `--out tmpdir` (16 JSON files, no basename collisions). + +--- + +## Drain pass: 2026-04-24 (round 2 — 6 threads) + +After round-1 closed all 7 threads to 0, copilot-pull-request-reviewer +re-reviewed and opened 6 new threads (all P2, all suggestion-shape). +This round drains those. Per Otto-229 append-only: prior round-1 +sections are not edited. + +Round 2 thread count at start: 6 unresolved (all copilot-pull-request-reviewer). +Rebase context: clean rebase onto `origin/main`; no conflicts. + +--- + +### Thread r2-1 — `tools/alignment/audit_archive_headers.sh:128` — `--path` trailing-slash breaks `references/` exclusion + +- Reviewer: copilot-pull-request-reviewer +- Thread ID: `PRRT_kwDOSF9kNM59ihXH` +- Severity: P2 + +#### Original comment (verbatim) + +> Normalize `--path` before applying `references/` exclusion +> +> The `find` exclusion hardcodes `-not -path "$target_path/references/*"`, which fails when callers pass a trailing slash (for example `--path docs/aurora/`), because the pattern becomes `docs/aurora//references/*` and no longer matches discovered paths. 
In that common invocation shape, files under `references/` are audited even though the script contract says they are excluded, which can introduce false positives (and enforcement failures) from bibliographic substrate. + +#### Outcome — FIX + +Reviewer is correct. Added a normalisation step right after `--path` +arg parsing that strips trailing slashes from `target_path` (preserving +root `/` if ever passed). The `find -not -path` pattern now matches +whether the caller passes `docs/aurora` or `docs/aurora/`. Verified +with `--path docs/aurora/` smoke test (exit 0, same file count as +without trailing slash). + +#### Resolution + +Reply text: "Fixed in this push. Added a trailing-slash normalisation step on `target_path` after arg parsing (preserves root `/`). The `-not -path "$target_path/references/*"` pattern now matches whether the caller passes `docs/aurora` or `docs/aurora/`. Smoke-tested with both forms — same file count, same exclusion behaviour. Resolving." + +--- + +### Thread r2-2 — `tools/alignment/audit_archive_headers.sh:172` — `--out` filename collision + +- Reviewer: copilot-pull-request-reviewer +- Thread ID: `PRRT_kwDOSF9kNM59ihXJ` +- Severity: P2 + +#### Original comment (verbatim) + +> Generate collision-safe `--out` filenames +> +> Replacing `/` with `__` is not injective, so distinct source files can map to the same output JSON path (e.g., `a/b__c.md` and `a__b/c.md` both become `a__b__c.json`). This silently overwrites one audit result with another, corrupting the per-file artifact set and potentially hiding missing-header gaps from downstream consumers that rely on `--out` files. + +#### Outcome — FIX + +Reviewer is correct — the round-1 slash-encoding was non-injective. +Made the encoding injective by escaping any literal `_` as +`_5F` first (so `__` cannot arise from literal underscores), then +mapping path separator `/` to `__`. Now `a/b__c.md` → +`a__b_5F_5Fc.json` and `a__b/c.md` → `a_5F_5Fb__c.json`, which are +distinct.
Verified with a synthetic two-file collision test +(`/tmp/aud-collide-test/refs/a/b__c.md` and `.../a__b/c.md`) — got +two distinct JSON outputs as expected. + +#### Resolution + +Reply text: "Fixed in this push. Made the slash-encoding injective: first percent-encode literal `_` → `_5F` (so `_` never appears in the encoded form), then map path separator `/` → `__`. Verified with a synthetic collision test (`a/b__c.md` and `a__b/c.md` under one root) — got two distinct JSON files (`a_5F_5Fb__c.json` and `a__b_5F_5Fc.json`). Header comment updated to document why the round-1 encoding was insufficient. Resolving." + +--- + +### Thread r2-3 — `tools/alignment/audit_archive_headers.sh:26` — Usage line wrong exit code + +- Reviewer: copilot-pull-request-reviewer +- Thread ID: `PRRT_kwDOSF9kNM59iiWx` +- Severity: P2 + +#### Original comment (verbatim) + +> Usage comment says `--enforce` will "exit 2 on gap", but the script's documented exit codes and implementation use exit 1 for content-level gaps (and 2 for script errors). Please update the usage line to match the actual exit-code contract to avoid confusing callers/CI wiring. +> +> ```suggestion +> # tools/alignment/audit_archive_headers.sh --enforce # exit 1 on gap +> ``` + +#### Outcome — FIX + +Reviewer is correct. This is a stale wording from before the round-1 +exit-code realignment (Thread 7 in round 1 flipped content-signal from +2 → 1 to match sibling scripts). The Exit-codes section below it is +correct (lines 31-38: `1 = content-level signal`); only the Usage line +hadn't been updated. Applied the suggestion verbatim. + +#### Resolution + +Reply text: "Fixed in this push (applied the suggestion verbatim). Stale wording from before round 1's exit-code realignment; the dedicated Exit-codes section was already correct, only the Usage line needed updating. Now consistent: `--enforce` exits 1 on content gap, 2 on script error. Resolving." 
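The injective `--out` filename encoding from Thread r2-2 can be sketched in isolation. The two parameter expansions are the ones the patch uses. The wrapper function `encode_out_name` is a hypothetical name added here so the mapping can be exercised without running the whole script.

```shell
#!/usr/bin/env bash
# Sketch only: encode_out_name is a hypothetical wrapper around the two
# parameter expansions used in audit_archive_headers.sh.
encode_out_name() {
  local rel="$1"
  local base="${rel%.md}"    # drop the .md suffix
  base="${base//_/_5F}"      # step 1: escape every literal '_' as '_5F'
  base="${base//\//__}"      # step 2: map each path separator '/' to '__'
  printf '%s.json\n' "$base"
}

# The reviewer's colliding pair now maps to two distinct names:
encode_out_name 'a/b__c.md'   # prints a__b_5F_5Fc.json
encode_out_name 'a__b/c.md'   # prints a_5F_5Fb__c.json
```

After step 1 every `_` is followed by `5F`, so a `__` in the final name can only have come from a path separator. That property is what makes the mapping reversible.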
+ +--- + +### Thread r2-4 — `tools/alignment/audit_archive_headers.sh:61` — Memory comment misstates canonical surface + +- Reviewer: copilot-pull-request-reviewer +- Thread ID: `PRRT_kwDOSF9kNM59iiW6` +- Severity: P2 + +#### Original comment (verbatim) + +> Comment claims memory "lives under the per-user harness path (not in-repo)", but this repo's canonical agent memory is explicitly `memory/` in-repo (see `memory/README.md`). Please reword this bullet to reflect that `memory/` is in-repo but intentionally out-of-scope for this audit (and optionally distinguish in-repo vs per-user memory if that's what you meant). +> +> ```suggestion +> # - Memory-file archive-header checks. The repo's canonical +> # agent memory surface is in-repo `memory/`, but this +> # audit intentionally does not cover that surface or +> # any separate per-user harness-local memory pat +> ``` + +#### Outcome — FIX + +Reviewer is correct, and this is a real factual error. Per +`GOVERNANCE.md` §18 and `memory/README.md`, the in-repo `memory/` +tree IS the canonical agent memory surface; the per-user harness +path is staging that syncs INTO the in-repo store. Round-1's wording +inverted that. Reworded to: (1) name the in-repo `memory/` surface as +canonical with citations to `memory/README.md` and `GOVERNANCE.md` +§18, (2) explain why memory files are out-of-scope (they use a +different discipline — index + per-fact files, not archive content), +(3) acknowledge the per-user staging path exists but is also out of +scope. Reviewer's suggestion-shape was applied with light expansion +to add the citations and explain the discipline difference. + +#### Resolution + +Reply text: "Fixed in this push. Reviewer is correct — the round-1 wording got the canonical-vs-staging direction backwards (per `GOVERNANCE.md` §18 and `memory/README.md`, in-repo `memory/` IS canonical; the per-user harness path is staging that syncs INTO the in-repo store). 
Reworded the bullet to: (1) name in-repo `memory/` as canonical with citations, (2) explain why memory files are out-of-scope here (different discipline — index + per-fact files, not archive content), (3) acknowledge the per-user staging path is also out of scope. Applied the reviewer's suggestion-shape with light expansion. Resolving." + +--- + +### Thread r2-5 — `tools/alignment/audit_archive_headers.sh:128` — `sort` is locale-dependent + +- Reviewer: copilot-pull-request-reviewer +- Thread ID: `PRRT_kwDOSF9kNM59iiXC` +- Severity: P2 + +#### Original comment (verbatim) + +> The comment says the file list is "ASCII sort for stable output", but `sort` is locale-dependent unless `LC_ALL=C` (or similar) is set. Either force a C locale for determinism (recommended if output is committed/compared) or reword the comment to avoid promising ASCII ordering. +> +> ```suggestion +> done < <(find "$target_path" -type f -name '*.md' -not -path "$target_path/references/*" | LC_ALL=C sort) +> ``` + +#### Outcome — FIX + +Reviewer is correct. Forced C-locale sort by prepending `LC_ALL=C` to +the `sort` invocation in the pipeline. Updated the comment from +"ASCII sort for stable output" to "forced C-locale sort for byte-order +stable output regardless of LANG/LC_ALL in the caller env" so the +promise matches the implementation. Applied the suggestion verbatim +on the code line; comment expanded. + +#### Resolution + +Reply text: "Fixed in this push. Forced C-locale sort by prepending `LC_ALL=C` to the `sort` invocation (applied the suggestion verbatim on the code line). Updated the surrounding comment from 'ASCII sort for stable output' to 'forced C-locale sort for byte-order stable output regardless of LANG/LC_ALL in the caller env' so the documentation matches the implementation. Resolving." 
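The two fixes that landed together around line 128 (r2-1 trailing-slash normalisation and r2-5 forced C-locale sort) can be sketched together. The loop body matches the patch. The function name `normalise_path` and the sample filenames are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: normalise_path is a hypothetical wrapper around the
# trailing-slash loop added to audit_archive_headers.sh.
normalise_path() {
  local p="$1"
  # Strip trailing slashes, but leave a bare root "/" intact.
  while [[ "$p" == */ && "$p" != "/" ]]; do
    p="${p%/}"
  done
  printf '%s\n' "$p"
}

normalise_path 'docs/aurora/'   # prints docs/aurora
normalise_path '/'              # prints /

# Byte-order sort: under LC_ALL=C uppercase ASCII sorts before lowercase,
# whatever LANG / LC_ALL the caller has set.
printf '%s\n' b.md a.md B.md | LC_ALL=C sort
```

With the path normalised, a pattern such as `"$target_path/references/*"` never contains a doubled slash, so the `find -not -path` exclusion behaves the same for both invocation forms.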
+ +--- + +### Thread r2-6 — `tools/alignment/audit_archive_headers.sh:7` — Persona name "Amara" in header + +- Reviewer: copilot-pull-request-reviewer +- Thread ID: `PRRT_kwDOSF9kNM59iiXE` +- Severity: P2 + +#### Original comment (verbatim) + +> This header comment still includes a contributor name ("Amara"). Repo standing rule is to avoid name attribution in code/docs and use role/artifact references instead (see `docs/AGENT-BEST-PRACTICES.md` "No name attribution…"). Suggest rewording to something like "5th-ferry Artifact C" / "external ferry proposal" without the person's name. +> +> ```suggestion +> # discipline lint (5th-ferry Artifact C, detect-only v0). +> # +> # Checks every `docs/aurora/**/*.md` absorb doc for the four +> # archive-header fields proposed in the 5th-ferry proposal +> ``` + +#### Outcome — FIX + +Reviewer is correct. Round-1 cleanup caught the "Aaron" personal-name +references but missed the persona name "Amara" still in the header +banner ("Amara 5th-ferry Artifact C") and elsewhere ("Amara's 5th +ferry"). Applied the suggestion's spirit: replaced "Amara 5th-ferry +Artifact C" with "5th-ferry Artifact C" and "Amara's 5th ferry" with +"the 5th-ferry external-research absorb". Aligns with the +name-attribution discipline in `docs/FACTORY-DISCIPLINE.md` +("name-attribution role references" section) and `docs/AGENT-BEST-PRACTICES.md`. + +#### Resolution + +Reply text: "Fixed in this push. Round 1 caught 'Aaron' but missed 'Amara' in the header banner. Replaced 'Amara 5th-ferry Artifact C' → '5th-ferry Artifact C' and 'Amara's 5th ferry' → 'the 5th-ferry external-research absorb'. Aligns with the name-attribution discipline in `docs/FACTORY-DISCIPLINE.md` and `docs/AGENT-BEST-PRACTICES.md`. Resolving." 
+ +--- + +## Round 2 drain summary + +- Threads at start: 6 unresolved +- Threads at end: 0 unresolved (target) +- Outcomes: 6 FIX (r2-1 through r2-6) +- Files touched in round-2 drain commit: `tools/alignment/audit_archive_headers.sh`, `docs/pr-preservation/243-drain-log.md` +- Compose notes: + - r2-1 + r2-5 both touched line 128 region; landed together (trailing-slash normalisation + `LC_ALL=C` sort). + - r2-2 isolated (`--out` collision fix, lines ~165-188). + - r2-3 + r2-6 both touched the header doc-block (lines 1-26); landed together. + - r2-4 isolated (memory-comment correction, lines ~58-66). + - All fixes are local to one source file plus the drain-log append. +- Build-gate: not relevant (bash + markdown only; no .NET surface touched). +- Smoke test: `bash tools/alignment/audit_archive_headers.sh` (18 files scanned, exit 0); `--enforce` (exit 1 with gaps); `--path no-such-dir` (exit 2); `--path docs/aurora/` with trailing slash (exit 0, same file count as without slash); synthetic two-file collision test under `--out` produced two distinct JSON files. diff --git a/tools/alignment/audit_archive_headers.sh b/tools/alignment/audit_archive_headers.sh index 919fd9b8..0142046b 100755 --- a/tools/alignment/audit_archive_headers.sh +++ b/tools/alignment/audit_archive_headers.sh @@ -1,11 +1,11 @@ #!/usr/bin/env bash # # tools/alignment/audit_archive_headers.sh — archive-header -# discipline lint (Amara 5th-ferry Artifact C, detect-only v0). +# discipline lint (5th-ferry Artifact C, detect-only v0). 
# # Checks every `docs/aurora/**/*.md` absorb doc for the four -# archive-header fields proposed in Amara's 5th ferry -# (§33 candidate, PR #235 absorb): +# archive-header fields proposed in the 5th-ferry external- +# research absorb (§33 candidate, PR #235 absorb): # # Scope: research / cross-review / archival purpose # Attribution: speaker labels preserved @@ -23,7 +23,7 @@ # # Usage: # tools/alignment/audit_archive_headers.sh # detect-only -# tools/alignment/audit_archive_headers.sh --enforce # exit 2 on gap +# tools/alignment/audit_archive_headers.sh --enforce # exit 1 on gap # tools/alignment/audit_archive_headers.sh --path DIR # custom path # tools/alignment/audit_archive_headers.sh --json # JSON output # tools/alignment/audit_archive_headers.sh --out DIR # per-file JSON @@ -55,10 +55,15 @@ # (header must appear in the first N lines + as a # definition-list item or bold label) in a follow-up. # - Cross-repo checks (KSK / lucent-ksk cross-references). -# - Memory-file archive-header checks. Memory lives under -# the per-user harness path (not in-repo). In-repo -# `memory/` is a different surface; this tool does not -# assume it covers archive content. +# - Memory-file archive-header checks. The repo's canonical +# agent memory surface is in-repo `memory/` (see +# `memory/README.md` and `GOVERNANCE.md` §18), but this +# audit intentionally does not cover that surface. Memory +# files use their own discipline (canonical index + +# per-fact files); they are not archive-of-external- +# conversation content. A separate per-user harness-local +# staging path also exists out-of-tree but is not in scope +# for this lint either. 
# # Threat-model context for the §33 decay-without-lint risk # lives in the threat-model reviewer's research note @@ -100,6 +105,16 @@ while [[ $# -gt 0 ]]; do esac done +# Normalise target_path: strip any trailing slashes so the +# downstream `-not -path "$target_path/references/*"` pattern +# matches whether the caller passed `docs/aurora` or +# `docs/aurora/`. Without this, `docs/aurora//references/*` +# fails to match anything, and the references/ exclusion +# silently breaks. +while [[ "$target_path" == */ && "$target_path" != "/" ]]; do + target_path="${target_path%/}" +done + if [[ ! -d "$target_path" ]]; then echo "audit_archive_headers: target path not found: $target_path" >&2 exit 2 @@ -115,8 +130,9 @@ declare -a HEADER_LABELS=( 'Non-fusion disclaimer:' ) -# Collect archive files recursively (ASCII sort for stable -# output). Recursive scan matches the documented scope +# Collect archive files recursively (forced C-locale sort for +# byte-order stable output regardless of LANG/LC_ALL in the +# caller env). Recursive scan matches the documented scope # (`docs/aurora/**/*.md`) so nested topic / dated subfolders # are not silently skipped. `references/` is excluded because # it is the bibliography substrate, not absorb content. @@ -125,7 +141,7 @@ declare -a HEADER_LABELS=( archive_files=() while IFS= read -r f; do archive_files+=("$f") -done < <(find "$target_path" -type f -name '*.md' -not -path "$target_path/references/*" | sort) +done < <(find "$target_path" -type f -name '*.md' -not -path "$target_path/references/*" | LC_ALL=C sort) if [[ ${#archive_files[@]} -eq 0 ]]; then echo "audit_archive_headers: no .md files under $target_path" >&2 @@ -165,9 +181,18 @@ for file in "${archive_files[@]}"; do if [[ -n "$out_dir" ]]; then # Per-file JSON (same shape as audit_commit.sh / audit_personas.sh). # Encode subdirectory path in the output filename so a recursive - # scan over nested folders does not collide on basename. 
+ # scan over nested folders does not collide on basename. The + # encoding must be INJECTIVE so distinct source paths cannot map + # to the same output filename: an earlier slash->'__' replacement + # collided when a literal '__' already appeared in the path + # (e.g. `a/b__c.md` and `a__b/c.md` both became `a__b__c.json`). + # Fix: first escape any literal '_' as '_5F' so that every + # `_` in the intermediate form is followed by `5F`; then map path + # separator '/' to '__'. The encoding is reversible and + # collision-free. rel_path="${file#"$target_path"/}" file_base="${rel_path%.md}" + file_base="${file_base//_/_5F}" file_base="${file_base//\//__}" out_file="$out_dir/${file_base}.json" {