diff --git a/docs/FACTORY-HYGIENE.md b/docs/FACTORY-HYGIENE.md index c578cbd0..3c0eec60 100644 --- a/docs/FACTORY-HYGIENE.md +++ b/docs/FACTORY-HYGIENE.md @@ -100,6 +100,7 @@ is never destructive; retiring one requires an ADR in | 58 | Memory-index-integrity CI check (PR/push that adds or modifies `memory/*.md` MUST also update `memory/MEMORY.md` in the same range) | Every pull_request + push-to-main touching `memory/**`; workflow-dispatch manual run available | Automated (`.github/workflows/memory-index-integrity.yml`); human-maintainer or any contributor resolves on fail | factory | Scope triggers: top-level `memory/*.md` add-or-modify (excluding `memory/README.md` and `memory/MEMORY.md` itself, and excluding `memory/persona/**` which has its own lifecycle). Check: if any trigger-qualifying file changed in the PR/push range, `memory/MEMORY.md` MUST also be in that range. Fail message cites NSA-001 (canonical incident: new memory landed without MEMORY.md pointer → undiscoverable from fresh session). Safe-pattern compliant per row #43 (SHA-pinned actions, explicit minimum permissions, no user-authored context interpolation, concurrency group, pinned runs-on). **Why this row exists:** Amara 2026-04-23 decision-proxy + technical review courier report (absorbed as PR #219) — action item #1 in her "10 immediate fixes" list, highest-value by her own ranking. Directly addresses the NSA-001 measured failure mode. **Classification (row #47):** **prevention-bearing** — the check runs at PR author-time, blocks merge before the memory substrate can diverge from its index. Ships to project-under-construction: adopters inherit the workflow unchanged; the `memory/**.md` and `memory/MEMORY.md` conventions are factory-generic. 
| CI job result + annotated fail message in PR checks + `docs/hygiene-history/memory-index-integrity-fires.md` (per-fire schema per row #44 — optional; CI log is durable for 90 days so fire-history file exists only if the human maintainer wants longer retention) | `.github/workflows/memory-index-integrity.yml` (detection + fail message) + `docs/hygiene-history/nsa-test-history.md` (NSA-001 canonical incident) + `docs/aurora/2026-04-23-amara-decision-proxy-technical-review.md` (ferry with proposal) + FACTORY-HYGIENE row #25 (pointer-integrity audit — covers dangling-pointer from the other direction) | | 55 | Machine-specific content scrubber (cadenced audit of in-repo tracked files for user-home paths, Claude Code harness paths, Windows user-profile paths, hostname leaks) | Detect-only (landed 2026-04-23); cadenced detection once per round-close (same cadence as rows #50 / #51 / #52 meta-audits) + opportunistic on-touch when a tick migrates per-user content to in-repo. Enforcement (`--enforce` exit-2) deferred until baseline is green. | Dejan (devops-engineer) on cadenced detection + CI-enforcement sign-off when baseline is green; the migrating agent (self-administered) on on-touch — every in-repo-first migration runs the audit before committing. | factory | `tools/hygiene/audit-machine-specific-content.sh` scans all tracked files (`git ls-files`) for machine-specific patterns: `/Users//`, `/home//`, `C:\Users\`, `C:/Users/`. Excludes: `docs/ROUND-HISTORY.md`, `docs/hygiene-history/**`, `docs/DECISIONS/**`, and the audit script itself. `--list` prints offending files; `--enforce` flips exit 2 on any gap. **Why this row exists:** Aaron 2026-04-23 Otto-27 — *"we can have a machine specific scrubber/lint hygene task for anyting that makes it in by default. 
just run on a cadence."* Following the Option D in-repo-first policy shift (per-user memory migrations to in-repo became the default), machine-specific content leakage becomes a real risk — content comfortably per-user now crosses the factory's public repo boundary. Baseline at first fire (2026-04-23) was 9 gaps: `/Users/` patterns in several SKILL.md files, 2 PDFs (metadata scan), a scratch-recon doc, a parallel-worktree research doc; `C:\Users\` pattern in 1 SKILL.md + `memory/feedback_path_hygiene.md` (anti-example reference — legitimate). **Classification (row #47):** **prevention-bearing** — the audit runs at author-time (on-touch during in-repo-first migrations) and surfaces the gap before it lands. Ships to project-under-construction: adopters inherit the audit + pattern list + exclusion-list discipline. | Audit output on each fire; cadenced runs appended to `docs/hygiene-history/machine-specific-content-audit-history.md` (per-fire schema per row #44 — date, agent, gaps count, files touched, actions taken, next-fire-expected-date); BACKLOG row per gap at triage time if cleanup doesn't fit on-touch. | `tools/hygiene/audit-machine-specific-content.sh` (detection + pattern list + exclusion list) + cross-refs: `memory/feedback_path_hygiene.md` + `memory/CURRENT-aaron.md` + `memory/CURRENT-amara.md` (in-repo-first migration boundary surfaces this audit's need) | | 60 | Surface-map-drift smell (wrong URL on a mapped surface fires a hygiene alarm) | Pre-call: every `gh api ` (or equivalent platform call) on a surface that has a mapping doc — grep the map first, use its path, otherwise record a map-gap. Post-call: every 410 / 301 / "endpoint moved" response on a mapped endpoint auto-proposes a map-update. Cadenced sweep every 5-10 rounds replays the full set of mapped endpoints against the current platform to catch silent drift (endpoint renamed without 410). 
| Any agent calling `gh api` (self-administered on pre-call / post-call); Dejan (devops-engineer) on the cadenced sweep; Kenji (Architect) on map-update PRs when drift lands. Bounded to surfaces with a mapping doc under `docs/research/*surface-map*.md` / `docs/AGENT-*-SURFACES.md` / `docs/HARNESS-SURFACES.md` / `docs/GITHUB-SETTINGS.md`. | factory | **Pre-call (prevention-bearing):** before invoking any `gh api` call against org / enterprise / Copilot / billing / settings surfaces, `grep -li "" ` and use the path the map lists. If the map lacks the path, **file a map-gap finding** in the same audit's output — agent may still call a best-guess endpoint if confident the surface exists, but must log the gap so the next round-close sweep extends the map. **Post-call (detection-bearing):** any `410 Gone` / `301 Moved Permanently` / `"endpoint moved"` response from a mapped endpoint triggers a map-update task (write the new path to the map; note old-path + redirect-doc + drift-date in a "Map drift log" section). **Cadenced (detection-bearing):** every 5-10 rounds, replay the full set of mapped endpoints against the current platform to catch silent renames (200 OK from a stale path that silently redirects, or 404 from an endpoint removed without deprecation). **Why this row exists:** Aaron 2026-04-22 after agent invented `/orgs/.../billing/budgets` (404) for LFG budget audit despite task #195 having already produced the complete map: *"i'm supprised you got the url wrong given you mapped it"* + *"that should be a smell when that happen to a surface you already have mapped"*. Same incident revealed a second drift class — `/orgs/{org}/settings/billing/actions` (map §A.17) returned 410 with `documentation_url: https://gh.io/billing-api-updates-org`, meaning GitHub moved the endpoint between 2026-04-22 (map author-time) and 2026-04-22 (this fire, hours later). 
Two orthogonal failure modes compound: (a) **not-consulting** an existing map (guess without grep), (b) **consulting-but-stale** map (correct path + platform drift). **UI-only surfaces** (e.g., GitHub org budget management at `https://github.com/organizations/{org}/billing/budgets`, no REST equivalent) are legitimate map entries — the map should mark them as `ui-only` so agents know "no API path exists" before trying. **Classification (row #47):** **prevention-bearing** — the pre-call grep discipline is the prevention layer; the post-call 410 handler is a complementary detection layer; the cadenced sweep is the insurance detection layer for silent renames. See `memory/feedback_surface_map_consultation_before_guessing_urls.md`. Ships to project-under-construction: adopters inherit the smell pattern + the pre-call grep obligation + the map-update-on-410 trigger. | Pre-call: grep output shown in the audit (map-hit / map-miss). Post-call: map-update PR when 410/301 lands, with "Map drift log" row recording old-path + redirect-doc + drift-date. Cadenced: sweep output logged to `docs/hygiene-history/surface-map-drift-history.md` (per-fire schema per row #44). ROUND-HISTORY row when a drift resolves. | `memory/feedback_surface_map_consultation_before_guessing_urls.md` (authoritative) + `docs/research/github-surface-map-complete-2026-04-22.md` (primary target for GitHub surfaces) + `docs/AGENT-GITHUB-SURFACES.md` (ten-surface playbook) + `docs/HARNESS-SURFACES.md` + `docs/GITHUB-SETTINGS.md` + this row's enforcement discipline (agent-self-administered pre-call, detection scripts TBD under `tools/hygiene/audit-surface-map-drift.sh`) | +| 57 | Git-hotspots audit (cadenced ranking of high-churn files as friction-point candidates) | Detect-only (landed 2026-04-23); cadenced detection every 5-10 rounds (same cadence as rows #5 / #38 / #46 meta-audits) + opportunistic on-touch when merge conflicts surface on a shared file. 
No enforcement; detection-first per the *"detection-first, action-second"* framing in the Otto-54 directive cluster. | Dejan (devops-engineer) on cadenced sweeps; Architect (Kenji) on per-file action decisions (split / freeze / audit / watch); all agents (self-administered) on on-touch when a merge conflict surfaces. | factory | `tools/hygiene/audit-git-hotspots.sh --window "60 days" --top 20 --report docs/hygiene-history/git-hotspots-YYYY-MM-DD.md` runs a `git log --since --name-only` pass, counts per-file touches over the window, enriches with unique-author + PR-count columns, ranks top-N, and emits a markdown report. Excludes `docs/hygiene-history/`, `openspec/changes/`, `references/upstreams/` as legitimately-by-design high-churn. **Why this row exists:** the human maintainer 2026-04-23 Otto-54 four-message cluster — *"cadence for checking github hotspots too this is a hygene issues points of friction and bottlenecks, we are frictionless... git hotspots i mean... we are gitnative with github as our first host"*. High-churn shared files are the paradigmatic friction surface (routine merge conflicts, reviewer burden, serialization bottleneck) and git log is the native instrument for detecting them. **First-fire finding (2026-04-23):** `docs/BACKLOG.md` is the top hotspot at 34 touches / 26 PRs in 30 days — effectively one BACKLOG touch per PR opened. The Otto-54 BACKLOG-per-swim-lane split row is the direct remediation. Other notable hotspots: `docs/ROUND-HISTORY.md` (freeze-then-watch), `memory/MEMORY.md` (cadence candidate — Otto-54 CURRENT-freshness row), 4 skill files (audit candidates for skill-tune-up). **Classification (row #47):** **prevention-bearing** — surfacing friction candidates before they compound into routine merge-tangle is upstream prevention, even though the tool itself is detect-only. Ships to project-under-construction: adopters inherit the audit + exclusion-prefix discipline + the per-file-action decision taxonomy. 
| Audit output in `docs/hygiene-history/git-hotspots-YYYY-MM-DD.md` per fire (per-fire schema per row #44 — date, window, top-N, Otto observations, per-file action, synthesis); BACKLOG rows for `split` / `audit` actions; ROUND-HISTORY row when a file transitions out of hotspot status via split / freeze / archive. | `tools/hygiene/audit-git-hotspots.sh` (detection + exclusion list + ranking) + `docs/hygiene-history/git-hotspots-2026-04-23.md` (first-run baseline with Otto observations) + Otto-54 directive cluster in BACKLOG.md § "P1 — Git-native hygiene cadences" + **out-of-repo** (per-user memory, not yet in-repo) companion memory `feedback/project_factory_is_git_native_github_first_host_hygiene_cadences_for_frictionless_operation_2026_04_23.md` (captures the four-message directive verbatim) | ## Ships to project-under-construction diff --git a/docs/hygiene-history/git-hotspots-2026-04-23.md b/docs/hygiene-history/git-hotspots-2026-04-23.md new file mode 100644 index 00000000..5a2baeb6 --- /dev/null +++ b/docs/hygiene-history/git-hotspots-2026-04-23.md @@ -0,0 +1,114 @@ +# Git hotspots report + +- **Window:** last 30 days +- **Generated:** 2026-04-23T23:03:31Z +- **Top:** 25 files by touch count +- **Excluded prefixes:** docs/hygiene-history/ openspec/changes/ references/upstreams/ + +## Ranking + +| file | touches | unique authors | PR count | +|---|---:|---:|---:| +| docs/BACKLOG.md | 34 | 1 | 26 | +| docs/ROUND-HISTORY.md | 18 | 1 | 12 | +| docs/VISION.md | 14 | 1 | 3 | +| docs/CURRENT-ROUND.md | 13 | 1 | 5 | +| docs/WINS.md | 11 | 1 | 7 | +| memory/MEMORY.md | 10 | 1 | 10 | +| docs/DEBT.md | 10 | 1 | 6 | +| .github/workflows/gate.yml | 9 | 2 | 6 | +| docs/security/THREAT-MODEL.md | 8 | 1 | 5 | +| .gitignore | 8 | 1 | 6 | +| .claude/skills/round-management/SKILL.md | 8 | 1 | 5 | +| GOVERNANCE.md | 7 | 1 | 5 | +| docs/WONT-DO.md | 7 | 1 | 5 | +| docs/TECH-RADAR.md | 7 | 1 | 5 | +| docs/GLOSSARY.md | 7 | 1 | 5 | +| docs/FACTORY-HYGIENE.md | 7 | 1 | 10 | +| 
AGENTS.md | 7 | 1 | 6 | +| .claude/skills/security-researcher/SKILL.md | 7 | 1 | 4 | +| memory/persona/best-practices-scratch.md | 6 | 1 | 6 | +| docs/research/proof-tool-coverage.md | 6 | 1 | 4 | +| .claude/skills/skill-improver/SKILL.md | 6 | 1 | 3 | +| .claude/skills/skill-creator/SKILL.md | 6 | 1 | 4 | +| .claude/skills/prompt-protector/SKILL.md | 6 | 1 | 4 | +| .claude/skills/backlog-scrum-master/SKILL.md | 6 | 1 | 4 | +| .claude/skills/algebra-owner/SKILL.md | 6 | 1 | 4 | + +## Suggested actions + +Detection-first. Each action below is a prompt for human +or Architect judgment, not an enforcement. + +- **split** — file has become a shared bottleneck; consider + per-swim-lane / per-subsystem decomposition +- **freeze** — historical content is append-only; freeze + older rows to an archive and keep recent rows hot +- **audit** — hotness may reflect real work; investigate + whether churn is healthy or pathological +- **watch** — hot but not yet a problem; leave for next + audit cadence + +## What this report is NOT + +- Not an enforcement. The audit exits 0 regardless of + findings. +- Not a blame tool. Author counts are descriptive of + collaboration shape, not performance. +- Not a complete merge-conflict predictor. Two PRs can + conflict on a rarely-touched file; conversely, a + very hot file with careful coordination (append-only + rows) may see zero conflicts. + +## Otto observations (first-run baseline — 2026-04-23) + +This is the first run of the hotspot audit. The ranking +validates the human maintainer's Otto-54 intuition that +`docs/BACKLOG.md` is the factory's top friction surface +(34 touches / 26 PRs in a 30-day window — effectively one +BACKLOG touch per PR opened). + +### Per-file suggested action + +| file | action | rationale | +|---|---|---| +| `docs/BACKLOG.md` | **split** | Matches the Otto-54 BACKLOG-per-swim-lane row. 26 PRs in 30 days touching one file is the paradigmatic serialization bottleneck. 
| +| `docs/ROUND-HISTORY.md` | **freeze-then-watch** | Historical narrative by design; candidate for "freeze older rounds to archive" pattern per GOVERNANCE.md §2. | +| `docs/VISION.md` | **audit** | 14 touches but only 3 PRs — high commit-density per PR is unusual; likely legitimate iteration during pre-v1 scope shaping, not pathological. | +| `docs/CURRENT-ROUND.md` | **watch** | Per-round update is normal; current touches match cadence. | +| `docs/WINS.md` | **watch** | Append-only; touches track round cadence. | +| `memory/MEMORY.md` | **cadence** | Matches the Otto-54 CURRENT-maintainer-freshness row. 10 touches / 10 PRs = one index update per absorb. Directly addressed by the freshness audit row already backlogged. | +| `docs/DEBT.md` | **watch** | Per-round update; normal cadence. | +| `.github/workflows/gate.yml` | **audit** | 2 unique authors suggests this is where CI changes get proposed by contributors beyond Otto — the only entry with >1 author. Healthy signal, not a split candidate. | +| `docs/security/THREAT-MODEL.md` | **watch** | Security scaffolding is still maturing. | +| `.gitignore` | **watch** | Routine updates as tools + artifacts accumulate. | +| `.claude/skills/round-management/SKILL.md` | **audit** | High touch for a skill file; candidate for skill-tune-up review. | +| `GOVERNANCE.md` | **watch** | Governance rule additions; append-with-context is correct. | +| `docs/WONT-DO.md` | **watch** | Declined-features log grows monotonically; expected. | +| `docs/TECH-RADAR.md` | **watch** | Quarterly radar; touches track band graduations. | +| `docs/GLOSSARY.md` | **watch** | Vocabulary expansion with each new research arc. | +| `docs/FACTORY-HYGIENE.md` | **watch** | Meta-hygiene file; self-reference is OK. This very audit adds one row. | +| `AGENTS.md` | **watch** | Universal onboarding handbook; occasional updates. | +| `.claude/skills/security-researcher/SKILL.md` | **audit** | High touch for a single skill; candidate for skill-tune-up. 
| +| `memory/persona/best-practices-scratch.md` | **watch** | Scratchpad by design. | +| `.claude/skills/backlog-scrum-master/SKILL.md` | **audit** | Skill touches suggest tune-up cycle underway. | +| `.claude/skills/algebra-owner/SKILL.md` | **audit** | Same as above. | + +### Synthesis + +- **1 split candidate** (`BACKLOG.md`) — the Otto-54 row exists; this run confirms the row is load-bearing. +- **1 freeze-then-watch candidate** (`ROUND-HISTORY.md`) — existing append-only discipline is doing its job; no immediate action. +- **1 cadence candidate** (`memory/MEMORY.md`) — the Otto-54 CURRENT-freshness row is the right remediation. +- **6 audit candidates** (VISION, gate.yml, 4 skill files) — surface these to Kenji / Aarav for review (skill-tune-up for the skill files). +- **12 watch candidates** — normal churn; next audit cadence decides. + +### What the first run reveals about "git-native frictionless" + +The ranking shows **one file** well clear of the rest: +`BACKLOG.md` at 34 touches / 26 PRs in 30 days, more than double +the next-highest PR count. Splitting that file addresses the bulk +of the problem the human maintainer named. The rest of the ranking +is either append-only by design (WINS, ROUND-HISTORY, DEBT), +well-structured update surfaces (governance, glossary, threat +model), or skill files in active tune-up. **Shipping the BACKLOG +split is the highest-leverage move under the Otto-54 directive.** diff --git a/tools/hygiene/audit-git-hotspots.sh b/tools/hygiene/audit-git-hotspots.sh new file mode 100755 index 00000000..f2bb1a0f --- /dev/null +++ b/tools/hygiene/audit-git-hotspots.sh @@ -0,0 +1,250 @@ +#!/usr/bin/env bash +# tools/hygiene/audit-git-hotspots.sh +# +# Identifies high-churn files in the repo over a configurable +# window — the "hotspots" the human maintainer named on +# 2026-04-23 Otto-54: +# +# > cadence for checking github hotspots too this is a hygene +# > issues points of friction and bottlenecks, we are +# > frictionless... git hotspots i mean... 
we are gitnative +# > with github as our first host +# +# (The verbatim quote above is kept as attribution, which falls +# under the narrow name-attribution exemption. Outside the quote, +# this prose uses role references per the no-name-attribution rule.) +# +# High-churn shared files are the paradigmatic friction surface +# (routine merge conflicts, reviewer burden, serialization +# bottleneck). The audit surfaces candidates; the action +# (split / freeze / audit / watch) is a judgment call the +# author or architect makes from the report. +# +# Part of the Otto-54 directive cluster in BACKLOG.md § +# "P1 — Git-native hygiene cadences". Composes with: +# - BACKLOG-per-swim-lane split row (one remediation option) +# - CURRENT-maintainer freshness audit row (one remediation +# option for memory/MEMORY.md hotspots) +# +# Usage: +# tools/hygiene/audit-git-hotspots.sh # default window: 60 days, top 20 +# tools/hygiene/audit-git-hotspots.sh --window "30 days" # custom window +# tools/hygiene/audit-git-hotspots.sh --top 40 # show more rows +# tools/hygiene/audit-git-hotspots.sh --report PATH # write markdown report +# +# Exit codes: +# 0: audit ran (detect-only, no enforcement yet; see Otto-54 +# NOT-list: detection-first, action-second) +# 64: usage error; 128: not inside a git worktree + +set -euo pipefail + +window="60 days" +top=20 +report="" + +require_value() { + # require_value FLAG VALUE — aborts with a clear message if VALUE is empty. + if [[ -z "${2:-}" ]]; then + echo "error: $1 requires a value" >&2 + exit 64 + fi +} + +require_positive_int() { + # require_positive_int FLAG VALUE — aborts with exit 64 if VALUE is not a positive integer. + if ! 
[[ "${2:-}" =~ ^[1-9][0-9]*$ ]]; then + echo "error: $1 requires a positive integer, got: ${2:-}" >&2 + exit 64 + fi +} + +while [[ $# -gt 0 ]]; do + case "$1" in + --window) + require_value "$1" "${2:-}" + window="$2" + shift 2 + ;; + --top) + require_value "$1" "${2:-}" + require_positive_int "$1" "${2:-}" + top="$2" + shift 2 + ;; + --report) + require_value "$1" "${2:-}" + report="$2" + shift 2 + ;; + -h|--help) + # Print only the leading doc block: skip the shebang (line 1), + # strip the `# ` / `#` markers, and stop at the first non-comment + # line so implementation comments further down stay out of --help. + awk 'NR == 1 { next } /^#/ { sub(/^# ?/, ""); print; next } { exit }' "$0" + exit 0 + ;; + *) + echo "unknown arg: $1" >&2 + exit 64 + ;; + esac +done + +# Count per-file touches in the window, excluding paths we +# deliberately expect to be hot: +# - docs/hygiene-history/**: append-only fire logs; churn is +# by design (one row per tick). +# - openspec/changes/**: OpenSpec staging surface (by design +# high-churn during spec backfill). +# - references/upstreams/**: vendored external repos; not +# ours to audit. +excluded_prefixes=( + 'docs/hygiene-history/' + 'openspec/changes/' + 'references/upstreams/' +) + +# Guard: the audit must run inside a git worktree. Failing fast +# here gives a clear message and a distinct exit code instead of +# letting the first `git log` below abort with git's own, less +# specific error. +if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then + echo "error: tools/hygiene/audit-git-hotspots.sh must run inside a git worktree" >&2 + exit 128 +fi + +# Count touches: one row per (commit, file) pair. Note that +# `git log --name-only` also lists files touched by deletion +# commits (the path appears even though the file no longer +# exists at HEAD). 
That's correct for a hotspot report — +# frequent deletion of a path is still friction — so we +# deliberately include deletions in the count rather than +# filter them out. +# +# `sed '/^$/d'` (rather than `grep -v '^$' || true`) is used so +# the empty-output case is handled by sed returning exit 0 with +# an empty string, and any real `git log` failure propagates via +# `set -euo pipefail` instead of being masked by `|| true`. +raw=$(git log --since="$window" --pretty=format: --name-only \ + | sed '/^$/d') + +# Apply exclusions. +filtered="$raw" +for prefix in "${excluded_prefixes[@]}"; do + filtered=$(printf '%s\n' "$filtered" | grep -v "^$prefix" || true) +done + +# If nothing survives (quiet window, or every touched path is +# excluded), say so on stderr and fall through to an empty ranking. +if [[ -z "$filtered" ]]; then + echo "no files in window '$window' (or all excluded)" >&2 +fi + +# Tally by file. +ranked=$(printf '%s\n' "$filtered" | sort | uniq -c | sort -rn) + +# Unique author / PR-count per file — best-effort (PR count may +# undercount under merge-commit or rebase-merge workflows, whose +# file-touching commits carry no trailing `(#NNN)` marker). +file_summary() { + local file="$1" + local touches="$2" + local authors_raw pr_raw authors pr_count + # Let `git log` failures propagate — don't mask with `|| true` + # or redirect stderr to /dev/null, both of which silently turn + # partial-clone / missing-object errors into fabricated zeros. + # The empty-match case (file not in window, or no PR tokens in + # subjects) is handled by counting lines directly: `grep -c` + # would exit 1 on no matches and trip pipefail, so we pipe + # through `wc -l` which always exits 0. + # + # PR-count parses trailing `(#NNN)` squash-merge markers only. + # Bare `#NNN` tokens in subjects (e.g. "row #58", "fix #213") + # are intentionally not counted — they are row IDs / issue + # refs, not PR numbers, and counting them inflates the metric. 
+ authors_raw=$(git log --since="$window" --pretty=format:'%an' -- "$file") + if [[ -z "$authors_raw" ]]; then + authors=0 + else + authors=$(printf '%s\n' "$authors_raw" | sort -u | wc -l | tr -d ' ') + fi + # Capture subjects first (propagates git log failures under + # pipefail), then run the grep filter in a context where a + # no-match result (exit 1) is fine. + local subjects + subjects=$(git log --since="$window" --pretty=format:'%s' -- "$file") + if [[ -z "$subjects" ]]; then + pr_count=0 + else + pr_raw=$(printf '%s\n' "$subjects" | grep -oE '\(#[0-9]+\)$' | sort -u || true) + if [[ -z "$pr_raw" ]]; then + pr_count=0 + else + pr_count=$(printf '%s\n' "$pr_raw" | wc -l | tr -d ' ') + fi + fi + printf '| %s | %s | %s | %s |\n' "$file" "$touches" "$authors" "$pr_count" +} + +render() { + printf '# Git hotspots report\n\n' + printf -- '- **Window:** last %s\n' "$window" + printf -- '- **Generated:** %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" + printf -- '- **Top:** %s files by touch count\n' "$top" + printf -- '- **Excluded prefixes:** %s\n\n' "${excluded_prefixes[*]}" + + printf '## Ranking\n\n' + printf '| file | touches | unique authors | PR count |\n' + printf '|---|---:|---:|---:|\n' + + # Stream the top-N rows without a `head` pipeline. Piping + # `printf` into `head -n N` under `set -euo pipefail` can + # surface as SIGPIPE 141 when `head` closes early on a long + # ranked list, which would violate the "always exit 0" + # contract. Iterate + counter instead. + local count=0 + while IFS= read -r line; do + [[ -z "$line" ]] && continue + (( count >= top )) && break + # Extract touch count (first whitespace-delimited field from + # `uniq -c` output) without disturbing the rest of the row. + # `awk '{$1=""; print}'` normalises internal whitespace — + # that would corrupt filenames containing multiple spaces + # or tabs. Use a regex that strips exactly the `uniq -c` + # prefix (leading spaces + count + single space). 
+ touches=$(printf '%s' "$line" | awk '{print $1}') + file=$(printf '%s' "$line" | sed -E 's/^[[:space:]]*[0-9]+[[:space:]]//') + [[ -z "$file" ]] && continue + file_summary "$file" "$touches" + count=$((count + 1)) + done <<<"$ranked" + + printf '\n## Suggested actions\n\n' + printf 'Detection-first. Each action below is a prompt for human\n' + printf 'or Architect judgment, not an enforcement.\n\n' + printf -- '- **split** — file has become a shared bottleneck; consider\n' + printf ' per-swim-lane / per-subsystem decomposition\n' + printf -- '- **freeze** — historical content is append-only; freeze\n' + printf ' older rows to an archive and keep recent rows hot\n' + printf -- '- **audit** — hotness may reflect real work; investigate\n' + printf ' whether churn is healthy or pathological\n' + printf -- '- **watch** — hot but not yet a problem; leave for next\n' + printf ' audit cadence\n\n' + printf '## What this report is NOT\n\n' + printf -- '- Not an enforcement. The audit exits 0 regardless of\n' + printf ' findings.\n' + printf -- '- Not a blame tool. Author counts are descriptive of\n' + printf ' collaboration shape, not performance.\n' + printf -- '- Not a complete merge-conflict predictor. Two PRs can\n' + printf ' conflict on a rarely-touched file; conversely, a\n' + printf ' very hot file with careful coordination (append-only\n' + printf ' rows) may see zero conflicts.\n' +} + +if [[ -n "$report" ]]; then + render > "$report" + echo "Report written: $report" >&2 +else + render +fi
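The tally and PR-count steps in `audit-git-hotspots.sh` are pure text transforms, so they can be checked without a repository. What follows is a minimal sketch under stated assumptions, not the script's own code: `hotspot_tally`, `pr_count`, and the `HOTSPOT_EXCLUDES` variable are hypothetical names; it assumes bash with a POSIX `awk`; and it swaps the script's `grep -v "^$prefix"` regex filter for a literal `awk index()` prefix match, so a prefix containing a regex metacharacter such as `.` cannot be misread.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not part of tools/hygiene/audit-git-hotspots.sh:
# the two text transforms the audit depends on, isolated as functions
# over stdin so they can be tested without a git repository.
set -euo pipefail

# hotspot_tally: read `git log --name-only` style input (one path per
# line, blank separator lines) and print "COUNT<TAB>PATH", highest
# count first, ties broken by path. HOTSPOT_EXCLUDES is a hypothetical
# space-separated list of path prefixes to drop.
hotspot_tally() {
  local tab
  tab=$(printf '\t')
  awk -v excludes="${HOTSPOT_EXCLUDES:-}" '
    BEGIN { n = split(excludes, ex, " ") }
    $0 == "" { next }                      # skip blank separator lines
    {
      for (i = 1; i <= n; i++)
        if (index($0, ex[i]) == 1) next    # literal prefix match, no regex
      count[$0]++
    }
    END { for (p in count) printf "%d\t%s\n", count[p], p }
  ' | LC_ALL=C sort -t "$tab" -k1,1nr -k2,2
}

# pr_count: count distinct trailing "(#NNN)" squash-merge markers in
# commit subjects read from stdin. Bare "#NNN" tokens (row IDs, issue
# refs) are ignored, matching the script's PR-count rule.
pr_count() {
  # `|| true` keeps a no-match grep (exit 1) from failing under pipefail.
  { grep -oE '\(#[0-9]+\)$' || true; } | sort -u | wc -l | tr -d ' '
}
```

To approximate the script's ranking by hand, pipe `git log --since="60 days" --pretty=format: --name-only` into `hotspot_tally`. Unlike the script's `sort -rn`, which leaves tie order to `sort`'s last-resort comparison, this sketch breaks ties alphabetically by path.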