59 changes: 59 additions & 0 deletions .github/workflows/memory-index-duplicate-lint.yml
@@ -0,0 +1,59 @@
name: memory-index-duplicate-lint

# Detects duplicate link targets in `memory/MEMORY.md` —
# Amara 2026-04-23 decision-proxy + technical review action
# item #2 (PR #219 absorb). An index with duplicate entries
# is a discoverability defect: fresh sessions can't tell
# which entry is authoritative; the newest-first ordering
# invariant breaks when the same file appears twice.
#
# Companion to `.github/workflows/memory-index-integrity.yml`
# (the same-commit-pairing check for memory/ changes +
# MEMORY.md updates). That check ensures index edits happen;
# this check ensures those edits don't create duplicates.
#
# Safe-pattern compliance (FACTORY-HYGIENE row #43):
# - SHA-pinned actions/checkout
# - Explicit minimum `permissions: contents: read`
# - No user-authored context referenced
# - Concurrency group + cancel-in-progress: false
# - runs-on: ubuntu-22.04 pinned
#
# See:
# - tools/hygiene/audit-memory-index-duplicates.sh (the tool)
# - docs/aurora/2026-04-23-amara-decision-proxy-technical-
# review.md (ferry with the proposal)

on:
  pull_request:
    paths:
      - "memory/MEMORY.md"
      - "tools/hygiene/audit-memory-index-duplicates.sh"
      - ".github/workflows/memory-index-duplicate-lint.yml"
  push:
    branches: [main]
    paths:
      - "memory/MEMORY.md"
      - "tools/hygiene/audit-memory-index-duplicates.sh"
      - ".github/workflows/memory-index-duplicate-lint.yml"
  workflow_dispatch: {}

permissions:
  contents: read

concurrency:
  group: memory-index-duplicate-lint-${{ github.ref }}
  cancel-in-progress: false

jobs:
  lint:
    name: lint memory/MEMORY.md for duplicate link targets
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: run duplicate-link lint
        shell: bash
        run: |
          set -euo pipefail
          tools/hygiene/audit-memory-index-duplicates.sh --enforce
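
The script this workflow invokes is not shown in this diff. FACTORY-HYGIENE row #63 describes it as grepping for link targets of the form `](foo.md)`, tallying by target, and failing on any count above 1, with exit 2 under `--enforce`. The Python below is only a rough sketch of that described behaviour, not the shell tool itself. The names `LINK_TARGET`, `duplicate_link_targets`, and `exit_code` are hypothetical, the exact regex is a guess, and the exit-0 behaviour without `--enforce` is an assumption.

```python
import re
from collections import Counter

# Assumed pattern: a markdown link whose target ends in ".md", e.g. "](foo.md)".
# The real script's regex may differ (for instance in how it treats anchors).
LINK_TARGET = re.compile(r"\]\(([^)\s]+\.md)\)")


def duplicate_link_targets(index_text: str) -> dict[str, int]:
    """Tally link targets and return only those that appear more than once."""
    counts = Counter(LINK_TARGET.findall(index_text))
    return {target: n for target, n in counts.items() if n > 1}


def exit_code(index_text: str, enforce: bool) -> int:
    """Exit 2 on duplicates when enforcing; otherwise report-only (assumed exit 0)."""
    duplicates = duplicate_link_targets(index_text)
    for target, n in sorted(duplicates.items()):
        print(f"duplicate link target: {target} ({n} occurrences)")
    return 2 if (duplicates and enforce) else 0


if __name__ == "__main__":
    sample = "- [A](a.md) first\n- [B](b.md)\n- [A again](a.md) stale pointer\n"
    print(exit_code(sample, enforce=True))
```

As in the row's own caveat, a tally like this only catches repeated targets; near-duplicate descriptions that point at different files would still need human review.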
1 change: 1 addition & 0 deletions docs/FACTORY-HYGIENE.md
@@ -103,6 +103,7 @@ is never destructive; retiring one requires an ADR in
| 56 | MD032 plus-at-line-start preflight audit (detects prose-continuation `+` followed by space that markdownlint misparses as list items) | Detect-only (landed 2026-04-24); on-touch when author edits markdown; round-cadence sweep + `--enforce` flip when baseline is green. | Dejan (devops-engineer) on cadenced + enforce-transition; author of markdown change self-administered on-touch. | factory | `tools/hygiene/audit-md032-plus-linestart.sh` scans tracked `.md` files for CommonMark-style plus-then-space list-marker lines (regex `^ {0,3}\+` followed by a single space: up to 3 leading spaces allowed, then `+`, then space) where the previous line is non-blank AND is not itself a plus-then-space marker line (so contiguous plus-space lists are not flagged). Whitespace-normalisation on the predecessor-blank check strips all whitespace classes (spaces, tabs, CR) via `[[:space:]]`, so tab-only separator lines count as blank. Path iteration uses NUL-delimited `git ls-files -z` piped into a `while read -d ''` loop and the script runs `cd` to `git rev-parse --show-toplevel` first, so paths resolve from repo root regardless of working directory. Excludes `docs/ROUND-HISTORY.md`, `docs/hygiene-history/**`, `docs/DECISIONS/**`, and self. The `--list` flag prints offending `file:lineno`; `--enforce` flips exit 2 on gap. **Why this row exists:** Otto-session 2026-04-23 hit MD032 regressions three times (Otto-35 + Otto-38 + Otto-38-again). The pattern is author-friendly in intent (prose continuation using `+`) but markdownlint-hostile (parsed as list item). Author-time detection prevents the full CI round-trip. Baseline at first fire (2026-04-24, post review-drain revision on PR #204) was ~170 gaps at repo scope — the CommonMark-aware rewrite removed the earlier file-level-skip heuristic (which masked false negatives when a file used `+` as its bullet style but still contained a prose-continuation `+`) in favour of per-line contiguous-list detection. 
**Classification (row #47):** **prevention-bearing** — audit runs at author-time (on-touch) and surfaces gap before commit. Ships to project-under-construction: adopters inherit audit + pattern + exclusion discipline. | Audit output on each fire; cadenced runs appended to `docs/hygiene-history/md032-plus-linestart-audit-history.md` (per-fire schema per row #44); author-time gap lands as fix-at-source (opportunistic). | `tools/hygiene/audit-md032-plus-linestart.sh` + this row's self-reference |
| 61 | Surface-map-drift smell (wrong URL on a mapped surface fires a hygiene alarm) | Pre-call: every `gh api <path>` (or equivalent platform call) on a surface that has a mapping doc — grep the map first, use its path, otherwise record a map-gap. Post-call: every 410 / 301 / "endpoint moved" response on a mapped endpoint auto-proposes a map-update. Cadenced sweep every 5-10 rounds replays the full set of mapped endpoints against the current platform to catch silent drift (endpoint renamed without 410). | Any agent calling `gh api` (self-administered on pre-call / post-call); Dejan (devops-engineer) on the cadenced sweep; Kenji (Architect) on map-update PRs when drift lands. Bounded to surfaces with a mapping doc under `docs/research/*surface-map*.md` / `docs/AGENT-*-SURFACES.md` / `docs/HARNESS-SURFACES.md` / `docs/GITHUB-SETTINGS.md`. | factory | **Pre-call (prevention-bearing):** before invoking any `gh api` call against org / enterprise / Copilot / billing / settings surfaces, `grep -li "<surface-keyword>" <mapping-docs>` and use the path the map lists. If the map lacks the path, **file a map-gap finding** in the same audit's output — agent may still call a best-guess endpoint if confident the surface exists, but must log the gap so the next round-close sweep extends the map. **Post-call (detection-bearing):** any `410 Gone` / `301 Moved Permanently` / `"endpoint moved"` response from a mapped endpoint triggers a map-update task (write the new path to the map; note old-path + redirect-doc + drift-date in a "Map drift log" section). **Cadenced (detection-bearing):** every 5-10 rounds, replay the full set of mapped endpoints against the current platform to catch silent renames (200 OK from a stale path that silently redirects, or 404 from an endpoint removed without deprecation). 
**Why this row exists:** Aaron 2026-04-22 after agent invented `/orgs/.../billing/budgets` (404) for LFG budget audit despite task #195 having already produced the complete map: *"i'm supprised you got the url wrong given you mapped it"* + *"that should be a smell when that happen to a surface you already have mapped"*. Same incident revealed a second drift class — `/orgs/{org}/settings/billing/actions` (map §A.17) returned 410 with `documentation_url: https://gh.io/billing-api-updates-org`, meaning GitHub moved the endpoint between 2026-04-22 (map author-time) and 2026-04-22 (this fire, hours later). Two orthogonal failure modes compound: (a) **not-consulting** an existing map (guess without grep), (b) **consulting-but-stale** map (correct path + platform drift). **UI-only surfaces** (e.g., GitHub org budget management at `https://github.com/organizations/{org}/billing/budgets`, no REST equivalent) are legitimate map entries — the map should mark them as `ui-only` so agents know "no API path exists" before trying. **Classification (row #47):** **prevention-bearing** — the pre-call grep discipline is the prevention layer; the post-call 410 handler is a complementary detection layer; the cadenced sweep is the insurance detection layer for silent renames. See `memory/feedback_surface_map_consultation_before_guessing_urls.md`. Ships to project-under-construction: adopters inherit the smell pattern + the pre-call grep obligation + the map-update-on-410 trigger. | Pre-call: grep output shown in the audit (map-hit / map-miss). Post-call: map-update PR when 410/301 lands, with "Map drift log" row recording old-path + redirect-doc + drift-date. Cadenced: sweep output logged to `docs/hygiene-history/surface-map-drift-history.md` (per-fire schema per row #44). ROUND-HISTORY row when a drift resolves. 
| `memory/feedback_surface_map_consultation_before_guessing_urls.md` (authoritative) + `docs/research/github-surface-map-complete-2026-04-22.md` (primary target for GitHub surfaces) + `docs/AGENT-GITHUB-SURFACES.md` (ten-surface playbook) + `docs/HARNESS-SURFACES.md` + `docs/GITHUB-SETTINGS.md` + this row's enforcement discipline (agent-self-administered pre-call, detection scripts TBD under `tools/hygiene/audit-surface-map-drift.sh`) |
| 62 | Skill data/behaviour split audit (skills stay routine-only; catalogs / inventories / adapter tables / worked examples offload to `docs/**.md`; event logs to `docs/hygiene-history/**.md`) | Author-time (prevention-bearing, every new or touched `SKILL.md`) via the `skill-creator` workflow's authoring checklist + cadenced detection every 5-10 rounds (same cadence as row #5 skill-tune-up) over `.claude/skills/**/SKILL.md` for mix signatures (gotcha-list > 3 items, worked-example / case-study > 20 lines, adapter / compatibility table, inventory matrix, cross-platform neutrality matrix) + opportunistic on-touch at every `SKILL.md` edit. | `skill-creator` workflow on author-time (self-check against the checklist); Aarav (skill-tune-up) on cadenced detection; all agents (self-administered) on on-touch edits. Retrospective one-shot pass over the existing roster queued in BACKLOG P1. | both | **Principle:** a skill's SKILL.md is the **behaviour layer** (the routine / procedure / decision-flow the agent walks through at invocation time). Catalogs of gotchas, inventories of what-survives / what-breaks, adapter-neutrality tables, enumerated variants, and worked-example galleries are **data**, not behaviour — they belong in `docs/<CAPITALIZED-NAME>.md`. Event logs (append-only history of each fire) belong in `docs/hygiene-history/<name>-history.md` per FACTORY-HYGIENE row #44. **Why the split matters:** (a) a routine edits differently than a catalog — the routine changes rarely, catalogs accrete continuously; bundling them creates churn the skill-diff can't cleanly attribute. (b) An agent invoking a skill needs the routine cold-loaded into context; the catalog is consultation-on-demand. Bundling inflates every invocation's token cost with data the routine doesn't always need. (c) Data is queryable under `docs/` (grep-friendly, indexable, linkable from other surfaces); under `.claude/skills/` it is invocation-local and harder to cite. 
**Mix signatures (trigger the audit):** a SKILL.md with ≥ 2 of — (a) "Known gotchas" section > 3 items; (b) "Worked example" / "Case study" / "In practice" section > 20 lines; (c) adapter / compatibility / variants / neutrality table; (d) what-survives / what-breaks inventory table; (e) cross-platform matrix; (f) multi-row catalog of any sort inside the SKILL.md body. **Split target:** routine stays, data moves to `docs/<CAPITALIZED-NAME>.md`, events to `docs/hygiene-history/<name>-history.md`, and the SKILL.md body carries pointers to the new data surface under a "Data surface" section. **Triggering incident:** 2026-04-22 first-pass `github-repo-transfer` SKILL.md mixed routine + S1-S7 gotcha catalog + adapter table + worked example; Aaron caught it — *"you told me you wanted to split skills into data and behavior/routines, see i remember what you tell me too"* (invoking the agent's own prior principle from `memory/feedback_text_indexing_for_factory_qol_research_gated.md`: *"seperating thing by data and behiaver is a tried and true way and you mentied it for the skills earler, works in code too lol"*). Canonical worked example after split: `.claude/skills/github-repo-transfer/SKILL.md` + `docs/GITHUB-REPO-TRANSFER.md` + `docs/hygiene-history/repo-transfer-history.md`. **Classification (row #47):** **prevention-bearing** — the `skill-creator` authoring checklist asks the split question at author-time; cadenced detection is the backup layer for skills landed before this row existed. Ships to project-under-construction: adopters inherit the three-surface pattern (behaviour / data / fire-log) + the authoring checklist + the cadenced audit. 
| Audit output per cadenced fire listing every `SKILL.md` + its mix-signature score + a split-or-justify recommendation, logged to `docs/hygiene-history/skill-data-behaviour-split-history.md` (per-fire schema per row #44); ROUND-HISTORY row when a skill splits; BACKLOG row if the retrospective surfaces > 3 existing mixes; `skill-edit-justification-log.md` entry when a mix is deliberate (rare; requires a stated reason). | `memory/feedback_skills_split_data_behaviour_factory_rule.md` (authoritative — to be written this tick) + `memory/feedback_text_indexing_for_factory_qol_research_gated.md` (Aaron's original principle statement) + `.claude/skills/github-repo-transfer/SKILL.md` + `docs/GITHUB-REPO-TRANSFER.md` + `docs/hygiene-history/repo-transfer-history.md` (three-surface canonical worked example) + `.claude/skills/skill-creator/SKILL.md` (authoring workflow — carries the checklist) + `.claude/skills/skill-tune-up/SKILL.md` (detection runner — gains a mix-signature check on top of its existing drift / contradiction / staleness / user-pain / bloat / BP-drift / portability-drift criteria) |
| 63 | Memory-index duplicate-link lint (`memory/MEMORY.md` flagged if the same `.md` target appears more than once in the newest-first index) | Every pull_request + push-to-main touching `memory/MEMORY.md` or the audit tool + workflow; workflow_dispatch manual run available. Detect-only (exit 2 on duplicate) per `--enforce` flag in CI invocation. | Automated via `.github/workflows/memory-index-duplicate-lint.yml`; human-maintainer or any contributor resolves on fail. | factory | `tools/hygiene/audit-memory-index-duplicates.sh` greps for link targets matching `](foo.md)` in the supplied file (default `memory/MEMORY.md`) and tallies by target; any count > 1 fails. Catches: exact duplicate entries + old-plus-new pointer after an edit that forgot to dedupe. Does NOT catch: substantially similar descriptions of different files (judgment-based). **Why this row exists:** Amara 2026-04-23 decision-proxy + technical review (PR #219 absorb) action item #2 — her observation that `memory/MEMORY.md` had duplicate entries in an older state ("Signal-in, signal-out" + "Deletions > insertions" appearing twice each). Per-user MEMORY.md currently has 1 duplicate (`project_learning_repo_khan_style_...` appears twice) confirming the class. In-repo MEMORY.md currently clean. **Classification (row #47):** **prevention-bearing** — CI blocks merge before the duplicate lands. Ships to project-under-construction: adopters inherit the workflow unchanged; the `memory/MEMORY.md` convention is factory-generic. Sibling to row #58 (memory-index-integrity — same-commit pairing of memory edit + MEMORY.md edit). | CI job result + annotated fail message in PR checks. Optional fire-history surface if long-term retention beyond 90-day CI log is desired. 
| `.github/workflows/memory-index-duplicate-lint.yml` (CI invocation) + `tools/hygiene/audit-memory-index-duplicates.sh` (the detection tool) + `docs/aurora/2026-04-23-amara-decision-proxy-technical-review.md` (Amara ferry with proposal) + row #58 sibling (memory-index-integrity) |
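
Row 56 above spells out its detection rule closely enough to sketch: flag a line matching `^ {0,3}\+` followed by a space when the previous line is non-blank (after stripping all whitespace) and is not itself a plus-then-space marker line. The Python below is an illustrative approximation of that rule, not the contents of `tools/hygiene/audit-md032-plus-linestart.sh`. The function name is hypothetical, and treating the start of the file as a blank predecessor is an assumption.

```python
import re

# Up to 3 leading spaces, then "+", then a space (per the row's description).
PLUS_MARKER = re.compile(r"^ {0,3}\+ ")


def find_plus_linestart_gaps(text: str) -> list[int]:
    """Return 1-based line numbers of plus-then-space lines that follow prose."""
    gaps = []
    prev = ""  # assumption: the first line has a blank predecessor
    for lineno, line in enumerate(text.splitlines(), start=1):
        if PLUS_MARKER.match(line):
            prev_is_blank = prev.strip() == ""  # strips spaces, tabs, CR
            prev_is_marker = PLUS_MARKER.match(prev) is not None
            if not prev_is_blank and not prev_is_marker:
                gaps.append(lineno)
        prev = line
    return gaps


if __name__ == "__main__":
    sample = (
        "prose line\n"
        "+ continuation\n"
        "\n"
        "+ item one\n"
        "+ item two\n"
        "\t\n"
        "+ after tab-only line\n"
    )
    print(find_plus_linestart_gaps(sample))
```

In the sample, only line 2 is reported: the contiguous list on lines 4 and 5 and the line after the tab-only separator are left alone, matching the per-line contiguous-list behaviour the row describes.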

## Ships to project-under-construction
