Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 39 additions & 0 deletions docs/BACKLOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -920,6 +920,45 @@ within each priority tier.
the invitation for any 3rd member is sent. Effort: S (minutes
after sign-off). Reviewer: Aminata (threat-model-critic).

- [ ] **Map-completeness audit — proactive "do our maps cover
the surfaces we touch?" cadence** (round 44 surface-map-drift
absorb) — Aaron 2026-04-22: *"missing map hygene on backlog?"*
after agent tripped on a surface (GitHub org spending-budget)
that was never in the map. FACTORY-HYGIENE row #50 covers the
*reactive* smell (wrong URL on a mapped surface); this row
covers the complementary *proactive* audit: enumerate every
surface the factory actually touches — from `gh api` calls in
`tools/**`, `.github/workflows/**`, `.claude/skills/**`, and
docs — cross-reference against each mapping doc
(`docs/HARNESS-SURFACES.md`,
`docs/research/github-surface-map-complete-2026-04-22.md`,
`docs/AGENT-GITHUB-SURFACES.md`, `docs/GITHUB-SETTINGS.md`),
and flag surfaces-used-but-unmapped. **Known gaps surfaced by
the triggering incident:** (1) GitHub org spending-budget UI
at `https://github.com/organizations/{org}/billing/budgets`
(added to map as `ui-only` row 2026-04-22); (2) Copilot
Business per-feature toggle state
(`public_code_suggestions`, `ide_chat`, `cli`, `platform_chat`
— values documented in
`memory/feedback_lfg_paid_copilot_teams_throttled_experiments_allowed.md`
but not yet mapped as a declarative surface); (3) the
`coding-agent` / `internet-search` / custom-instruction
enablement flags Aaron referenced ("turned all them on") —
each is a distinct UI toggle with no declarative home yet.
**Detection script:** `tools/hygiene/audit-map-completeness.sh`
greps `gh api` usage + URL patterns across the tree,
normalises to `{org-or-repo}/<path>`, diffs against mapping
docs' enumerated endpoints, outputs surface-used-but-unmapped
list. **Cadence:** every 5-10 rounds (same cadence as
skill-tune-up, row #46, row #48) so the map isn't a write-once
artifact that rots. **Companion:** FACTORY-HYGIENE row for
map-completeness (a new row #51 once this lands). **Effort:**
S (detection script + first sweep) + S per gap closure.
**Reviewer:** Dejan (devops-engineer) for the detection
mechanics; Architect (Kenji) for map-extension decisions
(which gaps land where). Related: `memory/feedback_surface_map_consultation_before_guessing_urls.md`;
FACTORY-HYGIENE row #50.

- [ ] **Orthogonal-axes cadenced audit — make the factory's
axis set an orthogonal basis (round 44 absorb)** — Aaron
2026-04-22: *"also we need to make sure all our axises are
Expand Down
1 change: 1 addition & 0 deletions docs/FACTORY-HYGIENE.md
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,7 @@ is never destructive; retiring one requires an ADR in
| 46 | Post-setup script stack audit (bun+TS default; bash only under exempt paths or with exception label) | Author-time (every new `tools/**/*.{sh,ps1}` decision-flow walk per `docs/POST-SETUP-SCRIPT-STACK.md`) + cadenced detection every 5-10 rounds (same cadence as skill-tune-up / row #38 / harness-surface audit) + opportunistic on-touch (every time an agent adds or edits a script under `tools/`). | Author of the script (self-check at author-time against the decision-flow doc); Dejan (devops-engineer) on the cadenced detection sweep; Kenji (Architect) on migration-order decisions when multiple violations stack up. | both | **Author-time prevention:** walk the three-question flow in `docs/POST-SETUP-SCRIPT-STACK.md` before writing any new `tools/**/*.{sh,ps1}` — (Q1) pre-setup? → `tools/setup/` bash+PowerShell exempt; (Q2) skill-bundled? → skill-compatibility rules govern, not this row; (Q3) default bun+TypeScript unless an explicit exception (trivial pipeline / thin CLI wrapper / bash scaffolding / sibling-migration guardrail) applies, in which case the script MUST carry a header comment naming the exception. **Cadenced detection:** `tools/hygiene/audit-post-setup-script-stack.sh` lists every shell/PowerShell script under `tools/` and classifies each as `exempt` / `labelled-exception` / `violation`. Exit-2 on any new violation; CI / pre-commit-eligible. **Why both layers:** prevention catches new violations at author-time (cheap); detection catches drift — labels getting stripped on edits, exceptions becoming stale, scripts moving out of exempt paths. Ships to project-under-construction: adopters inherit the canonical-stack rule + the audit script + the decision-flow doc. Aaron 2026-04-22 triggering-directive-chain: *"if post setup backlog bun/ts"* → *"now add someting that will try to prevent that and and hygene it if it happens again"*. | Author-time: commit-message rationale for any new `.sh` under `tools/` outside `tools/setup/`, OR exception-label header in the script, OR BACKLOG row queuing bun+TS migration. Cadenced: audit script output (markdown), appended to `docs/hygiene-history/post-setup-script-stack-history.md` (per-fire schema per row #44); BACKLOG row per unlabeled violation. | `docs/POST-SETUP-SCRIPT-STACK.md` (prevention surface) + `tools/hygiene/audit-post-setup-script-stack.sh` (detection surface) + `memory/project_ui_canonical_reference_bun_ts_backend_cutting_edge_asymmetry` + `memory/project_bun_ts_post_setup_low_confidence_watchlist` |
| 47 | Missing-prevention-layer meta-audit (every hygiene row carries a prevention classification: prevention-bearing / detection-only-justified / detection-only-gap) | Round cadence (same as rows #22 / #23 / #35 / #36) + opportunistic on-touch (every time a new row is added to `docs/FACTORY-HYGIENE.md` the author classifies it at landing). Not exhaustive; the round-close sweep catches un-classified rows and gap rows. | Architect (Kenji) on round-cadence classification review + gap-closure ROI assessment. All agents (self-administered) on on-touch: every new hygiene row MUST declare its prevention classification at landing; an unclassified row is itself a violation of this row. | factory | Sweep every row in `docs/FACTORY-HYGIENE.md` and classify each as one of: (a) **prevention-bearing** — an author-time / commit-time / trigger-time mechanism (hook, CI check, decision-flow doc, pre-commit lint, skill-gate) blocks or warns the violation BEFORE it materialises; (b) **detection-only-justified** — the class is fundamentally post-hoc (e.g., cadence-history row #44 — a fire-log can only exist AFTER the fire happens; wake-friction row #29 — friction is only observable at wake-time); (c) **detection-only-gap** — no principled reason the row is detection-only; a prevention layer COULD and SHOULD be built. Classification lives in `docs/hygiene-history/prevention-layer-classification.md` (one table row per hygiene row). **Why this row exists:** Aaron 2026-04-22 *"add a hygene for missing prevention layers"* — the factory had been quietly accumulating detection-only rows without asking the complementary question "could we have prevented this at author-time?". Without this meta-audit, the factory's reactive-cost grows silently. Parallels the existing meta-hygiene triangle (row #23 unknown-classes / #43 authored-but-unactivated / #44 cadence-history) by adding a fourth: row #47 is *"of the rows that ARE active and firing, which could have been prevented upstream"*. **Classification:** this is an **intentionality-enforcement** hygiene rule (Aaron 2026-04-22 tick-close: *"we are enforcing intentional decsions"*) — the audit cannot compute whether a row's classification is correct, but it forces every new hygiene row to carry an explicit prevention-vs-detection decision at landing. Declining to classify is itself the violation. See `memory/feedback_enforcing_intentional_decisions_not_correctness.md`. Ships to project-under-construction: adopters inherit the classification discipline + the meta-audit script + the obligation to classify any new hygiene row at landing. | `docs/hygiene-history/prevention-layer-classification.md` (classification matrix, one row per hygiene row) + cadenced audit run landed as `docs/hygiene-history/missing-prevention-layer-audit-YYYY-MM-DD.md` noting gap rows; ROUND-HISTORY row when a gap row gains a prevention layer (detection-only-gap → prevention-bearing transition); BACKLOG row per gap with prevention-design ROI estimate. | `tools/hygiene/audit-missing-prevention-layers.sh` + this row's self-reference (its own prevention layer is the at-landing-classify obligation declared in this Checks/enforces column) |
| 49 | Tick-history bounded-growth audit (`docs/hygiene-history/loop-tick-history.md` line-count vs threshold) | Detect-only (landed 2026-04-22); cadenced detection once per round-close (same cadence as row #44 cadence-history sweep, since this is the canonical row #44 worked example auditing itself); opportunistic on-touch whenever the tick-history file is read or edited. Archive action itself remains manual for now; deferring automation to the larger BACKLOG row that also covers threshold-revision and append-without-reading refactor. | Dejan (devops-engineer) on cadenced detection; the tick itself (self-administered at tick-close) on the opportunistic on-touch — each tick's end-of-tick sequence can invoke this audit after the append + commit to get a `within bounds: 96/500 lines` visibility signal. | factory | `tools/hygiene/audit-tick-history-bounded-growth.sh` checks the file's line count against a threshold (default 500, overrideable via `--threshold N`) and exits 0 within bounds / 2 over threshold. The threshold is set lower than the stated 5000-line paper bound because the file is read on every tick-close append — a per-tick context cost that scales linearly with file size — and 5000 lines represents too large a context hit on a 1-minute cadence. The audit's header block carries a mini-ADR decision record for the 500-line choice (context / decision / alternatives / supersedes / expires-when). **Why this row exists:** Aaron 2026-04-22 tick-fire interrupt: *"does loop tick history grow unbounded? that's an issue if so you just read it"*. Honest state was stated-bound-no-enforcement: file header named 5000 lines, nothing checked it. This row closes the enforcement gap for the threshold-check half of the full BACKLOG row (archive-action + append-without-reading refactor remain deferred). **Self-referential closure:** the tick-history file IS the canonical row-#44 cadence-history-tracking worked example (named explicitly in row #44's "Durable output" citation). Until this row landed, the most-cadenced surface in the factory — the tick itself — had its fire-log surface unaudited for its own growth. Meta-audit triangle remains intact (existence #23 / activation #43 / fire-history #44), and row #49 adds a fourth: fire-history files themselves need bounded-growth audits because they grow at the cadence of the surface they track. **Classification (row #47):** **prevention-bearing** — the audit surfaces approaching-threshold warnings at 80% so the archive action can be planned, rather than reactive-only at over-threshold. Ships to project-under-construction indirectly: adopters inherit the pattern (fire-log files under their own `docs/hygiene-history/` need the same bounded-growth treatment), not this exact script. | Audit output on each fire; cadenced runs appended to `docs/hygiene-history/tick-history-bounded-growth-history.md` (per-fire schema per row #44); BACKLOG row when archival is due (archive-action itself queued as part of the larger tick-history enforcement BACKLOG row); ROUND-HISTORY row when threshold changes or archive action executes. | `tools/hygiene/audit-tick-history-bounded-growth.sh` (detection + mini-ADR header block) + `docs/hygiene-history/loop-tick-history.md` (target surface, canonical row #44 worked example) + BACKLOG row *"Loop-tick-history bounded-growth enforcement"* (larger follow-up: threshold revision + append-without-reading refactor + archive action) |
| 50 | Surface-map-drift smell (wrong URL on a mapped surface fires a hygiene alarm) | Pre-call: every `gh api <path>` (or equivalent platform call) on a surface that has a mapping doc — grep the map first, use its path, otherwise record a map-gap. Post-call: every 410 / 301 / "endpoint moved" response on a mapped endpoint auto-proposes a map-update. Cadenced sweep every 5-10 rounds replays the full set of mapped endpoints against the current platform to catch silent drift (endpoint renamed without 410). | Any agent calling `gh api` (self-administered on pre-call / post-call); Dejan (devops-engineer) on the cadenced sweep; Kenji (Architect) on map-update PRs when drift lands. Bounded to surfaces with a mapping doc under `docs/research/*surface-map*.md` / `docs/AGENT-*-SURFACES.md` / `docs/HARNESS-SURFACES.md` / `docs/GITHUB-SETTINGS.md`. | factory | **Pre-call (prevention-bearing):** before invoking any `gh api` call against org / enterprise / Copilot / billing / settings surfaces, `grep -li "<surface-keyword>" <mapping-docs>` and use the path the map lists. If the map lacks the path, **file a map-gap finding** in the same audit's output — agent may still call a best-guess endpoint if confident the surface exists, but must log the gap so the next round-close sweep extends the map. **Post-call (detection-bearing):** any `410 Gone` / `301 Moved Permanently` / `"endpoint moved"` response from a mapped endpoint triggers a map-update task (write the new path to the map; note old-path + redirect-doc + drift-date in a "Map drift log" section). **Cadenced (detection-bearing):** every 5-10 rounds, replay the full set of mapped endpoints against the current platform to catch silent renames (200 OK from a stale path that silently redirects, or 404 from an endpoint removed without deprecation). **Why this row exists:** Aaron 2026-04-22 after agent invented `/orgs/.../billing/budgets` (404) for LFG budget audit despite task #195 having already produced the complete map: *"i'm supprised you got the url wrong given you mapped it"* + *"that should be a smell when that happen to a surface you already have mapped"*. Same incident revealed a second drift class — `/orgs/{org}/settings/billing/actions` (map §A.17) returned 410 with `documentation_url: https://gh.io/billing-api-updates-org`, meaning GitHub moved the endpoint between 2026-04-22 (map author-time) and 2026-04-22 (this fire, hours later). Two orthogonal failure modes compound: (a) **not-consulting** an existing map (guess without grep), (b) **consulting-but-stale** map (correct path + platform drift). **UI-only surfaces** (e.g., GitHub org budget management at `https://github.com/organizations/{org}/billing/budgets`, no REST equivalent) are legitimate map entries — the map should mark them as `ui-only` so agents know "no API path exists" before trying. **Classification (row #47):** **prevention-bearing** — the pre-call grep discipline is the prevention layer; the post-call 410 handler is a complementary detection layer; the cadenced sweep is the insurance detection layer for silent renames. See `memory/feedback_surface_map_consultation_before_guessing_urls.md`. Ships to project-under-construction: adopters inherit the smell pattern + the pre-call grep obligation + the map-update-on-410 trigger. | Pre-call: grep output shown in the audit (map-hit / map-miss). Post-call: map-update PR when 410/301 lands, with "Map drift log" row recording old-path + redirect-doc + drift-date. Cadenced: sweep output logged to `docs/hygiene-history/surface-map-drift-history.md` (per-fire schema per row #44). ROUND-HISTORY row when a drift resolves. | `memory/feedback_surface_map_consultation_before_guessing_urls.md` (authoritative) + `docs/research/github-surface-map-complete-2026-04-22.md` (primary target for GitHub surfaces) + `docs/AGENT-GITHUB-SURFACES.md` (ten-surface playbook) + `docs/HARNESS-SURFACES.md` + `docs/GITHUB-SETTINGS.md` + this row's enforcement discipline (agent-self-administered pre-call, detection scripts TBD under `tools/hygiene/audit-surface-map-drift.sh`) |
| 48 | Cross-platform parity audit (bash / PowerShell / bun+TS twin check across macOS / Windows / Linux / WSL) | Detect-only now (landed 2026-04-22); cadenced detection every 5-10 rounds (same cadence as row #46); opportunistic on-touch every time an agent adds or edits a script under `tools/`. Enforcement deferred until baseline is green AND CI matrix runs `--enforce` on `macos-latest` / `windows-latest` / `ubuntu-latest` (WSL inherits ubuntu-latest for CI). | Dejan (devops-engineer) on cadenced detection; author of the script (self-check at author-time against the rule classes in the audit's decision-record header block). Kenji (Architect) on CI-matrix-enforcement sign-off when baseline is green. | both | `tools/hygiene/audit-cross-platform-parity.sh` classifies every script under `tools/` by rule class: (a) **pre-setup** (`tools/setup/**`) — both `.sh` AND `.ps1` required per Q1 dual-authoring rule (`memory/feedback_preinstall_scripts_forced_shell_meet_developer_where_they_live`); (b) **post-setup permanent-bash** (`thin wrapper over existing CLI` / `trivial find-xargs pipeline` / `stay bash forever`) — `.ps1` twin required per the Windows-twin obligation (`memory/feedback_stay_bash_forever_implies_powershell_twin_obligation.md`); (c) **post-setup transitional** (`bun+TS migration candidate` / `bash scaffolding`) — no twin obligation (long-term plan is one cross-platform bun+TS script); (d) **post-setup bun+TS** (`*.ts` under `tools/`) — no twin needed (cross-platform native via bun). `--summary` prints counts; `--enforce` flips exit 2 on gaps. **Why detect-only first:** baseline at first fire (2026-04-22) was 13 gaps — 12 pre-setup bash without `.ps1` twin (Q1 violation silently accumulating since `tools/setup/` existed) + 1 post-setup permanent-bash (`tools/profile.sh`) without `.ps1` twin. Turning enforcement on before triage would block every CI run. **Why this row exists:** Aaron 2026-04-22 *"missing mac/windows/linux/wsl parity (ubuntu latest) we can deffer but should have the hygene in place for when we want to enforce and it will be more obvious to you in the future that we are cross platform."* Cross-platform-first must be a *visible* factory property (audit exists, runs, prints the gap) before it becomes an enforced gate. Same pattern as FACTORY-HYGIENE rows #23 / #43 / #47. See `memory/feedback_cross_platform_parity_hygiene_deferred_enforcement.md`. **Classification (row #47):** **prevention-bearing** — the audit runs at author-time (opportunistic on-touch) and surfaces the gap before it lands, same as row #46. The audit itself is a detect-only mechanism but detect-only surfaces the obligation at author-time when the author runs it. Ships to project-under-construction: adopters inherit the parity audit + the decision-record-block pattern + the CI-matrix obligation once it's wired. | Audit output in repo root on each fire; cadenced runs appended to `docs/hygiene-history/cross-platform-parity-history.md` (per-fire schema per row #44); BACKLOG row per gap at triage time; ROUND-HISTORY row when a gap resolves. | `tools/hygiene/audit-cross-platform-parity.sh` (detection + decision-record header block) + `memory/feedback_cross_platform_parity_hygiene_deferred_enforcement.md` + `memory/feedback_stay_bash_forever_implies_powershell_twin_obligation.md` + `memory/feedback_preinstall_scripts_forced_shell_meet_developer_where_they_live` + `docs/POST-SETUP-SCRIPT-STACK.md` |

## Ships to project-under-construction
Expand Down
Loading
Loading