diff --git a/docs/backlog/P1/B-0083-atari-2600-rom-canonical-naming-tosec-goodtools-tooling-aaron-2026-04-28.md b/docs/backlog/P1/B-0083-atari-2600-rom-canonical-naming-tosec-goodtools-tooling-aaron-2026-04-28.md
new file mode 100644
index 00000000..f3f8a7e2
--- /dev/null
+++ b/docs/backlog/P1/B-0083-atari-2600-rom-canonical-naming-tosec-goodtools-tooling-aaron-2026-04-28.md
@@ -0,0 +1,325 @@
+---
+id: B-0083
+priority: P1
+status: open
+title: Atari 2600 ROM canonical-naming + safe-vs-unsafe folder split + TOSEC/Good-Tools-style hash-lookup tooling
+tier: factory-tooling
+effort: M
+ask: maintainer Aaron 2026-04-28 (autonomous-loop ROM-drop + canonical-naming request)
+created: 2026-04-28
+last_updated: 2026-04-28
+tags: [aaron-2026-04-28, roms, atari-2600, tosec, good-tools, canonical-naming, datfile, license-safety, gitignore-already-protects, high-priority-after-0-0-0, scheduled-after-0-0-0]
+---
+
+# B-0083 — Atari 2600 ROM canonical-naming + tooling
+
+## Source
+
+Aaron 2026-04-28T18:55Z verbatim:
+
+> *"I just put a bunch of messy roms int your Atari 2600 folder, can you
+> organize them connonicaly for easy finding, feel free to spend some
+> time doing research, these folders should be ignored and not checked
+> in but you can reference any of them you want in readmes as games you
+> can test out locally, we don't want to distribute atari roms, we can
+> manually selelct check in any that won't get us in trouble for sharing
+> on github for any that's follow those rules and licenses. we maybe need
+> different non git ignored folder for those special ones that are safe.
+> Please name them all connonically and probably does not need to be
+> zipped, there are tools that can help you likely with naming them, like
+> to tosec and good tools, i don't know if they run on mac but you can
+> look at their source to figure out how it workds it's proabaly some
+> database file you can check agants sha or md5s or sometime based on the
+> latest list.
+> Lets make sure we can replicate that functionalty here on
+> list updates if that's how it works, we can backlog this but hight
+> priortiy right after the 0/0/0 starting point"*
+
+## Schedule
+
+**Scheduled trigger**: 0/0/0 AceHack-LFG hard-reset complete (the
+hard-reset chain is the gating dependency; see PR #677 5-disciplines
+and pull-queue work).
+
+## Ownership rationale (Aaron verbatim 2026-04-28T18:58Z)
+
+> *"basically some roms i own becasue i bought the same i can share
+> with you locally but we can't check into git, only certain ones are
+> license safe or it's expired or whatever. those can get checked in,
+> the more realish games will only be on local maintainers computers
+> and each will likely have their own set."*
+
+This articulates the established personal-use vs distribution legal
+boundary:
+
+- **Aaron owns the ROMs (bought them)** → personal-use copies are
+  legal between him and the local agent on his machine.
+- **Distribution via git** would create a redistribution path → only
+  license-cleared ROMs (public domain, homebrew with permissive
+  license, copyright-expired, explicit-license-commercial) can ship
+  in the tracked `roms-safe/` folder.
+- **Per-maintainer local ROM sets**: the gitignored `roms/` folder
+  is local to each maintainer's machine. Each maintainer will have
+  their own set based on what they personally own.
+- **The safe folder is the SHARED canonical surface**: only ROMs that
+  every maintainer can legally use (regardless of what they personally
+  own) live in `roms-safe/`.
+
+This split is exactly what the existing `roms/.gitignore` +
+`roms/atari/2600/README.md` license-safety-gate enforces. B-0083
+operationalizes the split by adding the safe-folder + the tooling.
+
+## Current state (verified 2026-04-28T18:53Z)
+
+- **3461 files** in `roms/atari/2600/` — mix of `.bin` and `.zip`,
+  some Good-Tools-style canonical names
+  (`Title (Year) (Publisher) [!]`), some uncanonical (`Jammed.bin`,
+  `seantsc.bin`).
+- **`roms/.gitignore` already fully protects** with depth-limited
+  pattern (`*` + `!*/` + `!/README.md` + `!/*/README.md` +
+  `!/*/*/README.md`). The 3461 ROMs are NOT at risk of accidental
+  commit. README.md is the only tracked file. **No emergency action
+  required.**
+- **`roms/atari/2600/README.md` already documents the license-safety
+  gate**: PD / homebrew-with-permissive-license / official-test-ROMs /
+  commercially-released-as-free / explicit-license-commercial = SAFE;
+  uncertain provenance defaults to FORBIDDEN.
+
+## Why high-priority + scheduled-after-0/0/0
+
+Aaron's explicit verbatim: *"we can backlog this but hight priortiy
+right after the 0/0/0 starting point"*. Decoded:
+
+- Priority: P1 (high)
+- Trigger: AceHack-main and LFG-main reach 0/0/0 (per the hard-reset
+  audit already in flight; see PR #677's 5-disciplines memory + the
+  pull-queue work this session).
+- NOT before 0/0/0 — the substrate cleanup is the blocking
+  dependency.
+
+## Background research (TOSEC + Good Tools)
+
+### TOSEC ("The Old School Emulation Center")
+
+- Datfile format: XML, one `<datafile>` per platform.
+- Each `<game>` entry has `<rom>` children with `name`, `size`,
+  `crc`, `md5`, `sha1` attributes.
+- Canonical naming convention (TOSEC Naming Convention 2015 / TNC15):
+  `Title version (Demo) (Date)(Publisher)(System)(Video)(Country)(Language)(Copyright)(Devstatus)[Cracked][Trainer][Hacked][Modified][Pirated][Bad][Verified][More Info]`
+- Datfiles available from: TOSECdev.org + GitHub
+  mirrors (e.g. `TheTOSECteam` org).
+- Atari 2600 platform-slug: `Atari 2600 (TOSEC)`.
+
+### Good Tools (a.k.a. GoodSets)
+
+- Created by Cowering. Discontinued ~2011 but datfiles remain
+  authoritative for older ROM-set hash matching.
+- Atari 2600 set: `GoodA26` (typically `GoodA26 v3.27` or similar).
+- Naming convention: `Title (Year) (Publisher) [code]` where code is
+  `[!]` (verified good), `[a1]` (alternate), `[b1]` (bad dump),
+  `[h1]` (hack), `[o1]` (overdump), `[t1]` (trainer), `[T+ENG]`
+  (English translation), etc.
+- Source for GoodA26: search GitHub for `GoodTools-A26` mirrors;
+  the original distribution was `GoodA26.zip` containing the EXE +
+  datfile.
+
+### Algorithm (replicated in factory tooling)
+
+1. **For each file in `roms/atari/2600/`**:
+   - Extract: if `.zip`, get the inner ROM bytes.
+   - Compute SHA1 + MD5 + CRC32 of the ROM bytes.
+2. **Lookup in datfile**:
+   - Match by SHA1 first (most discriminating).
+   - Fall back to MD5 → CRC32 if SHA1 absent.
+   - If no match: file is unrecognized; flag for manual review.
+3. **Rename to canonical form**:
+   - Use TOSEC TNC15 OR Good-Tools convention (pick one as
+     factory standard; recommend TOSEC for ongoing maintenance
+     since GoodTools is discontinued).
+4. **Classify license-safety**:
+   - Match against the README's permitted classes (PD, homebrew,
+     official-test, commercially-released-as-free, explicit-license).
+   - The TOSEC entry's descriptive metadata fields sometimes carry
+     license metadata; if not, fall back to a curated allowlist (e.g.
+     `tools/roms/manifests/atari-2600-homebrew-allowlist` (no-extension manifest per the `tools/setup/manifests/uv-tools` convention)).
+5. **Move to safe vs unsafe folder**:
+   - Safe: a NEW non-gitignored tracked folder
+     (`roms-safe/atari/2600/` or similar).
+   - Unsafe: stays in `roms/atari/2600/` (gitignored).
+6. **README cross-references**:
+   - Update `roms/atari/2600/README.md` to list the safe ROMs
+     by canonical name with a one-line summary + license citation.
+   - Optional: list the unsafe ROMs available locally (filenames
+     only, no distribution) so future-Otto knows what's available
+     for testing.
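Steps 1 and 2 of the algorithm above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the factory tooling: it assumes a Logiqx-style datfile in which `<game>` elements hold `<rom>` children, it reads only the first member of a `.zip`, and the function names (`rom_bytes`, `digests`, `load_index`, `canonical_name`) are hypothetical.

```python
import hashlib
import io
import xml.etree.ElementTree as ET
import zipfile
import zlib
from typing import Optional


def rom_bytes(filename: str, data: bytes) -> bytes:
    """Return the ROM payload; for a .zip, the bytes of its first member."""
    if filename.lower().endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return archive.read(archive.namelist()[0])
    return data


def digests(payload: bytes) -> dict:
    """Compute SHA1, MD5 and CRC32 of the payload as lowercase hex."""
    return {
        "sha1": hashlib.sha1(payload).hexdigest(),
        "md5": hashlib.md5(payload).hexdigest(),
        "crc": format(zlib.crc32(payload) & 0xFFFFFFFF, "08x"),
    }


def load_index(datfile_xml: str) -> dict:
    """Index a datfile as {"sha1": {hex: name}, "md5": {...}, "crc": {...}}."""
    index = {"sha1": {}, "md5": {}, "crc": {}}
    # Assumption: <game name="..."> elements contain <rom .../> children.
    for game in ET.fromstring(datfile_xml).iter("game"):
        for rom in game.iter("rom"):
            for key in index:
                value = rom.get(key)
                if value:
                    index[key][value.lower()] = game.get("name", "")
    return index


def canonical_name(payload: bytes, index: dict) -> Optional[str]:
    """Look up by SHA1 first, then MD5, then CRC32; None means manual review."""
    found = digests(payload)
    for key in ("sha1", "md5", "crc"):
        name = index[key].get(found[key])
        if name is not None:
            return name
    return None
```

A caller would read each file, pass it through `rom_bytes`, and treat a `None` from `canonical_name` as the "flag for manual review" case. Hex digests are lowercased on both sides because datfiles are not consistent about case.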
+
+## Tooling design — dependency-first as bridge; build-our-own as end goal (Aaron 2026-04-28T18:59Z + 19:00Z)
+
+Aaron verbatim 18:59Z: *"TOSEC/Good we can pull as dependences too and
+use the same consume goodcitizen staces as all of our other dependencies
+i just don't know if these are cross platform."*
+
+Aaron sharpened verbatim 19:00Z: *"build-our-own as last resort. our
+good citizen is because our end goal is we build all of our dependncies
+but still contribute back our enhancements and such"*
+
+The trajectory (per
+`memory/feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md`
+end-goal sharpening):
+
+1. **Bridge phase**: pull TOSEC/Good Tools (or cross-platform
+   equivalent) as a dependency. Use the established
+   consume-good-citizen pattern. Contribute back enhancements while
+   we use it.
+2. **Build-our-own phase** (eventual): factory builds its own
+   datfile-driven ROM-namer. This is the end goal, not a fallback.
+3. **Contribution-back continues** even after build-our-own lands —
+   peer-maintainer status survives our own implementation.
+
+For B-0083 specifically, we're at step 1 (bridge phase). The
+preferred immediate path is dependency-first; build-our-own is
+explicitly the trajectory direction, not a panic-fallback.
+
+### Cross-platform tool research (preliminary, expand on pickup)
+
+- **TOSEC reference tools** (`clrmamepro`, `tosec-cli`):
+  - `clrmamepro` is Windows-only (no Mac/Linux build).
+  - `tosec-cli` is .NET Framework — Mac-via-mono possible but flaky.
+  - **Likely NOT directly usable on Mac.**
+- **GoodTools** (Cowering):
+  - Windows-only EXE distributions, discontinued ~2011.
+  - **Not cross-platform.**
+- **RomVault** (`https://github.com/gjefferyes/RomVault`, .NET 6+, cross-platform):
+  - Confirmed Mac/Linux support via dotnet runtime.
+  - Mature ROM-management tool, datfile-driven.
+  - **Strong candidate for primary dependency.**
+- **Romulus** (Java, cross-platform JVM):
+  - JVM-based, runs on Mac.
+  - Older project; less active.
+- **retool** (`https://github.com/unexpectedpanda/retool`, Python, cross-platform):
+  - Active Python project for ROM filtering by datfile.
+  - Routed through factory uv-managed pipx (NEVER raw `pip install` — see
+    `docs/DECISIONS/2026-04-27-uv-canonical-python-tool-manager.md` for
+    the canonical Python-tool-manager decision); Mac-friendly.
+  - **Strong candidate for scripting integration.**
+- **Mednafen toolchain** + custom datfile-parsing scripts.
+
+### Recommended approach (pickup-time decision tree)
+
+1. **Try RomVault first** (managed via mise / dotnet pin per
+   the existing dependency-consumption pattern). If it works
+   cross-platform and we can drive it headlessly, this is
+   the cleanest path.
+2. **Fall back to retool** (Python tool, routed through
+   factory uv-managed pipx — NEVER raw `pip install`) if
+   RomVault doesn't headless-script cleanly.
+3. **Fall back to build-our-own** (pure-Python in `tools/roms/`)
+   ONLY if neither above tool fits the factory shape. This is
+   the last resort — the algorithm is straightforward
+   (SHA1/MD5/CRC32 lookup against datfile XML) but maintaining
+   our own datfile-parser is ongoing-work we'd rather not own.
+
+### Datfile-as-dependency
+
+Either path needs the actual TOSEC + GoodA26 datfiles. Approach:
+
+1. Pin the datfile version in our dependency-manifest (similar
+   to how `.mise.toml` pins runtime versions).
+2. Download from canonical sources (TOSECdev.org / archived
+   GoodSets mirrors).
+3. Refresh on a cadence (similar to budget-snapshot-cadence) —
+   when TOSEC publishes a new datfile, re-pin + re-run.
+4. Verify via SHA256 of the datfile itself per the
+   pin-with-checksum pattern (Otto-247 + the threading-lineage
+   citing-discipline).
+
+### Build-our-own fallback (only if dependency path fails)
+
+- **Live in `tools/roms/`** (new factory tooling subdirectory).
+- Pure-Python or pure-bash; no external runtime beyond what
+  `.mise.toml` already pins.
+- Algorithm:
+  1.
+     Download TOSEC datfile (XML) from versioned URL.
+  2. Parse XML, build
+     `dict[sha1] = (canonical_name, license_class, ...)`.
+  3. For each file in `roms/atari/2600/`, compute SHA1, lookup,
+     rename + move per classification.
+  4. Refresh-on-demand: re-pull datfile, re-run against folder.
+- **Schedule via existing GHA cadence** (similar to
+  budget-snapshot-cadence pattern) for periodic datfile refresh.
+
+### Good-citizen contribution path
+
+Per `memory/feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md`:
+when we use TOSEC/RomVault/retool, we contribute back. Specifically:
+
+- Bug reports for Mac-specific issues we hit.
+- Documentation improvements if their docs missed something.
+- New datfile entries we generate (e.g. for safe-folder homebrew
+  ROMs Aaron classifies).
+- Financial support (small-donor tier) if the project accepts it
+  per Aaron's funding posture (`feedback_absorb_and_contribute_*`).
+
+## Folder structure proposal
+
+```
+roms/                  (gitignored, bulk)
+├── .gitignore         (existing depth-limited rule)
+├── README.md          (existing top-level protocol)
+└── atari/2600/        (existing — 3461 ROMs gitignored)
+    └── README.md      (existing safety-gate)
+
+roms-safe/             (NEW, tracked — license-verified ROMs)
+├── README.md          (NEW — explains the split + cites licenses)
+└── atari/2600/        (NEW)
+    ├── README.md      (NEW — per-ROM citations)
+    └── *.bin          (NEW — only ROMs cleared by license)
+```
+
+## Acceptance criteria
+
+- [ ] All 3461 files in `roms/atari/2600/` renamed to canonical
+  form (TOSEC or Good-Tools — pick one) where matchable.
+- [ ] Unmatchable files flagged in a manual-review list.
+- [ ] License-classification applied per the README's gate.
+- [ ] License-cleared ROMs moved to `roms-safe/atari/2600/`
+  with per-ROM license citation.
+- [ ] License-uncertain ROMs stay in `roms/atari/2600/`
+  (gitignored, never distributed).
+- [ ] Tooling lives in `tools/roms/` and refreshes on TOSEC
+  datfile updates (manual-trigger workflow at minimum;
+  scheduled cron optional).
+- [ ] `roms/atari/2600/README.md` updated to reference the
+  safe ROMs as "games you can test out locally" per Aaron's
+  framing.
+- [ ] Otto-247 version-currency: WebSearch for the latest
+  TOSEC Atari 2600 datfile version before asserting which
+  one to use.
+
+## Composes with
+
+- `roms/.gitignore` — already protects against accidental commit.
+- `roms/README.md` + `roms/atari/2600/README.md` — already
+  document the license-safety gate.
+- B-0061 (monolith-to-per-row migration) — adjacent
+  factory-hygiene class.
+- Otto-247 version-currency — for datfile version handling.
+- Otto-275-YET — Aaron's explicit log-don't-implement signal
+  ("we can backlog this").
+- The 0/0/0 hard-reset work in flight (PR #677 5-disciplines +
+  the pull-queue audit) — gating dependency.
+
+## Future-Otto pickup notes
+
+When 0/0/0 lands:
+
+1. Re-read this row + `roms/README.md` + `roms/atari/2600/README.md`.
+2. Run an Otto-247 WebSearch for "TOSEC Atari 2600 datfile latest 2026"
+   to get the current version.
+3. Pick TOSEC (recommended for active maintenance) over Good Tools.
+4. Build the tooling in `tools/roms/` (Python likely; bash for
+   shell-integration; either is acceptable per existing factory
+   polyglot discipline).
+5. Apply per the algorithm in this row.
+6. Spot-check 5-10 ROM renames before mass-applying (per the
+   5-pre-flight-disciplines verify-substrate-state pattern).
+7. Cross-CLI verify (Otto-347) on the license-classification logic
+   — getting that wrong has legal blast radius.
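Pickup note 6 (spot-check a handful of renames before mass-applying) suggests separating planning from applying. The sketch below is one possible shape under stated assumptions: `plan_renames` and `apply_plan` are hypothetical names, `matches` is assumed to come from an earlier hash-lookup step, and the original file extension is kept as-is.

```python
from pathlib import Path


def plan_renames(folder: Path, matches: dict) -> list:
    """Pair each matched file with its target path without touching disk.

    `matches` maps a current filename to its canonical title (no extension).
    """
    plan = []
    taken = set()
    for current in sorted(matches):
        source = folder / current
        target = folder / (matches[current] + source.suffix.lower())
        if target in taken:
            # Two inputs resolving to one name need a human decision.
            raise ValueError(f"two files map to {target.name}")
        taken.add(target)
        if source != target:
            plan.append((source, target))
    return plan


def apply_plan(plan: list, limit=None) -> int:
    """Rename the first `limit` pairs (all when None); return how many moved."""
    moved = 0
    for source, target in plan[:limit]:
        if target.exists():
            continue  # never overwrite; leave the pair for manual review
        source.rename(target)
        moved += 1
    return moved
```

Printing the plan, applying it with a small `limit`, and inspecting the folder before running the rest is one way to get the spot-check without a separate dry-run mode.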
diff --git a/docs/backlog/P3/B-0084-codeql-path-gate-empty-sarif-aggregate-baseline-verify-coverage-aaron-2026-04-28.md b/docs/backlog/P3/B-0084-codeql-path-gate-empty-sarif-aggregate-baseline-verify-coverage-aaron-2026-04-28.md
new file mode 100644
index 00000000..4aa00b63
--- /dev/null
+++ b/docs/backlog/P3/B-0084-codeql-path-gate-empty-sarif-aggregate-baseline-verify-coverage-aaron-2026-04-28.md
@@ -0,0 +1,179 @@
+---
+id: B-0084
+priority: P3
+status: mostly-implemented-verify-coverage
+title: Verify CodeQL path-gate empty-SARIF aggregate-baseline covers all matrix languages (already implemented, verify-only scope)
+tier: factory-tooling
+effort: S
+ask: maintainer Aaron 2026-04-28 (SASTID alert investigation — speculation-vs-evidence + do-the-right-long-term-thing corrections)
+created: 2026-04-28
+last_updated: 2026-04-28
+tags: [aaron-2026-04-28, scorecard, sastid, codeql, path-gate, do-the-right-long-term-thing, mostly-implemented, verify-only, P3-downgraded-from-P1-on-finding-already-done]
+---
+
+# B-0084 — CodeQL path-gate emit-empty-SARIF for Scorecard SAST coverage
+
+## Source
+
+Aaron 2026-04-28T19:01Z verbatim:
+
+> *"SASTID dismissed ✅ did you fix what it was complaining about?"*
+
+Aaron 2026-04-28T19:02Z verbatim:
+
+> *"it also voilates do the right long term thing when making suggested fixes"*
+
+Two compounding corrections caught my dismissal as the wrong move:
+
+1. The dismissal was speculation-without-evidence (I asserted the 2/30
+   unchecked were "path-gate-skipped doc-only" without verifying).
+2. Even if the speculation had been correct, dismissal-with-rationale
+   is a short-term avoidance vs. fixing the root cause. Violates
+   do-the-right-long-term-thing.
+
+## What the speculation-check actually showed
+
+After reversing the dismissal and investigating:
+
+- **PR #651** ("CI cadence split + Windows trajectory seed"): 32 files
+  touched including `.github/workflows/codeql.yml` itself.
+  This is the commit that introduced the path-gate. Path-gate may
+  have been effectively no-op before this PR; CodeQL legs may have
+  been absent for transitional reasons.
+- **PR #654**: 2 memory files only. After path-gate landed, this
+  correctly skipped CodeQL (no code to scan).
+
+So the path-gate IS working as designed. The failure isn't the gate —
+it's that Scorecard's SASTID metric counts "did the github-code-scanning
+app log a SAST run" per commit, and path-gate-skipped commits register
+as "SAST didn't run".
+
+## Update 2026-04-28T19:09Z: pattern already implemented
+
+After deeper investigation: `.github/workflows/codeql.yml` lines 53-65,
+121-180, 241-334 ALREADY implement this pattern. The path-gate job
+emits no-findings SARIF per language category when no code changed.
+**The current SASTID 28/30 is a timing artifact**, not a missing fix:
+
+- The alert was created 2026-04-27T23:52:55Z.
+- The path-gate became fully active around PR #651 (which itself
+  modified codeql.yml).
+- Scorecard's window of 'recent 30 merged PRs' currently includes
+  pre-path-gate commits, hence the gap.
+- As more post-path-gate PRs land, the metric self-heals.
+
+**Lower-priority than initially scoped.** The trajectory is now
+captured durably as substrate (see
+`memory/feedback_emit_empty_security_result_on_conditional_skip_ci_maturity_pattern_aaron_2026_04_28.md`)
+so future security-tool workflows inherit the pattern. The specific
+codeql.yml work is DONE; the timing-artifact resolves on its own.
+
+What remains in scope for B-0084:
+
+- Verify the path-gate aggregate-baseline covers ALL matrix languages
+  (currently: actions + csharp + python + java-kotlin +
+  javascript-typescript per round-34 update).
+- If a future language addition misses the aggregate-baseline, this
+  row catches it.
+
+## The original fix shape (preserved for context)
+
+When path-gate determines no code changes, **still upload an empty
+SARIF** via `codeql-action/upload-sarif@`.
+This makes GitHub's
+Code Scanning surface log "SAST ran with zero findings" for that
+commit, which Scorecard then counts as SAST-covered.
+
+This is the documented pattern for Scorecard satisfaction without
+burning Actions minutes on prose-only changes. The empty SARIF is
+a few hundred bytes; the upload-sarif step takes ~5 seconds; net
+Actions cost is negligible.
+
+## Concrete change
+
+In `.github/workflows/codeql.yml`, after the path-gate "code_changed"
+output is computed:
+
+```yaml
+# In the path-gate job (which must also expose `code_changed` under
+# its job-level `outputs:` key so other jobs can read it):
+- name: Path gate decision
+  id: path-gate
+  run: |
+    # ... existing logic that sets code_changed=true|false ...
+    echo "code_changed=$code_changed" >> "$GITHUB_OUTPUT"
+
+# NEW: emit empty SARIF when path-gate skips, so Scorecard SAST
+# coverage stays at 30/30 instead of dropping on doc-only PRs.
+# `strategy` is only valid at job level, not on a step, so the two
+# steps below belong in a separate job that declares:
+#   needs: path-gate
+#   if: needs.path-gate.outputs.code_changed != 'true'
+#   strategy:
+#     matrix:
+#       language: [actions, csharp, python, java-kotlin, javascript-typescript]
+- name: Emit empty SARIF on no-code-change
+  run: |
+    cat > empty.sarif <<'EOF'
+    {
+      "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
+      "version": "2.1.0",
+      "runs": [{
+        "tool": {"driver": {"name": "codeql-path-gate-noop", "version": "1.0.0"}},
+        "results": []
+      }]
+    }
+    EOF
+
+- name: Upload per-language empty SARIF to Code Scanning
+  uses: github/codeql-action/upload-sarif@
+  with:
+    sarif_file: empty.sarif
+    category: "/language:${{ matrix.language }}"
+```
+
+**Important**: the live `codeql.yml` implementation uses **per-language
+SARIF categories** (one upload per `Analyze (X)` matrix leg) rather
+than a single aggregate category. Reason: the
+`code_quality:severity=all` ruleset rule reads SARIF coverage
+per-language; a single-category upload would still leave 4 of 5
+language legs as "results pending". See lines 270-334 of the live
+workflow for the actual matrix-loop implementation.
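The per-language requirement above can be made concrete with a small generator. This is an illustrative sketch only: it reuses the SARIF body and the five language names from this row, while `empty_sarif` and `upload_plan` are hypothetical helper names, and it does not describe how the live workflow builds its uploads.

```python
import json

# The five matrix languages named in this row.
LANGUAGES = ["actions", "csharp", "python", "java-kotlin", "javascript-typescript"]

SARIF_SCHEMA = (
    "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/"
    "master/Schemata/sarif-schema-2.1.0.json"
)


def empty_sarif(tool_name: str = "codeql-path-gate-noop",
                tool_version: str = "1.0.0") -> dict:
    """Build a SARIF 2.1.0 document with one run and zero results."""
    return {
        "$schema": SARIF_SCHEMA,
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": tool_name, "version": tool_version}},
                "results": [],
            }
        ],
    }


def upload_plan(languages: list) -> list:
    """Return one {category, sarif} entry per language.

    Each entry carries the same empty document but a distinct category,
    mirroring the one-upload-per-language shape described above.
    """
    payload = json.dumps(empty_sarif(), indent=2)
    return [
        {"category": f"/language:{language}", "sarif": payload}
        for language in languages
    ]
```

Adding a language to `LANGUAGES` adds an upload entry automatically, which is the property the "what remains in scope" check is meant to guard.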
+
+## Acceptance criteria
+
+- [ ] `.github/workflows/codeql.yml` emits empty SARIF on path-gate
+  skip (already implemented per the 19:09Z update; verify only)
+- [ ] Empty SARIF passes GitHub Code Scanning validation
+- [ ] After the next 2-3 doc-only PRs land, Scorecard SASTID metric
+  reads 30/30 (or whatever the recent-N-commits ratio shows)
+- [ ] Per-PR Actions minutes cost increases by ~5 seconds
+  (empty-SARIF upload), within budget
+- [ ] Otto-247 version-currency: upload-sarif action SHA-pinned to
+  latest stable (not `@v3` or similar)
+
+## Why this was originally P1 (not deferred to after 0/0/0)
+
+Original rationale, kept for context; the row was downgraded to P3
+once the pattern was found already implemented (see the 19:09Z
+update above).
+
+- The Scorecard SASTID alert is gating PR #661 (`code_quality:severity=all`
+  ruleset requires zero open code-scanning alerts).
+- This unblocks the 0/0/0 chain rather than being blocked by it.
+- Small effort (S = under a day; the change is ~15 lines of YAML).
+
+## Composes with
+
+- `feedback_speculation_leads_investigation_not_defines_root_cause_*` —
+  the failure mode this row corrects (I dismissed without
+  verifying).
+- `feedback_destructive_git_op_5_pre_flight_disciplines_*` — discipline
+  3 (commit messages / process metrics count as content) composes
+  with this row: process metrics count as ALERTS, and dismissal
+  isn't a fix.
+- The "do the right long term thing" framing Aaron just named —
+  worth landing as its own memory after this PR.
+
+## What I learned (this row's meta-lesson)
+
+1. **Dismissing security alerts as "won't fix" is a code smell** when
+   the alert points at a real metric gap, even if the metric is
+   "process" rather than "code-vuln".
+2. **Speculation-without-evidence still kicks in even on small calls.**
+   I'd asserted "path-gate correctly skipped" without checking the 2
+   actual unchecked PRs. Aaron's question forced the investigation.
+3. **Short-term avoidance ≠ long-term fix.** Dismissal closes the
+   alert in this scan but lets the metric drop on every doc-only PR
+   thereafter. Emit-empty-SARIF is one-time work that fixes the
+   underlying signal-quality forever.
diff --git a/memory/MEMORY.md b/memory/MEMORY.md
index d7b2a1cc..5bc25b79 100644
--- a/memory/MEMORY.md
+++ b/memory/MEMORY.md
@@ -2,6 +2,8 @@
 
 **📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-29 — speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler.)
 
+- [**Self-healing metrics on regime change — factory design principle (Aaron 2026-04-28)**](feedback_self_healing_metrics_on_regime_change_factory_design_principle_aaron_2026_04_28.md) — When system is correctly designed, transient metric gaps from regime transitions resolve organically as new regime accumulates evidence in rolling window. Prefer self-heal over manual rebaseline. NOT applicable when system is broken (verify first).
+- [**Emit empty security-tool result on conditional-skip — CI security-maturity pattern (Aaron 2026-04-28)**](feedback_emit_empty_security_result_on_conditional_skip_ci_maturity_pattern_aaron_2026_04_28.md) — Trajectory: when security-tool workflow skips (path-gate, branch-filter, etc.), STILL emit minimal no-findings result so coverage metrics see tool-ran. Already in codeql.yml; propagate to Semgrep/dep-scan/container-scan as added.
 - [**Elizabeth-canonical-spelling §33 carve-out for sister-name (Aaron 2026-04-28)**](feedback_elizabeth_canonical_spelling_overrides_section_33_history_preservation_aaron_2026_04_28.md) — Replace older-spelling tokens with canonical Elizabeth repo-wide including history surfaces. Name-specific; does not generalize.
 - [**Five pre-flight disciplines for destructive git operations (Codex+Gemini caught 5 risks Otto missed; Aaron 2026-04-28)**](feedback_destructive_git_op_5_pre_flight_disciplines_codex_gemini_2026_04_28.md) — Tree-diff ≠ history preservation; timestamp-newer ≠ subsumption proof; commit messages / PR refs / AgencySignature provenance count as content; --force-with-lease=ref:exact-sha (not bare --force); freshness via fresh-fetched refs. 6-box checklist before any destructive git op. Composes with Otto-347.
 - [**Threading code follows MS Learn + Albahari + Toub + Fowler — never gut-instinct (Aaron 2026-04-28)**](feedback_threading_human_lineage_albahari_toub_fowler_no_gut_instinct_aaron_2026_04_28.md) — Four-source lineage. MS Learn advanced .NET docs first (.NET-10-current; replaces some Albahari guidance), Albahari foundational but old (2011), Toub (perf), Fowler (Channels). Worked example: `System.Threading.Lock` supersedes `lock(object)` for .NET 9+/C# 13+. Modern .NET 10 absorb (Gemini Pro Deep Research) at `docs/research/2026-04-28-gemini-pro-deep-research-threading-net10-csharp14-modernization.md`. Prefer wait-free / lock-free.
diff --git a/memory/feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md b/memory/feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md
index 62411c0e..5e3f0d4e 100644
--- a/memory/feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md
+++ b/memory/feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md
@@ -148,6 +148,46 @@ better discipline that the block was protecting the space for.
   first-class factory asset — the footprint of our upstream
   participation.
 
+## End-goal sharpening (Aaron 2026-04-28T19:00Z)
+
+> *"build-our-own as last resort. our good citizen is because our
+> end goal is we build all of our dependncies but still contribute
+> back our enhancements and such"*
+
+This sharpens the discipline.
+The previous framing positioned
+absorb-and-contribute as the right way to consume community tools
+indefinitely. Aaron's 2026-04-28 clarification reframes it as a
+**transitional state** with a clearer end goal:
+
+- **End goal: factory-built-everything.** Long-term, the factory
+  builds all of its own dependencies. This is the autonomy axis —
+  the factory should not ship dependent on third-party tooling
+  for its core capabilities.
+- **"Good citizen" is what we do DURING the bridge.** Use community
+  tools as bridges; contribute back our enhancements; eventually
+  replace with factory-built versions.
+- **Contribution-back continues after the bridge ends.** Even after
+  the factory has built its own version of capability X, we keep
+  contributing relevant enhancements to the upstream community
+  project — because (a) the community project still serves users
+  who don't need factory-grade implementation, and (b) our
+  contributions back are themselves a moat-building signal
+  (peer-maintainer status survives our own implementation).
+- **"Build-our-own as last resort" reads as last-resort-FOR-NOW.**
+  We don't immediately reach for build-our-own when a community
+  tool exists; we use it, contribute back, and trajectory toward
+  building our own when the time is right (e.g., when the
+  community tool has structural limitations, when we need
+  factory-grade integration, when upstream stalls).
+
+The trajectory: **community-tool → absorb-and-contribute →
+factory-built + ongoing-contribution-back**. Three phases, not two.
+
+The contribution-back across all three phases is what "good
+citizen" actually means. It's not just "use polite community
+practices while we depend on them"; it's "continue giving back
+to the commons across the trajectory of our own self-sufficiency".
+
 ## Composition with existing memory
 
 - `feedback_external_signal_confirms_internal_insight_second_occurrence_discipline_2026_04_22.md`
diff --git a/memory/feedback_emit_empty_security_result_on_conditional_skip_ci_maturity_pattern_aaron_2026_04_28.md b/memory/feedback_emit_empty_security_result_on_conditional_skip_ci_maturity_pattern_aaron_2026_04_28.md
new file mode 100644
index 00000000..96dbc3df
--- /dev/null
+++ b/memory/feedback_emit_empty_security_result_on_conditional_skip_ci_maturity_pattern_aaron_2026_04_28.md
@@ -0,0 +1,146 @@
+---
+name: Emit empty security-tool result on conditional-skip — CI security-maturity pattern (Aaron 2026-04-28)
+description: Aaron 2026-04-28T19:08Z: 'probably just need some CI maturity vector maybe we already have' — confirming this is a substrate-level CI-security-maturity trajectory, not just a one-off backlog item. The pattern: when a security-tool workflow conditionally skips (path-gate, branch-filter, file-filter, fork-PR-permissions guard), STILL emit a minimal no-findings result so coverage metrics (Scorecard, code-scanning rulesets, audit dashboards) see "tool ran on this commit" rather than "tool didn't run". Generalizes beyond CodeQL to Semgrep, dependency-scan, container-scan, license-scan. Zeta already implements this in codeql.yml (path-gate emits empty SARIF per language category); the trajectory says: apply this pattern to every security-tool workflow we add.
+type: feedback
+---
+
+# Emit empty security-tool result on conditional-skip — CI security-maturity pattern
+
+## The rule (Aaron verbatim 2026-04-28T19:08Z)
+
+> *"That's how mature CI security pipelines satisfy Scorecard without
+> burning Actions minutes on prose-only changes. sound like we should
+> capture this as our trajectory? or is it just a small backlog item,
+> or are you fixing it now?"*
+
+> *"probably just need some CI maturity vector maybe we already have"*
+
+The two messages together: yes, capture as trajectory.
+Yes, we already
+have most of it.
+
+## The pattern
+
+When a security-tool workflow conditionally **skips** running the
+actual analysis on a given commit (because path-gate determines no
+code changed, branch-filter excludes the branch, file-filter
+excludes the changeset, fork-PR permissions block the tool, etc.),
+the workflow MUST STILL **emit a minimal no-findings result** so:
+
+1. **Coverage metrics stay high**: Scorecard's SAST coverage ratio,
+   GitHub's `code_scanning` ruleset rule ("Require code scanning
+   results"), org-level security dashboards all count "tool ran on
+   commit X with zero findings" as covered.
+2. **Audit trail is uniform**: every commit shows up in the
+   security-tool's run-history; no implicit-skip gaps that look
+   like "did we forget to scan this?".
+3. **Required-status-check rulesets pass**: when the rule requires
+   the SAST status check be present, the empty-result satisfies
+   the gate without blocking doc-only PRs on full analysis.
+
+## The cost-benefit
+
+- **Cost**: ~5 seconds per skipped PR (synthesize a SARIF of a few
+  hundred bytes + upload via `codeql-action/upload-sarif`).
+- **Benefit**: every coverage metric stays at 100%; no false
+  alarms from process-metrics (Scorecard SASTID etc.); merge-gating
+  rulesets pass on doc-only PRs without burning full-scan minutes.
+
+## Where Zeta already implements this
+
+`.github/workflows/codeql.yml` (round-34-tuned, see comment block
+lines 53-65 + 67-74):
+
+- `path-gate` job runs first; sets `code_changed` output.
+- If `code_changed=false` (pure docs / memory / .claude PR):
+  - **Aggregate-CodeQL baseline** step (lines 241+): synthesizes
+    minimal no-findings SARIF per language category and uploads
+    via `github/codeql-action/upload-sarif@`.
+  - The `analyze` matrix is skipped (no expensive DB-build).
+- Fork-PR guard: forces `code_changed=true` on external-contributor
+  PRs because their downgraded `GITHUB_TOKEN` permissions can't
+  do `security-events: write` for synthetic SARIF upload.
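The decision logic described above (path-gate plus fork-PR guard) reduces to a small pure function. The sketch below is an assumption-laden illustration: the doc-only patterns are hypothetical stand-ins for whatever the live path-gate actually matches, and `code_changed` and `next_step` are names invented here.

```python
from fnmatch import fnmatch

# Hypothetical doc-only patterns; the real path-gate list may differ.
DOC_ONLY_PATTERNS = ["docs/*", "memory/*", ".claude/*", "*.md"]


def code_changed(changed_files: list, is_fork_pr: bool) -> bool:
    """Return True when the full analysis should run.

    Fork PRs always count as code-changed, because their token cannot
    upload a synthetic result. Otherwise any path outside the doc-only
    patterns counts as a code change.
    """
    if is_fork_pr:
        return True
    for path in changed_files:
        if not any(fnmatch(path, pattern) for pattern in DOC_ONLY_PATTERNS):
            return True
    return False


def next_step(changed_files: list, is_fork_pr: bool) -> str:
    """Name the branch the workflow takes: never a silent skip."""
    if code_changed(changed_files, is_fork_pr):
        return "run-analysis"
    return "emit-empty-result"
```

The point of `next_step` having exactly two outcomes is that there is no third "do nothing" branch: every commit either gets the real analysis or the receipt.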
+
+## Where the pattern should propagate
+
+When the factory adds new security-tool workflows, apply the same
+shape:
+
+- **Semgrep workflow** (currently on every PR via `lint (semgrep)`):
+  if we ever add a path-gate to skip docs-only PRs, the path-gate
+  must emit an empty Semgrep SARIF or equivalent.
+- **Dependency-scan workflow** (Dependabot, OSV-Scanner, future
+  additions): if branch-filter or path-filter skips, emit empty
+  result.
+- **Container-scan workflow** (future, if we ship containers):
+  if no Dockerfile changed, emit empty result.
+- **License-scan workflow** (future): if no dependency changed,
+  emit empty result.
+- **CodeQL itself** when new languages are added to the matrix:
+  the `path-gate` aggregate-baseline must emit one no-findings
+  SARIF per matrix language (already covered for `actions` /
+  `csharp` / `python` / `java-kotlin` / `javascript-typescript`).
+
+## When this trajectory MIGHT NOT apply
+
+- **Tool doesn't produce SARIF or equivalent uploadable artifact**:
+  some security tools log to stdout only. For these, the empty-
+  result pattern requires a wrapper that synthesizes the upload
+  format. Skip the empty-emit if the wrapper cost exceeds the
+  metric-gain.
+- **Tool's run-history is not coverage-metric-checked**: if no
+  external system (Scorecard / ruleset / dashboard) checks
+  per-commit coverage for that tool, the empty-emit is unnecessary
+  ceremony. Apply YAGNI.
+- **Skip is intentional non-applicability**: e.g., a Windows-only
+  tool on a Linux-only PR. Document the skip in the workflow's
+  decision step instead of emitting a misleading "ran with no
+  findings" signal.
+
+## Composes with
+
+- `feedback_destructive_git_op_5_pre_flight_disciplines_codex_gemini_2026_04_28.md`
+  — discipline 1 (history preservation): the empty-SARIF upload
+  IS the artifact that makes the security-tool's history
+  preservation complete (no per-commit gaps).
+- `feedback_codeql_umbrella_neutral_vs_per_language_detection_pattern_aaron_2026_04_28.md`
+  — same family: code_scanning ruleset rule expects per-language
+  SARIF; missing one = NEUTRAL = ruleset blocks. Empty-SARIF on
+  conditional skip = ruleset passes.
+- Otto-247 version-currency: when bumping `codeql-action/upload-sarif`
+  pin, check the latest version per the WebSearch discipline.
+- B-0084 (the concrete-instance backlog row) — captured for the
+  specific Scorecard SASTID metric.
+
+## What this is NOT
+
+- **NOT a directive to run every tool on every commit.** That would
+  burn Actions minutes for zero security value on doc-only PRs.
+  The pattern is "skip the analysis, emit the receipt".
+- **NOT a generic CI-pattern-of-the-month.** This applies
+  specifically to **security-tool** workflows where coverage
+  metrics matter. Build/test workflows have their own conditional-
+  skip patterns and don't need the empty-result emit.
+- **NOT applicable to the maintainer's own scripts** (e.g.,
+  tools/budget/snapshot-burn.sh). Those have their own success-
+  on-no-change semantics.
+- **NOT a substitute for actual security analysis.** When code
+  changes, run the real tool. The empty-emit is for the
+  no-code-changed case only.
+
+## Pickup notes for future-Otto
+
+When adding a new security-tool workflow:
+
+1. **Decide if conditional-skip applies**: does the tool make
+   sense to skip on certain commits (path-filter, branch-filter)?
+2. **If yes, design the skip path with empty-emit**:
+   - Synthesize a minimal valid result format (SARIF for
+     code-scanning; tool-specific format otherwise).
+   - Upload via the canonical action (`codeql-action/upload-sarif`,
+     etc.).
+   - Document the skip-with-emit shape in the workflow's
+     comment block.
+3. **If no (tool always runs)**: no special handling.
+4. **Cross-CLI verify (Otto-347)** the skip-path on first
+   implementation — getting the SARIF schema wrong silently
+   succeeds at upload but fails at the metric check.
diff --git a/memory/feedback_self_healing_metrics_on_regime_change_factory_design_principle_aaron_2026_04_28.md b/memory/feedback_self_healing_metrics_on_regime_change_factory_design_principle_aaron_2026_04_28.md
new file mode 100644
index 00000000..28df076c
--- /dev/null
+++ b/memory/feedback_self_healing_metrics_on_regime_change_factory_design_principle_aaron_2026_04_28.md
@@ -0,0 +1,141 @@
+---
+name: Self-healing metrics on regime change — factory design principle (Aaron 2026-04-28)
+description: Aaron 2026-04-28T19:09Z — 'the metric self-heals. i love self healing' + 'sounds like a good thing to remember'. Generalizable factory-design principle: when a system is correctly designed, transient metric gaps from regime transitions (path-gate now-active, fix landed, new tool added) resolve organically as the new regime accumulates evidence in the metric's window. Prefer self-healing systems over manual-rebaseline systems. Avoids the dismissal anti-pattern (close-the-alert-with-rationale) by making the system robust to its own correct evolution.
+type: feedback
+---
+
+# Self-healing metrics on regime change — factory design principle
+
+## The rule (Aaron verbatim 2026-04-28T19:09Z)
+
+> *"the metric self-heals. i love self healing"*
+> *"sounds like a good thing to remember"*
+
+Captured as substrate per the explicit log-in-substrate signal.
+
+## The pattern
+
+A **self-healing metric** is one where:
+
+1. The underlying system is **correctly designed** (the metric's
+   target is actually being achieved by the new regime).
+2. The metric has a **rolling window** (e.g. "recent 30 commits",
+   "last 7 days", "trailing 90-day error rate") rather than a
+   permanent counter.
+3. After a **regime transition** (a fix lands, a path-gate becomes
+   active, a new tool is added, a process is corrected), the metric
+   **automatically improves** as the new regime's evidence accumulates
+   in the rolling window — without any manual rebaseline, dismissal,
+   or alert-suppression.
+
+## The example that named the rule
+
+**Scorecard SASTID 28/30** (2026-04-28):
+- Pre-path-gate commits in Scorecard's "recent 30 merged PRs"
+  window had no SAST signal because the path-gate hadn't been
+  active yet.
+- Path-gate landed via PR #651, became active for subsequent
+  PRs, emits empty SARIF on conditional skip.
+- Scorecard's window naturally rolls forward; pre-path-gate
+  commits exit the window; post-path-gate commits enter; the
+  metric heals to 30/30 over time without intervention.
+
+The wrong-shape response was my initial dismissal-with-rationale
+("won't fix; we DO run SAST"). That would have closed the alert
+this scan but masked the natural self-healing dynamic + would have
+required re-dismissal on every Scorecard scan.
+
+The right-shape response: **verify the system is correctly designed,
+then let the metric heal**. Aaron's affirmation of the self-healing
+property over the dismissal-with-rationale shows the deeper preference.
+
+## Why prefer self-healing systems
+
+- **No manual maintenance overhead.** Dismissal-with-rationale
+  requires re-dismissal each time the metric re-fires. Self-healing
+  requires zero ongoing work.
+- **The metric itself stays useful.** A dismissed alert is a
+  **hidden** signal — future maintainers don't see it. A
+  self-heal-pending metric is a **visible** signal that's
+  expected to clear.
+- **It validates the underlying fix.** Watching the metric heal
+  is empirical evidence the fix actually works. Dismissal
+  bypasses that validation.
+- **It composes with rolling-window analytics generally.**
+  Self-healing thinking transfers to any rolling metric:
+  test-flakiness rate, build-time percentile, alert-volume,
+  response-time SLOs, etc. When the underlying system is
+  correctly designed, the rolling-window will reflect the
+  improvement on its own.
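
The Scorecard 28/30 example can be illustrated with a small rolling-window simulation. This is a hedged sketch, not Scorecard's actual scoring logic: the function name `simulate_heal` is hypothetical, and it assumes the uncovered pre-path-gate commits are the oldest entries in the window and that every new commit arrives covered.

```python
from collections import deque


def simulate_heal(window_size: int, uncovered: int, new_commits: int) -> list[int]:
    """Return the covered-commit count after each new covered commit arrives.

    Assumption: the `uncovered` pre-fix commits sit at the old end of the
    window, so they are the first to be evicted as the window rolls forward.
    """
    window = deque(
        [False] * uncovered + [True] * (window_size - uncovered),
        maxlen=window_size,  # appending to a full deque drops the oldest entry
    )
    history = [sum(window)]
    for _ in range(new_commits):
        window.append(True)  # post-fix commit enters, oldest commit exits
        history.append(sum(window))
    return history


if __name__ == "__main__":
    # 30-commit window, 2 uncovered commits, 3 new covered commits.
    print(simulate_heal(30, 2, 3))  # [28, 29, 30, 30]
```

Under these assumptions the heal time is bounded by how many new commits it takes to push the last uncovered commit out of the window; if the uncovered commits sat nearer the new end of the window, healing would take correspondingly longer.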
+
+## When the metric is NOT self-healing (red flag)
+
+If the metric **doesn't improve over time** despite the underlying
+system being "correctly designed":
+
+- The system might NOT actually be correctly designed (the
+  speculation about "we DO run SAST" needs verification — exactly
+  what Aaron's "did you fix what it was complaining about?"
+  question forced).
+- The metric's window might be longer than the cadence of new
+  evidence (e.g., a 90-day window won't heal in 7 days).
+- The metric's threshold might be tighter than the system can
+  achieve in steady-state (a known design-vs-metric mismatch
+  needing real fix, not dismissal).
+
+If the metric isn't healing, **the absence of healing IS the
+signal**. Don't dismiss it; investigate what's actually still
+broken.
+
+## The discipline
+
+When encountering a metric-firing alert:
+
+1. **First: verify the underlying system is correctly designed.**
+   Don't dismiss before checking. Don't speculate-without-evidence
+   about whether the system is right. (Per
+   `feedback_speculation_leads_investigation_not_defines_root_cause_aaron_2026_04_28.md`.)
+2. **If the system is correctly designed**: predict the heal-rate
+   based on the metric's window + the cadence of new evidence.
+   Watch for the heal. Document the prediction so future-Otto can
+   verify.
+3. **If the system is NOT correctly designed**: fix it (don't
+   dismiss). The metric was telling the truth; dismissal would
+   have hidden the gap.
+4. **If the metric is firing but heal is slower than the metric's
+   ceremony cost**: consider whether to fix-now-instead-of-wait
+   (e.g., the SAST alert was gating PR #661, so wait-for-heal
+   was unblocking a queue; in other contexts, the right call is
+   different).
+
+## Composes with
+
+- `feedback_emit_empty_security_result_on_conditional_skip_ci_maturity_pattern_aaron_2026_04_28.md`
+  — the concrete instance where this rule applied.
+  Together they
+  form the discipline: design the system to emit-empty-on-skip
+  (CI maturity); let the rolling metric self-heal (factory
+  philosophy).
+- `feedback_speculation_leads_investigation_not_defines_root_cause_aaron_2026_04_28.md`
+  — verifying the system before predicting heal requires the
+  speculation-vs-evidence discipline. Don't predict heal on a
+  speculation about the system.
+- `feedback_destructive_git_op_5_pre_flight_disciplines_codex_gemini_2026_04_28.md`
+  — discipline 2 (timestamp-newer is weak evidence): metric-watching
+  IS evidence-gathering across rolling windows. The disciplines
+  compose.
+- `feedback_otto_355_blocked_with_green_ci_means_investigate_review_threads_first_dont_wait_2026_04_27.md`
+  — Otto-355 is the wait-and-investigate cadence; this rule adds
+  "wait for self-heal when the underlying system is correct".
+
+## What this is NOT
+
+- **NOT a license to ignore alerts.** When the system is broken,
+  dismissal-via-self-heal-claim is the same anti-pattern as
+  dismissal-with-rationale. Verify the system FIRST.
+- **NOT applicable to permanent-counter metrics.** "Total open
+  P0 alerts ever" doesn't roll forward; only fixing them resolves
+  the count. The self-heal pattern is specifically for
+  rolling-window metrics.
+- **NOT applicable when alert-cost > heal-time.** If the alert is
+  blocking a queue, the wait-for-heal call competes with
+  fix-now. Pick based on cost-of-wait.