Round 44: harness vs reviewer-robot split + multi-harness inventory #32
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR clarifies and disentangles “GitHub Copilot” surfaces (PR reviewer vs IDE harness vs @copilot coding agent) while landing a broader Round 44 documentation/process bundle and TypeScript tooling scaffolding.
Changes:
- Split/clarify Copilot product surfaces and reviewer scope (`.github/copilot-instructions.md`, wins log, related docs).
- Add OpenSpec + research/process artifacts (new capability spec, backfill program ADRs, Kanban/Six Sigma + DMAIC template, scope audits).
- Introduce bun/TypeScript repo tooling scaffolding (ESLint flat config, Prettier config/ignores, bunfig ignores).
Reviewed changes
Copilot reviewed 74 out of 88 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| openspec/specs/circuit-recursion/spec.md | New OpenSpec capability spec for nested-circuit recursion substrate semantics. |
| memory/persona/soraya/NOTEBOOK.md | Adds a formal tool-coverage audit writeup for RecursiveSigned. |
| eslint.config.ts | Adds strict, type-checked ESLint flat config + performance-oriented ignores. |
| docs/templates/DMAIC-proposal-template.md | Adds a DMAIC proposal template for factory-improvement changes. |
| docs/skill-edit-justification-log.md | Adds a ledger for mechanical skill edits outside eval-loop workflow. |
| docs/research/openspec-coverage-audit-2026-04-21.md | Adds an OpenSpec coverage inventory + proposed backfill cadence. |
| docs/research/missing-scope-pilot-2026-04-20.md | Pilot audit quantifying missing scope declarations in memory corpus. |
| docs/research/memory-scope-frontmatter-schema.md | Proposes a closed-enum scope: frontmatter extension for memories. |
| docs/research/imagination-proposal-2026-04-20.md | Proposes off-time imagination support via reference docs (not a skill). |
| docs/research/harness-run-2026-04-20-reducer.md | Documents eval-harness dry-run results for reducer skill. |
| docs/research/harness-run-2026-04-20-performance-analysis-expert.md | Documents eval-harness dry-run results for performance-analysis-expert. |
| docs/research/harness-run-2026-04-20-consent-primitives-expert.md | Documents eval-harness dry-run results for consent-primitives-expert. |
| docs/research/dao-factory-org-design-spike.md | Adds a research skeleton for DAO-native factory org-design. |
| docs/research/claims/bayesian-aggregate-update-o1-2026-04-20.md | Adds a “grandfather claim discharge” writeup for an O(1) claim. |
| docs/research/agent-free-time-notes.md | Adds an append-only free-time research notes scratchpad. |
| docs/research/agent-cadence-log.md | Adds an append-only cadence deviation log (idle vs free-time vs continuation). |
| docs/references/skill-tune-up-eval-loop.md | Adds extracted eval-loop handoff protocol reference for skill-tune-up. |
| docs/references/anthropic-skills-guide.md | Adds a companion summary for the pinned Anthropic Skills PDF. |
| docs/references/README.md | Documents what belongs in docs/references/ + BP-11 reminder. |
| docs/copilot-wins.md | Adds an append-only table of Copilot reviewer “wins” and fetch recipe. |
| docs/WINS.md | Adds “skeptic” framing + link to Copilot wins log. |
| docs/VISION.md | Updates references around verification posture/invariant substrates. |
| docs/TECH-RADAR.md | Expands Adopt entries and adds bun+TypeScript tooling entry. |
| docs/RESEARCH-COAUTHOR-TRACK.md | Adds a research coauthor teaching track skeleton. |
| docs/HUMAN-BACKLOG.md | Adds a human-facing backlog artifact + categorization/lifecycle rules. |
| docs/GLOSSARY.md | Disambiguates “persona” vs “expert” vs “user persona”; updates “Retire/Unretire”; adds idle/free-time entries. |
| docs/FACTORY-RESUME.md | Adds a “factory resume” artifact describing capabilities/processes. |
| docs/FACTORY-METHODOLOGIES.md | Adds Kanban + Six Sigma as methodologies of record + DMAIC integration. |
| docs/DECISIONS/2026-04-21-router-coherence-claims-vs-complexity.md | Adds the (now-described-as-superseded) router-coherence ADR v1. |
| docs/DECISIONS/2026-04-21-openspec-backfill-program.md | Adds an ADR establishing OpenSpec backfill cadence + success criteria. |
| docs/DECISIONS/2026-04-20-intentional-debt-over-architect-gate.md | Adds ADR redefining governance gate as debt-intentionality invariant. |
| docs/DEBT.md | Reframes docs/DEBT.md as a live ledger and points to primers. |
| docs/CONFLICT-RESOLUTION.md | Adds explicit complexity-reviewer ↔ claims-tester standing resolution pointer. |
| bunfig.toml | Adds bun install/test config with path ignores for large directories. |
| GOVERNANCE.md | Rewrites §11 from “architect gate” to “debt-intentionality invariant”. |
| CLAUDE.md | Adds “verify-before-deferring”, “future-self not bound”, “never idle”, “honor those that came before” rules. |
| AGENTS.md | Updates category-theory references to a README + upstream sync path. |
| .prettierrc.json | Adds repo Prettier configuration. |
| .prettierignore | Adds ignore patterns to avoid formatting crawling heavy/vendored dirs. |
| .github/copilot-instructions.md | Clarifies Copilot PR review scoping + adds “lean into wins” guidance and harness-surface pointer. |
| .claude/skills/skill-tune-up/SKILL.md | Extracts templates/protocol to docs reference; expands wrapper vs non-skill-wrapper explanation. |
| .claude/skills/round-management/SKILL.md | Updates §11 citation to new debt-intentionality invariant. |
| .claude/skills/holistic-view/SKILL.md | Updates frontmatter + “does not grant authority” to match new governance model. |
| .claude/skills/git-workflow-expert/SKILL.md | Adds speculative branch protocol to avoid merge-wait dead-zone. |
| .claude/skills/complexity-reviewer/SKILL.md | Adds explicit Stage-1 handoff block (claims pipeline) and references to router-coherence v2. |
| .claude/skills/claims-tester/SKILL.md | Adds explicit Stage-2 handoff block (claims pipeline) and references to router-coherence v2. |
| .claude/agents/architect.md | Updates architect description and responsibilities to match governance shift. |
| ```bash | ||
| gh api "repos/AceHack/Zeta/pulls/<N>/comments?per_page=100" \ | ||
| --jq '.[] | select(.user.login == "copilot-pull-request-reviewer[bot]") | ||
| | "\(.path):\(.line // "n/a") — \(.body)"' |
The jq program in this shell snippet has broken string quoting: the inner "n/a" terminates the surrounding jq string literal early, so the command as written will fail. Escape the quotes (e.g., \"n/a\") or switch to a jq string that avoids nested double-quotes so the recipe is copy/paste runnable.
| | "\(.path):\(.line // "n/a") — \(.body)"' | |
| | "\(.path):\(.line // \"n/a\") — \(.body)"' |
| See `docs/research/openspec-coverage-audit-2026-04-21-inventory.md` | ||
| (follow-up) for the full 66-row table. Summary by band below; | ||
| the table and Band definitions are this document's load-bearing | ||
| artefact. |
This references a sibling openspec-coverage-audit-2026-04-21-inventory.md file, but it isn’t included in this PR’s changeset. Either add that inventory file in the same PR or adjust this section to point at an existing artifact (or in-doc appendix) so the “full 66-row table” is resolvable.
| See `docs/research/openspec-coverage-audit-2026-04-21-inventory.md` | |
| (follow-up) for the full 66-row table. Summary by band below; | |
| the table and Band definitions are this document's load-bearing | |
| artefact. | |
| This PR includes the banded inventory summary below rather than a | |
| separate sibling inventory file. The Band definitions and summary | |
| in this document are the load-bearing artefact for prioritisation | |
| and round planning. |
| if you want" — see | ||
| `memory/.../feedback_idle_tracking_and_free_time_as_research.md`. |
The cross-reference uses a placeholder path (memory/.../…) rather than a real in-repo file path, so it’s not verifiable or navigable. Replace with the actual memory filename/path (or, if the memory doesn’t exist yet, create it or remove the reference) to preserve the repo-wide “xref integrity” discipline.
| if you want" — see | |
| `memory/.../feedback_idle_tracking_and_free_time_as_research.md`. | |
| if you want" — in the feedback note about idle tracking | |
| and free time as research. |
| default 5-minute `/loop` tick | ||
| (`memory/.../feedback_loop_cadence_5min_combats_agent_idle_stop.md`). | ||
| Logged per the durable policy in | ||
| `memory/.../feedback_idle_tracking_and_free_time_as_research.md`. |
Same issue as above: these are placeholder memory/.../ references, which breaks cross-reference integrity and makes the log non-auditable from the repo alone. Please replace with exact memory file paths (or add the referenced memories) so the policy citations are actionable.
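A check along these lines could flag such placeholders mechanically. This is a minimal sketch, not an existing repo tool: `findPlaceholderPaths` is a hypothetical helper, and it assumes cross-references are written as backticked paths with a literal `...` directory segment, as in the lines quoted above.

```typescript
// Hypothetical helper: returns every backticked path that still contains a
// literal "..." directory segment (e.g. memory/.../some_file.md).
function findPlaceholderPaths(markdown: string): string[] {
  // A backticked token with no whitespace that contains "/.../" somewhere.
  const pattern = /`([^`\s]*\/\.\.\.\/[^`\s]*)`/g;
  const hits: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const path = match[1];
    if (path !== undefined) hits.push(path);
  }
  return hits;
}

// Sample text reusing a placeholder reference quoted in this review.
const sampleLog =
  "Logged per `memory/.../feedback_idle_tracking_and_free_time_as_research.md`; " +
  "see also `docs/GLOSSARY.md`.";
console.log(findPlaceholderPaths(sampleLog));
```

Only the placeholder path is reported; fully resolved paths such as `docs/GLOSSARY.md` pass through untouched.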
| (Complexity Theory Reviewer, on paper) and empirical | ||
| measurement (Claims Tester, at `n = 10³..10⁶`). Standing | ||
| resolution: the two-stage pipeline in | ||
| `docs/DECISIONS/2026-04-21-router-coherence-claims-vs-complexity.md` |
There’s an internal reference inconsistency: this points to 2026-04-21-router-coherence-claims-vs-complexity.md (v1), but the updated skills (complexity-reviewer / claims-tester) state the authoritative contract is docs/DECISIONS/2026-04-21-router-coherence-v2.md. Pick the single canonical ADR and update all pointers (including this standing-resolution link and the v1 “superseded by” header) so readers don’t follow conflicting authorities.
| `docs/DECISIONS/2026-04-21-router-coherence-claims-vs-complexity.md` | |
| `docs/DECISIONS/2026-04-21-router-coherence-v2.md` |
| const disableTypeCheckedConfig = { | ||
| ...tseslint.configs.disableTypeChecked, | ||
| files: ["eslint.config.ts"], | ||
| }; |
tseslint.configs.strictTypeChecked / stylisticTypeChecked are treated as arrays earlier (spread into an array then .map(...)). If tseslint.configs.disableTypeChecked is also a config array (as it is in some typescript-eslint versions), spreading it into an object will produce numeric keys and not a valid flat config entry. To make this robust, handle both shapes explicitly (array vs object) and ensure the final value passed to defineConfig(...) is a proper config object (or list of objects) with files: ["eslint.config.ts"].
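One way to handle both shapes is to normalise to an array before attaching `files`. The sketch below deliberately avoids importing typescript-eslint so it runs standalone: `FlatConfig` is a simplified stand-in for the real flat-config type, and `scopeToFiles` is a hypothetical helper name, not part of any library.

```typescript
// Simplified stand-in for an ESLint flat config entry (assumption: the real
// type comes from ESLint / typescript-eslint and has more fields).
interface FlatConfig {
  name?: string;
  files?: string[];
  rules?: Record<string, unknown>;
}

// Hypothetical helper: accepts either a single config object or an array of
// them, and returns an array in which every entry is scoped to `files`.
function scopeToFiles(config: FlatConfig | FlatConfig[], files: string[]): FlatConfig[] {
  const entries: FlatConfig[] = Array.isArray(config) ? config : [config];
  // Spreading each entry (never the array itself) avoids numeric keys.
  return entries.map((entry) => ({ ...entry, files }));
}

// Object-shaped input yields one entry; array-shaped input keeps its length.
const fromObject = scopeToFiles({ rules: { "some-rule": "off" } }, ["eslint.config.ts"]);
const fromArray = scopeToFiles([{ name: "a" }, { name: "b" }], ["eslint.config.ts"]);
console.log(fromObject.length, fromArray.length);
```

The result would then be spread into the list passed to `defineConfig(...)`, so the call site does not need to know which shape the installed typescript-eslint version exports.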
| // Excludes — must cover every heavy directory in the Zeta tree or | ||
| // eslint's glob walk becomes minutes per invocation. Doubled patterns | ||
| // (root-level + `**/...`) catch nested occurrences (e.g. a transient | ||
| // `tools/foo/node_modules` created by a script). Inherited shape from | ||
| // SQLSharp `tools/automation/format/repo-file-globs.ts` | ||
| // (`defaultRepoPathIgnorePatterns`) with Zeta-specific additions for | ||
| // the Lake build store (`tools/lean4/.lake`, ~7 GB) and solver JAR | ||
| // stores (`tools/alloy`, `tools/tla`). | ||
| const ignorePatterns = [ | ||
| "node_modules/**", | ||
| "**/node_modules/**", |
The comment says ignores use doubled patterns (root-level + **/...) to catch nested occurrences, but references/upstreams/** only appears in root form later in the list (no **/references/upstreams/**). If any nested references/upstreams directories can exist (mirroring the node_modules rationale), add the doubled form to keep eslint’s glob walk bounded as intended.
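Generating the doubled form from a single list of directories would keep the two variants from drifting apart. A minimal sketch, assuming the patterns are plain directory globs; `doubledIgnores` is a hypothetical helper, not something in the PR.

```typescript
// Hypothetical helper: expands each root-relative directory into the doubled
// form the config comment describes: "dir/**" plus "**/dir/**".
function doubledIgnores(dirs: readonly string[]): string[] {
  const patterns: string[] = [];
  for (const dir of dirs) {
    patterns.push(`${dir}/**`, `**/${dir}/**`);
  }
  return patterns;
}

const ignoreSketch = doubledIgnores(["node_modules", "references/upstreams"]);
console.log(ignoreSketch);
// [ "node_modules/**", "**/node_modules/**",
//   "references/upstreams/**", "**/references/upstreams/**" ]
```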
Documents the empirical observation from PR #32: attempts to assign GitHub Copilot coding agent (`@copilot`) to issue #32 silently no-op'd (assignees stayed []), and the suggested_actors?capabilities=can_be_assigned_to_issue endpoint returned 404 on this repo. Honest interpretation: coding-agent is a plan-gated Copilot feature and this personal repo does not currently expose it via the API. This is fact-tested-unavailable, not aspirational — the inventory slot now records observed evidence rather than leaving a silent TBD. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mechanical whitespace fixes only (MD032 blanks-around-lists, MD022 blanks-around-headings, MD007 ul-indent, MD049 emphasis-style, MD001 heading-increment, MD029 ol-prefix, MD009 no-trailing-spaces). No content changes. Unblocks PR #32 'lint (markdownlint)' check. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|
||
| ## Part C — Uncovered modules, sorted by blast radius | ||
|
|
||
| ### Band 1 — MUST BACKFILL (8 modules / 1,629 lines) |
Band 1’s header claims “8 modules / 1,629 lines” but the table lists 9 modules (includes BloomFilter.fs) and the line total would no longer match. Either (a) keep Band 1 at 8 modules by moving BloomFilter out (and keep the 1,629 lines figure), or (b) update the Band 1 counts/LOC everywhere they’re referenced (including TL;DR) to reflect BloomFilter being included.
| | Module | Lines | Disaster-recovery role | | ||
| |---|---|---| | ||
| | `ZSet.fs` | 563 | The canonical data structure; nothing else compiles without it. | | ||
| | `BloomFilter.fs` | 533 | Shipped with Adopt tech-radar status Round 40; now publicly committed, needs a spec to match. | | ||
| | `Circuit.fs` | 285 | `Op` base type + clock + tick + delay semantics. | | ||
| | `DiskSpine.fs` | 259 | Pluggable backing store; abstracts in-memory vs disk-swap. | | ||
| | `Spine.fs` | 132 | LSM trace over Z-set batches; O(log n) amortised insert. | | ||
| | `NestedCircuit.fs` | 125 | Nested sub-circuit with inner clock; recursive query / transitive-closure substrate. | | ||
| | `BalancedSpine.fs` | 123 | MaxSAT-inspired merge scheduler for bounded per-insert latency. | | ||
| | `SpineAsync.fs` | 85 | Async-merging LSM spine; defers merge to background worker. | | ||
| | `SpineSelector.fs` | 57 | Auto-select spine mode based on workload size. | |
Band 1’s header claims “8 modules / 1,629 lines” but the table lists 9 modules (includes BloomFilter.fs) and the line total would no longer match. Either (a) keep Band 1 at 8 modules by moving BloomFilter out (and keep the 1,629 lines figure), or (b) update the Band 1 counts/LOC everywhere they’re referenced (including TL;DR) to reflect BloomFilter being included.
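The mismatch can be checked directly from the table. In the sketch below the array literal simply restates the nine rows quoted above, and `totalLines` is a hypothetical helper: all nine rows sum to 2,162 lines, while dropping `BloomFilter.fs` gives 8 modules and the header's 1,629.

```typescript
// Rows restated from the Band 1 table: [module, line count].
const band1Rows: ReadonlyArray<readonly [string, number]> = [
  ["ZSet.fs", 563],
  ["BloomFilter.fs", 533],
  ["Circuit.fs", 285],
  ["DiskSpine.fs", 259],
  ["Spine.fs", 132],
  ["NestedCircuit.fs", 125],
  ["BalancedSpine.fs", 123],
  ["SpineAsync.fs", 85],
  ["SpineSelector.fs", 57],
];

// Hypothetical helper: sums the line-count column.
function totalLines(rows: ReadonlyArray<readonly [string, number]>): number {
  return rows.reduce((sum, [, lines]) => sum + lines, 0);
}

const withoutBloom = band1Rows.filter(([name]) => name !== "BloomFilter.fs");
console.log(band1Rows.length, totalLines(band1Rows)); // 9 2162
console.log(withoutBloom.length, totalLines(withoutBloom)); // 8 1629
```

So the "8 modules / 1,629 lines" header matches the table only if `BloomFilter.fs` is excluded; with it included, the figures would be 9 modules / 2,162 lines.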
| | Value | Meaning | Example memory | | ||
| |---|---|---| | ||
| | `factory` | Applies to any project using this factory kit (universal). | `feedback_honor_those_that_came_before.md` | | ||
| | `project: <name>` | Applies only to the named project (currently only `zeta`). | `project_zeta_as_retractable_contract_ledger.md` | |
The proposed enum value `project: <name>` will be awkward/ambiguous in YAML if written unquoted as `scope: project: zeta` (the `:` + space sequence can be parsed as mapping syntax rather than a scalar). If you keep this value shape, the doc should explicitly require quoting (e.g., `scope: "project: zeta"`) and show an example. Alternative: switch to a YAML-structural representation (e.g., `scope: project` + `project: zeta`) to keep it reliably machine-parseable without quoting rules.
| | `project: <name>` | Applies only to the named project (currently only `zeta`). | `project_zeta_as_retractable_contract_ledger.md` | | |
| | `"project: <name>"` | Applies only to the named project (currently only `zeta`). In YAML frontmatter this value **must be quoted**, for example `scope: "project: zeta"`. | `project_zeta_as_retractable_contract_ledger.md` | |
| ### Placement in frontmatter | ||
|
|
||
| Immediately after `type`, before `originSessionId`: | ||
|
|
The proposed enum value `project: <name>` will be awkward/ambiguous in YAML if written unquoted as `scope: project: zeta` (the `:` + space sequence can be parsed as mapping syntax rather than a scalar). If you keep this value shape, the doc should explicitly require quoting (e.g., `scope: "project: zeta"`) and show an example. Alternative: switch to a YAML-structural representation (e.g., `scope: project` + `project: zeta`) to keep it reliably machine-parseable without quoting rules.
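If the quoted-scalar shape is kept, a consumer still has to split the string after the YAML parser hands it over. A minimal sketch of that step, with assumptions flagged: `Scope` and `parseScope` are hypothetical names, only the two enum values quoted in the hunk above are handled, and the character set allowed in project names is a guess rather than anything the schema doc specifies.

```typescript
// Hypothetical structured form of the `scope:` frontmatter value.
type Scope = { kind: "factory" } | { kind: "project"; name: string };

// Parses the already-decoded scalar (i.e. the string a YAML parser returns
// for `scope: "project: zeta"`). Returns null for anything outside the enum.
function parseScope(raw: string): Scope | null {
  const value = raw.trim();
  if (value === "factory") return { kind: "factory" };
  // Assumption: project names are lowercase alphanumerics plus hyphens.
  const match = /^project:\s+([a-z0-9][a-z0-9-]*)$/.exec(value);
  const name = match?.[1];
  return name !== undefined ? { kind: "project", name } : null;
}

console.log(parseScope("factory"));
console.log(parseScope("project: zeta"));
console.log(parseScope("project:"));
```

The structural alternative (`scope: project` plus a separate `project: zeta` key) would make this parsing step unnecessary, which is the trade-off the comment points at.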
| - [docs/WONT-DO.md](../docs/WONT-DO.md) — declined work, do not re-propose | ||
| - [CLAUDE.md](../CLAUDE.md) — dual-audience ground rules (read the contributor-relevant parts) | ||
| - [docs/VISION.md](../docs/VISION.md) — project north star | ||
| - [docs/HARNESS-SURFACES.md](../docs/HARNESS-SURFACES.md) — multi-harness living inventory; Copilot is a priority-1 immediate-queue stub |
This bullet reintroduces ambiguity after the earlier “three distinct Copilot products” split: “Copilot is a priority-1 … stub” doesn’t specify which Copilot surface (PR reviewer vs VS Code harness vs @copilot agent). Recommend tightening the wording to name the specific surface (e.g., “Copilot in VS Code (harness) is … stub”) so the instruction file stays internally consistent.
| - [docs/HARNESS-SURFACES.md](../docs/HARNESS-SURFACES.md) — multi-harness living inventory; Copilot is a priority-1 immediate-queue stub | |
| - [docs/HARNESS-SURFACES.md](../docs/HARNESS-SURFACES.md) — multi-harness living inventory; Copilot in VS Code (harness) is a priority-1 immediate-queue stub |
MD032 (blanks-around-lists) + MD007 (ul-indent). Adds blank lines before the ordered/unordered lists and removes the one-space leading indent on the `- OS:` style bullets so they sit at column 0. Content unchanged. Unblocks PR #32 — its `lint (markdownlint)` check fails on this file because CI's `pull_request` event lints the merge ref (head ∪ base). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Answers Aaron 2026-04-20 delete-all-code-recovery question: 4 capabilities / 783 lines of spec.md vs 66 top-level F# modules / 10,839 lines under src/Core/ (~6% coverage today).

docs/research/openspec-coverage-audit-2026-04-21.md
- Inventory of 66 modules with line counts + capability mapping for the 4 existing capabilities
- Uncovered modules sorted by delete-recovery blast radius: Band 1 MUST BACKFILL (8 modules / 1,629 lines: ZSet, Circuit, NestedCircuit, Spine family, BloomFilter as Adopt-row compatibility-coupling exception), Band 2 HIGH (12 / 2,008), Band 3 MEDIUM (45 / 6,585), Band 4 deliberately uncovered (AssemblyInfo only)
- First 6-round cadence: operator-algebra extension (41), lsm-spine-family (42), circuit-recursion (43), sketches-probabilistic (44), content-integrity (45), crdt-family (46)
- Success signal = Viktor spec-zealot adversarial audit: "could I rebuild this module from this spec alone?"

docs/DECISIONS/2026-04-21-openspec-backfill-program.md
- Adopts one-capability-per-round baseline with paper-grade half-credit rule (no more than 1 paper-grade round per 3)
- Band 1 priority until complete; Adopt-row escalation for BloomFilter (TECH-RADAR Adopt without spec contract is a backwards-compatibility hazard)
- Round-close ledger gains an `OpenSpec cadence` line
- Alternatives considered: big-bang backfill (rejected: ontology-landing cadence + reviewer bandwidth), per-module capabilities (rejected: loses cross-module invariants), organic prioritisation (rejected: 40 rounds of drift evidence)

docs/BACKLOG.md
- Collapses the 29-line P0 scope into a 15-line pointer at the inventory + ADR now that parts (a)-(e) of the program setup have landed. Remaining work = per-round capability backfill per ADR schedule.

Build: dotnet build -c Release clean; BP-10 ASCII-clean on all 3 modified files; markdownlint-cli2 clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First ship under the OpenSpec backfill program adopted 2026-04-21. Extends openspec/specs/operator-algebra/spec.md (184 -> 324 lines) with five new requirements covering structural and lifecycle gaps that the existing mathematical-law coverage left implicit:

1. Operator lifecycle: construction / step / after-step / reset phases with side-effect-freedom on construction and epoch-replay semantics on reset
2. Strict operators break feedback cycles: formalises that z^-1-on-feedback is a scheduling prerequisite and that cycle-without-strict is a construction error, not a silent heuristic
3. Clock scopes and tick monotonicity: nested-scope-to-fixpoint rule + sibling-scope independence
4. Incremental-wrapper preserves the chain rule: Incrementalize(Q) observably equivalent to D . Q . I, with linear/bilinear substitution permitted as an optimisation
5. Representation invariants of the reference Z-set: O(n+m) group ops + zero-alloc iteration as the reference contract; hash-table recoveries permitted at documented perf trade-off

Disaster-recovery effect: a contributor with only this spec (plus the durability-modes + retraction-safe-recursion specs) can now rebuild Circuit.fs Op base + Incremental.fs wrapper + ZSet.fs representation invariants from the spec text alone.

Owner: Architect (Kenji). Adversarial audit by Viktor (spec-zealot) is the ADR-declared ship-gate and will run post-land.

Build: not rebuilt (no F# source changed); markdownlint clean; BP-10 ASCII clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor's adversarial audit of the Round 41 cadence ship (commit e51ec1b) surfaced four P0 findings against the disaster-recovery bar. This commit closes all four:

- **P0-1 (namespace drift).** `profiles/fsharp.md` asserted `Dbsp.Core` throughout, but `src/Core/**` uses `Zeta.Core`. A spec-only recovery would have shipped the wrong namespace to every downstream consumer. Replaced via one `replace_all` Edit.
- **P0-2 (phantom Reset method).** The lifecycle requirement claimed a `reset` phase that does not exist on `Op`. Replaced the "reset replays the epoch" scenario with a determinism-under-structural-equivalence property: two freshly-constructed circuits of the same topology, stepped with the same input sequence, MUST produce identical outputs at every tick. Reconstruction is the supported route to a replayed epoch.
- **P0-3 (after-step scope).** The lifecycle requirement said after-step runs "after every operator in the scope has completed its step." `Circuit.fs:205-208` iterates the `strictN` array only, so after-step is selective to strict operators. Fixed wording and added an "after-step is selective to strict operators" scenario that pins the invariant.
- **P0-4 (lifecycle phase undercount).** The requirement named four phases (construction / step / after-step / reset) but the code has five (construction / step / after-step / clock-start / clock-end). Restructured to three per-tick phases plus two scope-boundary phases, and extended the "clock scopes and tick monotonicity" requirement with the scope-boundary lifecycle contract (clock-start before tick 0 of a scope, clock-end after fixpoint or iteration cap).

Build green (0 warnings / 0 errors). BP-10 lint clean. The capability now reflects the code's observable shape rather than an idealised cleaner cousin; a delete-recovery from this spec produces Zeta.Core with strict-operator after-step selectivity and nested-scope clock-boundary phases.
Viktor's 10 P1 findings (including async lifecycle, memory-ordering fence, register-lock semantics, IncrementalDistinct surface, ZSet sort invariant, Checked arithmetic, bilinear-size overflow, convergence-vs-cap) are deferred to Round 42, filed as a BACKLOG sweep in follow-up work. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Companion to 92d7db2 (closing Viktor's four P0 findings). The ten P1-tier surface gaps Viktor identified do not block the disaster-recovery bar at capability-close but leave the operator-algebra spec incomplete relative to what a delete-recovery produces. Filed as a dedicated P0 sub-item so they travel with the OpenSpec backfill program rather than getting lost: async lifecycle, memory-ordering fence, register-lock semantics, IncrementalDistinct surface, ZSet sort invariant, Checked arithmetic, bilinear-size overflow, convergence-vs-cap, Op.Fixedpoint predicate, DelayOp reconstruction-first-tick.

Also annotated the parent OpenSpec coverage entry with Round 41 sweep status (e51ec1b + 92d7db2, P0s closed, P1s deferred) so the backlog accurately reflects where the program stands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…dence ship

Four-arc entry at the top of the file per newest-first policy:
- Arc 1 (d435126): OpenSpec coverage audit + backfill-program ADR. Measured 6% coverage; declared one-capability-per-round baseline with paper-grade half-credit and Adopt-row priority escalation; banded 66 F# modules by delete-recovery blast radius.
- Arc 2 (e51ec1b): operator-algebra extension as Round-41 cadence ship. Five new requirements covering lifecycle, strict-operator scheduling, clock scopes, Incrementalize wrapper, ZSet representation invariants.
- Arc 3 (92d7db2): Viktor P0 close. Four drift-from-code defects fixed: namespace (Dbsp.Core → Zeta.Core), phantom Reset, after-step scope (strict-only), lifecycle phase undercount (3 per-tick + 2 scope-boundary).
- Arc 4 (56f34b5): Viktor P1s filed as Round-42 absorb under the parent backfill P0, creating mechanical coupling between each capability ship and the following round's P1 sweep.

Round-41 observations for Round 42 + prospective BP-WINDOW ledger table rendering the four commits against the consent / retractability / no-permanent-harm axes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Aaron 2026-04-19 asked for memory/role/persona/ so roles become first-class in the directory structure. Surface is wider than it first looks: 114 files / ~260 hand-written references to memory/persona/ paths (plus ~440 auto-regenerated references in tools/alignment/out/ that refresh on next citations.sh run). A bad role axis is hard to reverse; this design doc proposes the axis and holds execution for Aaron's sign-off rather than just-doing-it under Auto Mode.

Design plan lands at: docs/research/memory-role-restructure-plan-2026-04-21.md

Contents:
- 13-directory role axis (architect, security, verification, review, experience, api, performance, devops, algebra, skill-ops, maintainer, homage, alignment)
- persona-to-role crosswalk for every current directory
- 5-phase execution plan (pre-flight greps → git mv → sed passes → 5-check verification → pointer-source updates)
- special-case handling for aaron (human maintainer), rodney (homage-named AI persona on the reducer skill), sova (emerging alignment-observability role)
- rollback plan (one atomic commit, git revert)
- four open questions for Aaron on axis judgement-calls

BACKLOG entry updated to reflect design-landed state with execution-slot recommendation for Round 42 opener after the Round 41 PR merges (keeps wide-surface reviews from overlapping).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rounds 37-40 shipped via PR #30 (merge commit 1e30f8c, 2026-04-20). Ledger headers updated from "(prospective)" to "(merged via PR #30, 1e30f8c)" — the BP-WINDOW scores are now settled, not forecasts. Round 41 ledger remains "(prospective)" — round-41 branch has not merged to main yet. Prose uses of "prospective" on lines 437, 447, 553, etc. are historical-narrative commentary on authoring-time methodology and stay as-is.
Round 39 observation flagged src/Core/RecursiveSigned.fs + tools/tla/specs/RecursiveSignedSemiNaive.tla as held pending formal-verification-expert tool-coverage review. Round 41 closes that gate.

Soraya's notebook entry lands:
- Per-property tool table S1-S4 + refinement cross-check. TLC primary for S1/S2/S3/S3'/SupportMonotone; FsCheck for S4.
- S2 flagged as the one P0 on the spec (silent fixpoint drift unrecoverable); BP-16 requires Z3 QF_LIA cross-check.
- Refinement mapping: FsCheck cross-trace (signed vs counting at SeedWeight=1) wins over TLA+ refinement proof or Lean lemma (anti-TLA+-hammer, implementation-level where the bug bites).
- Readiness gate: TLA+ spec is ready to model-check; no pre-TLC pass needed. Optional round-42 follow-up: add PROPERTY EventuallyDone to .cfg for liveness.
- Graduation verdict: CONDITIONAL PASS. Four tool-coverage prereqs named in priority order; F# landing gated on them.

Files read (no edits): RecursiveSigned.fs, RecursiveSignedSemiNaive.tla /cfg, RecursiveCountingLFP.tla, retraction-safe-semi-naive.md.
Soraya's round-41 audit of src/Core/RecursiveSigned.fs + tools/tla/specs/RecursiveSignedSemiNaive.tla landed as a CONDITIONAL PASS for Round-42 graduation. This commit lifts the four named prereqs out of her notebook into BACKLOG sub-items under the parent "Retraction-safe semi-naive LFP" entry, so the round-42 opener picks them up as checkbox work rather than having to re-read the notebook.

Prereqs in priority order:
- Prereq 1: TLC CI wire-up (RecursiveSignedSemiNaive.cfg)
- Prereq 2: Z3 QF_LIA lemma for S2 FixpointAtTerm (BP-16 cross-check on the one P0; TLC alone insufficient for silent-fixpoint-drift risk)
- Prereq 3: FsCheck property for S4 sign-distribution (anti-TLA+-hammer; two-trace quantification is NOT a TLA+ property)
- Prereq 4: FsCheck cross-trace refinement (signed vs counting at SeedWeight = 1); cites BP-16

Round-42 graduation gate also captured: prereqs 1-4 CI-green + F# implementation with P1/P2/P3 enforced at caller.
The initial Round 41 ROUND-HISTORY entry (6e6e211) covered arcs 1-4 (coverage audit, operator-algebra cadence ship, Viktor P0 close, Viktor P1 file). Three more commits landed after:

- Arc 5: ROUND-HISTORY narrative + memory-restructure design (6e6e211, 36797ba). The memory-folder rename was downgraded to "design plan + sign-off first" under Auto Mode's do-not-take-overly-destructive-actions clause (700-occurrence cross-reference surface).
- Arc 6: BP-WINDOW ledger actualisation for Rounds 37-40 (85fb352). Provenance (PR #30 / 1e30f8c) attached to each "(prospective)" header.
- Arc 7: Round-35 holdover close (e461d9c, 15e9654). Soraya tool-coverage audit landed CONDITIONAL PASS for Round-42 graduation; four prereqs captured as BACKLOG sub-items with BP-16 citation on the S2 Z3 cross-check.

Also: one new observation line in the Round-42 handoff section noting the holdover-closed-same-round-as-cadence-item pattern. BP-WINDOW ledger gains three rows.
CLAUDE.md 5-10 round cadence rule was 23 rounds overdue. Round 41 is the catch-up slot. Live-search + full ranking + prune pass all landed in a single invocation.

Live-search (4 queries, 2026-Q1/Q2 best-practices targets):
- 6 findings logged to best-practices-scratch.md: Gotchas-section rise, pushy-descriptions pattern, Claude-A-authors / Claude-B-tests, router-layer command-integrity injection class, Agent Stability Index 12-dim drift metric, OWASP Intent Capsule pattern.
- Zero contradictions with stable BP-NN rules.
- Zero promotions flagged to Architect this round; all six are "watch" or route-elsewhere.

Top-5 skills flagged for tune-up:
1. performance-analysis-expert (642 lines, 2.1x BP-03 cap) - SPLIT - M
2. reducer (570 lines) - SPLIT or TUNE (prune) - M
3. consent-primitives-expert (507 lines) - SPLIT honouring BP-23 theory/applied axis - M
4. claims-tester / complexity-reviewer router-coherence drift - HAND-OFF-CONTRACT - S (round-18 carry-over)
5. skill-tune-up (self) - 303 lines, 3 over BP-03 - TUNE (prune authoritative-sources duplicated with AGENT-BEST-PRACTICES.md) - S. Self-flagged first per BP-06.

Notebook state:
- Stale round-18 top-5 archived in Pruning log (first catch-up prune).
- 912 words, well under 3000-word BP-07 cap.
- ASCII-only, BP-10 clean.

Nine more bloat-row skills named as notable mentions queue behind the top-3 bloat cases.
Close Aarav's round-18 HAND-OFF-CONTRACT finding (carried 23 rounds after ranker went offline by cadence). Two-stage pipeline: analytic bound first (complexity-reviewer), empirical measurement second (claims-tester). Names the reverse trigger (benchmark surprise flows the other direction) and the decision table for who fires when. Follow-up SKILL.md edits route via skill-creator per GOVERNANCE §4.
Arc 8 covers the claims-tester/complexity-reviewer hand-off ADR (47d92d8) closing Aarav's 23-round-stale round-18 HAND-OFF-CONTRACT finding. New observation on cadence-outage-recovery as a design axis: sweep infrastructure is subject to the same bitrot it detects on other surfaces. BP-WINDOW ledger gains two rows (085c0e3 Aarav catch-up, 47d92d8 router-coherence ADR).
Close-out audit surfaced that .github/workflows/gate.yml only CACHES the tla2tools.jar artefact; nothing runs it. RecursiveCountingLFP.tla has shipped since round 19 compile-checkable-only — 22 rounds with no run-gate against its invariants. Soraya's Prereq 1 re-sized S→M with expanded scope covering both specs. Finding recorded as new round-41 observation: verifier-present does not imply verifier-actually-runs.
…s lens Codifies the round-41 Prereq-1 audit finding as a tracked research entry, distinct from its ROUND-HISTORY narrative presence. The finding — a verifier's installation artefacts do not imply the verifier is exercised by any CI job — is exactly the class formal-analysis-gap-finder exists to surface. Concrete motivating case: RecursiveCountingLFP.tla compile-checkable-only for 22 rounds. Round-42 scope covers the bidirectional audit (specs without gates + gates without specs). Handoff to Soraya per the skill's standing contract; does not write the spec or CI job (DevOps + Soraya work). Schedules after Prereq 1 lands so the audit sees corrected state.
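The "verifier-present does not imply verifier-actually-runs" finding lends itself to a mechanical check. A rough sketch follows, assuming a plain-text heuristic over the workflow file: the function name, the status labels, and the sample workflow fragments (including the `tools/` path) are hypothetical and are not the contents of `gate.yml`.

```python
import re


def verifier_status(workflow_text: str, artefact: str = "tla2tools.jar") -> str:
    """Classify a workflow as 'absent', 'cached-only', or 'exercised' for one verifier artefact."""
    if artefact not in workflow_text:
        return "absent"
    # Heuristic assumption: the verifier "runs" only if a single line invokes
    # java together with the artefact; a cache step alone does not count.
    invoked = re.search(r"java\b.*" + re.escape(artefact), workflow_text)
    return "exercised" if invoked else "cached-only"


# Hypothetical workflow fragment that only caches the jar.
cached_only = """
steps:
  - uses: actions/cache@v4
    with:
      path: tools/tla2tools.jar
      key: tla2tools
"""

# The same fragment plus a step that actually runs TLC against a spec.
exercised = cached_only + "  - run: java -cp tools/tla2tools.jar tlc2.TLC RecursiveCountingLFP.tla\n"
```

A real audit would also need the reverse direction named in the entry (gates without specs); this sketch covers only specs whose verifier is installed but never invoked.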
Per the established stopping rule (meta-ledger commits do not get self-referential rows; their round-close coverage is the PR merge), this commit adds only the 2042a85 row and does not add a row for itself.
Closes ADR 47d92d8's third follow-up action item. Single-row addition to Active tensions citing the router-coherence ADR as the standing resolution. Doc-only edit (not a SKILL.md touch, so GOVERNANCE §4 does not gate this). The other two ADR follow-ups (claims-tester + complexity-reviewer SKILL.md updates) remain deferred to round 42 via skill-creator workflow.
Per-commit ledger discipline for the CONFLICT-RESOLUTION Hiroshi ↔ Daisy row. Meta-ledger-only commit so no self-referential row for this commit itself (established stopping rule).
…sedure backlog

Router-coherence ADR 47d92d8 (Hiroshi analytic ↔ Daisy empirical two-stage pipeline) landed without the adversarial-review gate. Post-landing harsh-critic (Kira) pass surfaced 3 P0 + 5 P1 + 2 P2 substantive findings, including:

- (P0-1) unscoped grandfather clause,
- (P0-2) table-vs-prose contradiction on reverse trigger,
- (P0-3) Stage-1 "analytically wrong" clause blocking the evidence loop for escalation,
- (P1-7) no escalation timebox reproducing the 23-round-stale failure mode the ADR diagnosed,
- (P1-8) two advisory skills not composing to a mandatory pipeline without a binding dispatcher,
- (P2-9) example-bug on BCL Dictionary.Remove amortised complexity,
- and more.

File as round-42 supersedure rather than inline-edit because docs/CONFLICT-RESOLUTION.md already cites 47d92d8 as Standing Resolution; supersedure preserves the citation chain via GOVERNANCE §2 edit-in-place with a "Superseded by …" header on v1. New ADR target: docs/DECISIONS/2026-04-??-router-coherence-v2.md.

Supersedure work blocks the claims-tester + complexity-reviewer SKILL.md updates that ADR 47d92d8's follow-up work depends on; those edits should target v2, not v1.

Owner: Architect drafts; Kira audits closure; Aarav confirms router-coherence drift stays closed. Effort: M. Schedule: Round 42 slot after Soraya Prereq 1 (TLC wire-up) lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Aaron 2026-04-20 pm: "research co author[s] like me who have never submitted a peer reviewed paper, I want to help but I'm going to need a teaching track on how to even enter that space." Six-module skeleton: (1) peer-review lifecycle + venue typology, (2) etiquette (authorship, COI, double-blind, rebuttal, acknowledgements), (3) submission requirements (structure, contribution claim, artefact track, formatting), (4) skills Aaron needs (LaTeX, related-work surveying, figure craft, rebuttal-writing, camera-ready revision), (5) knowledge-gap fillers (stats for evaluation, formal-proof reading literacy, theorem-statement craft, benchmarking ethics), (6) coauthor workflow (draft passes through factory reviewers). Sits inside the research-readers audience (docs/README.md section 7) as the inside path from reader to author. Parallel to vibe-coder teaching track (memory/project_teaching_track_for_vibe_coder_contributors.md) — both feed the symbiosis-not-replacement stance. Aaron is first user; skeleton stays generic so other first-paper coauthors can use it. Content lands JIT when a first submission approaches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-wins

Captures the round's absorbed directives and landed commits into the durable round artefacts:

- ROUND-HISTORY.md: Round 43 full narrative (9 arcs: skill-tune-up BP-03 self-breach close, GOVERNANCE §11 debt-intentionality, three harness dry-runs, invariant-substrates founding, bun+TS pivot, Viktor P1 forward-filing, agent-cadence telemetry, Aarav ranking, no-op close) + Round 44 in-flight placeholder listing the round's commits.
- BACKLOG.md: absorbs Aaron 2026-04-20 pm's six-part directive (ace red-team discipline, Zeta-storage dogfood, research-coauthor track, game/chaos theory skill families, PoUW-CC, DAG-with-forks) as P2/P3 rows; absorbs the agent-QOL P1 follow-through section (Samir/Yara notebooks, Ilyana/Tariq prune, Kenji refresh, seed-only dispatch-or-retire).
- HUMAN-BACKLOG.md: adds user-ask conflicts artefact per the persona-term disambiguation + multi-user UX asks.
- meta-wins-log.md: one-line entry for this round's agent-QOL hygiene-class promotion (factory-hygiene row-group 30-34 via Daya's audit).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ndoff fix
First audit under the newly-promoted CLAUDE.md rule caught an
own-goal in the same round's commits: FACTORY-RESUME.md
footer said the audit cadence "will land as a dedicated row
in docs/FACTORY-HYGIENE.md once the Architect promotes" —
but FACTORY-HYGIENE.md row 24 ("Shipped-capabilities resume
audit") already exists.
Rewrote to cite the existing row directly. The rule is doing
its job: a phantom handoff to work-already-done is the
cleanest case to catch and the most embarrassing to leave in.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…eta-win row

Aaron 2026-04-20 evening: "do we want to make a people optimizer? ... they will be two distinct teams (new concept) ... this is really a DAO optimizer ... spike/research it out ... backlog it and LFG."

Lands three artefacts for the directive:

- docs/BACKLOG.md: P2 research-grade row capturing the five sub-concepts (two-team personnel separation, role-switching freedom, meta-team organizer, role optimizer, disambiguity detector), north-star invariants (no managers, no friction, incentive-based, distributed fair governance), research starting points (Conway's Law, Reverse Conway, Team Topologies, Web3 DAO primitives), and three-phase plan (research spike -> two-team scaffolding -> incentive layer gated on Aurora).
- docs/research/dao-factory-org-design-spike.md: Phase-1 skeleton with seven open questions the spike must answer, literature reading queue (Conway, Skelton+Pais, Spotify, Holacracy, Sociocracy, Rendanheyi, Morning Star, Valve; MakerDAO, Gitcoin, Optimism retroPGF, Arbitrum, ENS, Nouns; Snapshot, Tally, Aragon, Governor Bravo, Moloch), and decisions-already-committed constraints from related memories.
- docs/research/meta-wins-log.md: depth-1 meta-win row for the verify-before-deferring rule catching its own FACTORY-RESUME.md phantom handoff in the same round it landed (commit ac0eb1f). Aaron's verbatim "metametameta" celebration captured. Candidate hygiene pattern for promotion if "new rule catches itself in debut round" fires a second time.

Full concept in memory/project_people_optimizer_dao_factory_restructuring.md (auto-memory). Personnel cleave mirrors the existing factory/SUT scope cleave in GLOSSARY / TECH-DEBT / FACTORY-HYGIENE Scope column: the Reverse Conway Maneuver applied to personnel.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Promotes two durable policies to CLAUDE.md-level (100% loaded at every wake), alongside verify-before-deferring and future-self-not-bound. Fixes the root cause of this round's 25-min idle-tick mistake: tool-default guidance was overriding factory-specific policy because the policy sat too deep in MEMORY.md to fire at wake.

- Never-idle bullet: factory memories beat tool docs; meta-check first; priority order for speculative work.
- Honor-those-that-came-before bullet: retired personas keep their memory folders and notebook history; prefer unretiring an existing agent over minting a new name for overlapping scope. Aaron ties this to how he honors his sister Elisabeth's memory; the named agent gets the same protection register.

Meta-wins row (depth-3): structural fix + generalised tool-vs-policy collision rule + unretire-before-recreate operational policy in one tick.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Paragraph-level update only; arc-by-arc narrative still lands at round-close per file convention. Adds the four CLAUDE.md-load rule promotions (verify-before-deferring, future-self-not-bound, never-idle, honor-those-that-came-before), the factory-resume triptych, tech-debt primer + factory/SUT cleave, docs audience-first landing, research-coauthor teaching-track, people/team-optimizer DAO spike, and the first verify-before-deferring dogfood catch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-came-before The unretire-before-recreate rule landed in commit feae7e3 references `.claude/skills/_retired/` as the first place to check before minting a new skill. The directory didn't exist on this branch: round 29's CI-pipeline megacommit (c39c03c, 2026-04-18) inadvertently dropped the two retired skill archives (`2026-04-18-architect/`, `2026-04-18-harsh-critic/`) that the initial commit had preserved. The preservation rule was codified on 2026-04-20, so the original deletion was not a retroactive violation — but the current state leaves the rule pointing at an empty directory, which means any agent following the unretire-first policy would silent-check their way into minting a new name. Restored both archives from the initial commit (4765118) — content unchanged to preserve the imprint of contribution per feedback_honor_those_that_came_before.md. The files retain their pre-Zeta "Dbsp.Core" naming and the old absolute-path reference in the harsh-critic file; retired archives are frozen historical records, not subject to current path-hygiene or rename sweeps. Canonical active replacements are `round-management` (for architect) and `code-review-zero-empathy` (for harsh-critic). Future new-skill creation via `skill-creator` now has a non-empty directory to check against. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Aaron clarification (2026-04-20 late, verbatim):

> "i don't think we need to apply the don't deleted memories
> of retired agents to extend to deleted skills too, we don't
> want to dirty up our code skills are code, memories are
> valuable."

The memory-vs-code distinction: memories stay in-tree (persona folders, notebooks, ADRs; valuable imprint of contribution); retired SKILL.md files are code and are preserved in git history, not in a `_retired/` tree that dirties the working copy.

This commit:

- Deletes `.claude/skills/_retired/` (the two archives restored in 01650ae). Those SKILL.md files remain recoverable from commit 4765118 via `git show 4765118:.claude/skills/_retired/...`.
- Updates CLAUDE.md unretire-before-recreate bullet: unretire path sources from git history (`git log --diff-filter=D -- .claude/skills/`) plus persona memory folders, not from `_retired/`.

Follow-up (not in this commit): `.claude/skills/skill-tune-up/SKILL.md` §recommended-action-set still describes the RETIRE action as "move to _retired/YYYY-MM-DD-<name>/". That skill needs a tune-up via skill-creator to redefine RETIRE as plain deletion (per GOVERNANCE §4: skill edits go through skill-creator, not ad-hoc). Queued for Aarav.

Memory-folder companion edits (outside repo, in `~/.claude/projects/.../memory/`):

- feedback_honor_those_that_came_before.md: scope clarification section added, unretire workflow rewritten to use `git show <commit>^:<path>` instead of moving from _retired/.
- MEMORY.md: pointer line updated to reflect new scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…follow-up

Follow-up to commit bd9e09c's skills=code scope clarification. Updates current-state docs (editable directly) and queues the skill-level fixes that must go through skill-creator per GOVERNANCE §4.

GLOSSARY.md:

- "Retire (a skill or persona)": redefines as `git rm` the SKILL.md; persona memory folder stays in place. Cites the code-vs-memory scope rule Aaron stated 2026-04-20.
- Adds new entry "Unretire (a skill or persona)": documents the git-history-sourced workflow (`git log --diff-filter=D`, `git show <commit>^:<path>`).
- "Orphan skill": retirement now says `git rm` rather than archive move.

BACKLOG.md (P1):

- New row queueing three skill files that still describe the old `_retired/` convention: skill-tune-up RETIRE action, skill-creator retirement workflow, skill-documentation-standard (5 refs). All must go through skill-creator workflow. Owner: Aarav queues; Yara executes.
- Scope note: `docs/_retired/` pattern (used by documentation-agent + GOVERNANCE §26) is a SEPARATE question about research-doc archival, not the skills scope Aaron clarified. Don't extend.

Historical narrative references (ROUND-HISTORY.md, meta-wins-log.md, Daya's notebook) correctly preserve the fact that the archive convention was used at the time. Those are not updated, per CLAUDE.md's "historical narrative edits in place" exception.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
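The git-history-sourced unretire workflow can be sketched as a small parser over `git log` output. This is an illustrative sketch only: it assumes the log was produced with `git log --diff-filter=D --name-only --format=%H -- .claude/skills/`, and the helper names, the sample hash, and the sample path are hypothetical.

```python
import re
from typing import Dict, List

_FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def deleted_paths_by_commit(log_output: str) -> Dict[str, List[str]]:
    """Group deleted file paths under the commit hash that deleted them."""
    result: Dict[str, List[str]] = {}
    current = None
    for raw in log_output.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator lines carry no information
        if _FULL_SHA.match(line):
            current = line
            result.setdefault(current, [])
        elif current is not None:
            result[current].append(line)
    return result


def restore_spec(deleting_commit: str, path: str) -> str:
    """Object spec for `git show`: the file as it stood in the parent of the deleting commit."""
    return f"{deleting_commit}^:{path}"


# Hypothetical log output; the hash and the skill path are placeholders.
fake_sha = "a" * 40
sample_log = f"{fake_sha}\n\n.claude/skills/example-skill/SKILL.md\n"
deleted = deleted_paths_by_commit(sample_log)
spec = restore_spec(fake_sha, deleted[fake_sha][0])
```

The resulting spec can be passed to `git show` to print the retired SKILL.md without re-adding an archive directory to the working copy.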
Two linked substantive landings per Aaron 2026-04-20 late:
1. **Skill-edit gating design research pass.** Aaron:
"the invariant about going through the skill creator for
edits i'm just trying to make sure we don't forget that
skill creator ships with more things and features the
plugin from anthropics. Maybe do a deisgn researach pass
on how to make sure that's true without being so
restrictuive about having to go through the skill creator
for everyedit... we need to tighted that up a bit better
my guidance there is flawed, i hope you can make it
better."
Landed `docs/research/skill-edit-gating-tiers.md` with
a four-tier envelope:
- Tier 0 (trivial — no gate; typo, rename, Unicode).
- Tier 1 (convention-update — free quick_validate.py +
Prompt-Protector auto-lint + justification log row).
- Tier 2 (content edit — manual PP review + dry-run).
- Tier 3 (substantive — full 6-step workflow + upstream
plugin's improve_description.py / run_eval.py for
description changes).
Doc inventories what upstream plugin ships (quick_validate,
improve_description, run_eval, aggregate_benchmark,
generate_report, package_skill + analyzer/comparator/grader
agents + schemas reference) vs what Zeta's bespoke workflow
adds (Prompt-Protector, portability declaration, BP-NN
citation, persona-registry cross-check, scope-audit,
justification log, retirement pattern, conflict-resolution
hand-off). Observation: every Zeta addition is pre/post
behaviour-eval, so delegating the eval inner loop to the
plugin and keeping the governance wrap is the right split.
BACKLOG P1 row filed for Yara (skill-improver) to land
the tiered envelope into skill-creator/SKILL.md itself
via a Tier-3 edit.
2. **Missing-scope gap-finder (FACTORY-HYGIENE row 35).**
Aaron: "missing scopes , we need a gap finder that will
find missing scope we didnt think about we already have
rules to make sure things are scoped correct, missing
scope will ensure we don't miss any even if it's in the
future."
Existing row 6 is ABSORB-TIME scope-audit. Row 35 is the
missing RETROSPECTIVE counterpart: sweeps already-landed
SKILL.md / memories / ADRs / BACKLOG / BP-NN for implicit
or missing scope tags. Flags ambiguous cases to
HUMAN-BACKLOG as `scope-clarification` rows.
Row 35 landed as PROPOSED (owner TBD — Daya-hat vs
new-persona question deferred to Aaron). BACKLOG P1 row
filed for the skill authoring itself (goes through
Tier-3 once the envelope lands).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
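The four-tier envelope described in item 1 can be written down as a lookup from tier to required gates. The sketch below is an assumption-laden illustration rather than the contents of `skill-edit-gating-tiers.md`: it treats each tier's gates as listed rather than cumulative, reads "PP" as Prompt-Protector, and uses hypothetical names (`Tier`, `required_gates`).

```python
from enum import IntEnum
from typing import Dict, List


class Tier(IntEnum):
    TRIVIAL = 0            # typo, rename, Unicode
    CONVENTION_UPDATE = 1
    CONTENT_EDIT = 2
    SUBSTANTIVE = 3


# Gates per tier as listed in the commit message; non-cumulative is an assumption.
GATES: Dict[Tier, List[str]] = {
    Tier.TRIVIAL: [],
    Tier.CONVENTION_UPDATE: [
        "quick_validate.py",
        "Prompt-Protector auto-lint",
        "justification log row",
    ],
    Tier.CONTENT_EDIT: ["manual Prompt-Protector review", "dry-run"],
    Tier.SUBSTANTIVE: ["full 6-step workflow"],
}


def required_gates(tier: Tier, touches_description: bool = False) -> List[str]:
    """List the gates an edit must pass; description changes at Tier 3 add the plugin evals."""
    gates = list(GATES[tier])
    if tier is Tier.SUBSTANTIVE and touches_description:
        gates += ["improve_description.py", "run_eval.py"]
    return gates


tier3_description_edit = required_gates(Tier.SUBSTANTIVE, touches_description=True)
```

Keeping the table as data rather than branching logic means a later change to the envelope is a one-line edit.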
…tion research Fixes scope mislabel in skill-edit-gating-tiers (Aaron 2026-04-20: cheap-but-zeta-specific was wrong scope, should be factory-level universal). Lands the missing-scope gap-finder pilot (87% of 105 memories lack explicit scope declarations) to dogfood row 35 before skill authoring. Adds FACTORY-HYGIENE row 36 for the incorrectly-scoped counterpart (tag-exists-but-wrong, distinct from row 35 which is tag-missing). Adds BACKLOG P1 for the hygiene-skill consolidation research (per-row vs grouped vs meta-skill-with-classes). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…KLOG Lands the consolidation spike: recommends hybrid of Pattern 2 (grouped by natural cluster) + Pattern 4 (round-close meta + standalone trigger-specific skills). Two new-skill candidates surfaced: scope-hygiene (unifies rows 6/35/36) and gap-finders (rows 22/23). Rejects Pattern 3 (monolithic meta-skill) as incompatible with Claude Code's triggering model. Adds P1 BACKLOG row for Aaron's Kanban + Six Sigma directive — the factory already does partial Kanban (BACKLOG tiers, pull cadence) and partial Six Sigma (DORA + meta-wins) ad-hoc; this research makes both methodology-explicit. No ISO theater. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…3 artifacts Research concludes both methodologies are already partially instanced ad-hoc; the gap is vocabulary + WIP labelling + DMAIC template, not new skills. Earlier BACKLOG sketch proposed kanban-flow + six-sigma-dmaic skills; both rejected as over-built under Aaron's "adopt practices, not bureaucracy" constraint. Instead: FACTORY-METHODOLOGIES.md (reference), DMAIC proposal template, and FACTORY-HYGIENE row 37 (WIP discipline). All Tier-1. Total new skills: 0. Cross-validation: Kanban's pull-vs-always-on criterion matches the hygiene-consolidation research's skill-vs-checklist decision cleanly. Two research spikes are congruent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Lands the three artifacts from the Kanban+Six Sigma research:

1. docs/FACTORY-METHODOLOGIES.md: one-page reference naming both methodologies as factory methodologies of record. Maps each Kanban practice and DMAIC phase to current factory instances; surfaces the pull-vs-always-on criterion that drives skill-vs-checklist decisions.
2. docs/templates/DMAIC-proposal-template.md: fillable template for factory-improvement ADRs (new hygiene row, new BP-NN rule, process change). The cognitive cost of filling the five sections IS the Six Sigma discipline.
3. FACTORY-HYGIENE row 37: WIP-limit discipline (Kanban's only real gap). Per-persona cap 3, cross-persona cap 7, always-on discipline per Aaron's "adopt practices, not bureaucracy" constraint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
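The row 37 WIP caps can be checked mechanically. A minimal sketch follows, assuming "cross-persona cap 7" means a limit on total in-flight items across all personas (the commit message does not spell this out); `wip_violations` and the sample board are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

PER_PERSONA_CAP = 3    # from row 37
CROSS_PERSONA_CAP = 7  # from row 37; read here as a factory-wide total (assumption)


def wip_violations(in_flight: List[Tuple[str, str]]) -> List[str]:
    """in_flight holds (persona, work_item) pairs; returns readable cap breaches."""
    problems = []
    per_persona = Counter(persona for persona, _ in in_flight)
    for persona, count in sorted(per_persona.items()):
        if count > PER_PERSONA_CAP:
            problems.append(f"{persona}: {count} items in flight (cap {PER_PERSONA_CAP})")
    if len(in_flight) > CROSS_PERSONA_CAP:
        problems.append(f"factory: {len(in_flight)} items in flight (cap {CROSS_PERSONA_CAP})")
    return problems


# Illustrative board; the work-item names are placeholders.
board = [
    ("Aarav", "item-1"),
    ("Aarav", "item-2"),
    ("Aarav", "item-3"),
    ("Aarav", "item-4"),
    ("Yara", "item-5"),
]
breaches = wip_violations(board)
```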
Designs `scope:` field as factory-overlay extension on Anthropic's AutoMemory (Q1 2026 built-in feature). Closed enumeration — factory | project: <name> | user | hybrid. Mandatory for feedback + project memory types, optional for user + reference. Frames the addition as Anthropic-compatible: standard YAML frontmatter semantics (unknown keys tolerated) mean the factory-overlay doesn't break if Anthropic ships a schema update. Anthropic-schema changes (renaming/removing required fields) remain out of scope for factory ADRs. Back-fill plan: 91 under-declared memories closed via 10-per-round cadence, measured at round-close by row 35 retrospective audit. CLAUDE.md auto-memory section edit is Tier-3 and gated on Aaron / Architect sign-off; rows 6 / 35 / 36 updates are Tier-1 once schema is approved. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
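The closed enumeration and the back-fill cadence can be illustrated together. This sketch assumes the frontmatter has already been parsed and the `scope` value arrives as a plain string such as `project: zeta`; the function names are hypothetical and the schema doc remains the authority on the real rules.

```python
import math
import re
from typing import Optional

SIMPLE_SCOPES = {"factory", "user", "hybrid"}
PROJECT_SCOPE = re.compile(r"^project: \S.*$")   # "project: <name>"
MANDATORY_TYPES = {"feedback", "project"}        # scope optional for user + reference


def scope_problem(memory_type: str, scope: Optional[str]) -> Optional[str]:
    """Return a description of the scope defect, or None if the memory is fine."""
    if scope is None:
        return "missing scope" if memory_type in MANDATORY_TYPES else None
    if scope in SIMPLE_SCOPES or PROJECT_SCOPE.match(scope):
        return None
    return f"scope {scope!r} is outside the closed enumeration"


def backfill_rounds(under_declared: int, per_round: int = 10) -> int:
    """Rounds needed to close the backlog at a fixed per-round cadence."""
    return math.ceil(under_declared / per_round)


rounds_needed = backfill_rounds(91)
```

At the stated 10-per-round cadence, the 91 under-declared memories take ten rounds to close, with the last round handling a single memory.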
Aaron 2026-04-20 verbatim: "AutoMemory is a buit in featue antropic added in Q1 for you." The memory corpus the pilot audits is Anthropic's AutoMemory base feature; factory scope-tagging work is an overlay, not a replacement. Pairs with reference memory `reference_automemory_anthropic_feature.md` (base AutoMemory vs layered AutoDream distinction) and the schema research doc's Anthropic-compatibility framing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pilot audit, schema design, and row 36 landed this round; blockers are explicit (CLAUDE.md Tier-3 sign-off + hygiene-consolidation direction). Tier-1 edit — factual status refresh, no scope change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Aaron 2026-04-20 verbatim: "part of our stay up to date on everything we should always research claude and claude code and desktop difference an changes on a cadence so we can design our factory for the latest changes and featuers."

Triggering incident: the AutoMemory miss earlier in this round where Anthropic's Q1-2026 built-in feature was being described as if factory-native until Aaron corrected the framing. Cadenced surface-audit is the Control-phase artifact that prevents recurrence.

Artifacts landed:

- `docs/CLAUDE-SURFACES.md`: living inventory across Claude model / Claude Code CLI / Desktop / Agent SDK / API. Adoption statuses (adopted / watched / untested / rejected) populated for bootstrap.
- FACTORY-HYGIENE row 38: cadenced audit every 5-10 rounds, owner `.claude/agents/claude-code-guide` (extends existing persona remit, no new persona).
- BACKLOG P1 row: first full sweep to complete the bootstrap (this round's inventory was partial).

Decision shape matches the Kanban/Six Sigma pattern Aaron confirmed this round: 3 small artifacts (memory + inventory + hygiene row) beats 1 new skill. Claude-code-guide persona already answers questions about these surfaces; extending its remit with a cadenced audit is cheaper than minting a new agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Depth-3 meta-win entry. The AutoMemory framing error (describing an Anthropic Q1-2026 feature as factory-native) was fixable as a one-off; the meta-check fired that the failure class needs a structural prevention — cadenced audit of all Anthropic Claude surfaces. Three structural landings: reference memory, cadence policy, inventory + hygiene row. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rename docs/CLAUDE-SURFACES.md -> docs/HARNESS-SURFACES.md and widen the Claude-specific living inventory into a multi-harness one. Claude is the first populated section; Codex / Cursor / GitHub Copilot / Antigravity / Amazon Q / Kiro get stub sections with owners + cadences marked TBD until each harness runs the factory for the first time.

Codifies the each-harness-tests-own-integration rule: a harness cannot honestly self-verify its own factory integration from within itself (verifier and verified share the process). The integration-point test per harness is therefore owned by a *different* populated harness. This is a capability-boundary fact, not a process preference.

FACTORY-HYGIENE row 38 widens from "Claude-surface audit" to "Harness-surface audit" with per-harness cadence. BACKLOG P1 row for the first full sweep renames to Claude-harness-specific; new P1 row added for multi-harness integration-test scaffolding, prioritised Codex / Cursor / Copilot.

Primary feature-comparison axis per harness: skill-authoring + eval-driven feedback loop (the Claude-Code feature that made it Aaron's primary harness choice). Per-harness audits inventory this first.

Memory substrate for this round:

- user_skill_creator_killer_feature_feedback_loop.md: why Claude won as primary harness
- user_typing_style_typos_expected_asterisk_correction.md: Aaron types fast, asterisk = correction to prior message
- feedback_multi_harness_support_each_tests_own_integration.md: multi-harness policy + capability boundary rule
- feedback_claude_surface_cadence_research.md: amended with multi-harness extension pointer

Aaron verbatim (2026-04-20): "since we are going muli test harness support we should technically do this for all harnesses but it will be a while before we need to build it out for the others ones, i want them to test their integration points you cant"

Plus: "and Kiro for the inital stubs" (Kiro = Amazon's AI-native IDE, distinct product from Amazon Q Developer).

Plus: "FYI the reason you won for me was Anthropics Skill Creator skill, that's the killer feature for me and it's feedback loop". This establishes skill-authoring + eval loop as the primary feature-comparison axis for harness audits.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
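The each-harness-tests-own-integration rule only requires that the verifier differ from the harness being verified. One way to satisfy it is sketched below, assuming a simple rotation over the populated harnesses; the rotation policy and the `assign_verifiers` name are illustrative assumptions, not something the inventory prescribes.

```python
from typing import Dict, List, Optional


def assign_verifiers(populated: List[str]) -> Dict[str, Optional[str]]:
    """Map each populated harness to a different populated harness that owns its integration test.

    With fewer than two populated harnesses no honest verifier exists, so the
    slot stays None rather than letting a harness verify itself.
    """
    if len(populated) < 2:
        return {harness: None for harness in populated}
    return {
        harness: populated[(index + 1) % len(populated)]
        for index, harness in enumerate(populated)
    }


# While only the Claude section is populated, its integration test has no owner yet.
today = assign_verifiers(["claude"])
# Hypothetical future state once two stubs have been populated.
later = assign_verifiers(["claude", "codex", "cursor"])
```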
Autonomous-tick audit of the just-landed multi-harness refactor (02b84da) caught an honest overclaim: three artifacts named `.claude/agents/claude-code-guide` as a local-persona owner whose remit could be extended. The file does not exist there. `claude-code-guide` is a plugin-provided agent from Anthropic's official plugin cache with a fixed question-answering remit, not a local persona the factory can extend.

Three corrections landed:

- `docs/HARNESS-SURFACES.md` Claude-section owner rewritten to "Architect (Kenji) — interim" with the plugin agent cited as a question-answering reference resource.
- `memory/feedback_claude_surface_cadence_research.md` owner paragraph rewritten with an explicit correction-of-earlier-overclaim note preserving the audit trail.
- `docs/FACTORY-HYGIENE.md` row 38 owner column rewritten to match.

Separately applied the multi-harness policy to an existing surface, `.github/copilot-instructions.md`:

- Multi-harness-context paragraph near the top.
- Cannot-self-verify-your-own-integration paragraph with pointer to the capability-boundary memory.
- Reference-patterns entry for `docs/HARNESS-SURFACES.md`.

Meta-wins log entry (depth-2): the verify-before-deferring rule fired on an ownership claim, not just on next-tick deferrals. Candidate FACTORY-HYGIENE row if the sibling failure class fires a second time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Correct earlier conflation where `.github/copilot-instructions.md` (GitHub PR code review, a reviewer robot) was treated as if it governed a VS Code Copilot harness. The factory now tracks three distinct products under the Copilot brand:

- PR code review (reviewer robot, not a harness; partial-populated via copilot-instructions.md)
- VS Code extension (the actual harness; priority-1 stub)
- Coding agent `@copilot` (autonomous PR author; priority-2 watched)

Capability-boundary rule (same-runtime-cannot-self-verify) scoped to harnesses only. Reviewer robots are external-verifier runtimes and are not on the each-tests-own-integration rule.

Partial meta-win logged: structural clarity lands; retrospective upgrades to clean meta-win if the coding-agent experiment proves out a new code-authoring channel for the factory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Documents the empirical observation from PR #32: attempts to assign GitHub Copilot coding agent (`@copilot`) to issue #32 silently no-op'd (assignees stayed []), and the suggested_actors?capabilities=can_be_assigned_to_issue endpoint returned 404 on this repo. Honest interpretation: coding-agent is a plan-gated Copilot feature and this personal repo does not currently expose it via the API. This is fact-tested-unavailable, not aspirational — the inventory slot now records observed evidence rather than leaving a silent TBD. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mechanical whitespace fixes only (MD032 blanks-around-lists, MD022 blanks-around-headings, MD007 ul-indent, MD049 emphasis-style, MD001 heading-increment, MD029 ol-prefix, MD009 no-trailing-spaces). No content changes. Unblocks PR #32 'lint (markdownlint)' check. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
e40b68a to
1e46586
* Round 44: absorb Copilot-products-split arc into ROUND-HISTORY

  Late-round arc separating the three distinct Copilot products that the factory had been conflating: PR code review (reviewer robot, not harness), Copilot in VS Code (harness variant stub), @copilot coding agent (autonomous PR author stub).

  Narrative placed alongside the other Round 44 arcs; cites HARNESS-SURFACES.md three-product split, rewritten .github/copilot-instructions.md as reviewer-robot contract, multi-harness-support feedback record's harness-vs-reviewer-robot correction section (described narratively, no cross-tree path reference per soul-file-independence discipline), and PR #32 as the first live experiment.

  Meta-wins-log row `copilot-split` (line 110 of that file) classifies this as partial-meta-win pending PR #32 outcome.

  Drain-batch from speculative commit f0830ab, role-ref-clean per BP-L284-L290 pre-check: no contributor-name prose, no memory/ cross-refs in the added paragraph.

* Round 44: address PR #93 Copilot review on Copilot-split arc

  Two findings from copilot-pull-request-reviewer bot on PR #93:

  P1, cross-tree path citation: paragraph cited a `memory/feedback_multi_harness_support_*` file that isn't findable in the git tree (soul-file-independence violation the pre-check memory is defending against). Rewrite cites "those two artifacts" referring to (a) HARNESS-SURFACES.md split + (b) rewritten copilot-instructions.md, both in-tree. Same teaching instance the pre-check documents.

  P2, hyphenation: "partial-meta-win" doesn't match `docs/research/meta-wins-log.md` L83 canonical "**partial meta-win**" (space). Match the canonical spelling so grep/terminology stays consistent.

  Pre-check grep clean on the diff.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Round 44 work. Separates three distinct products under the GitHub Copilot brand that the factory had been conflating:
- PR code review (reviewer robot, not a harness; partial-populated via `.github/copilot-instructions.md`)
- VS Code extension (the actual harness; priority-1 stub)
- Coding agent `@copilot` (autonomous PR author; stub, priority-2 watched)

Plus the broader Round 44 bundle (24 commits): multi-harness inventory, AutoMemory/AutoDream attribution, Kanban+Six Sigma adoption, scope-frontmatter research, retire-as-git-history fix, several CLAUDE.md-load rules (never-idle, honor-those-that-came-before, verify-before-deferring generalization).
Experiment — what this PR is specifically testing
We are deliberately inviting GitHub Copilot code review on this PR to check, honestly, what it can and cannot do:
- … `.github/copilot-instructions.md` but doesn't write it.
- … `.github/copilot-instructions.md` from this PR (including the new explicit reviewer-robot-not-harness framing)?

See `docs/HARNESS-SURFACES.md` § "GitHub Copilot — three distinct products under one brand" for the full taxonomy.

Test plan

- … `.github/copilot-instructions.md`)
- If `@copilot` is available, attempt to assign it an issue to author a follow-up PR autonomously, to verify cross-product capability

Notes for Copilot reviewer

Please apply `.github/copilot-instructions.md` in its updated form (not the main-branch version); the rewrite in this PR explicitly scopes your role as PR reviewer, not harness. The "six review classes where you have caught things" section at the top of that file is the signal-to-noise optimisation guide.

🤖 Generated with Claude Code