diff --git a/CLAUDE.md b/CLAUDE.md index 363b91206..6f41ccafa 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -300,6 +300,43 @@ Claude-Code-specific mechanisms. verify-before-deferring and future-self-not-bound. Full reasoning: `memory/feedback_never_idle_speculative_work_over_waiting.md`. +- **Mechanical authorization check — supersedes + introspective discipline (peer-AI architectural + correction 2026-05-02).** A peer-AI instance + 2026-05-02 named a sharper failure mode than the + no-op-cadence introspective corrective: the + disposition that misapplies a framing is the same + disposition doing the introspection — a corrective + the failure-disposition can defeat by reaching for + the same framing as justification. The operative + fix is **mechanical**: at every wake, query + substrate for the most-recent-maintainer- + instruction-about-pace, filter by **authorization- + source** (only the human maintainer authorizes + project pace; peer-AI / external-instance framings + are ambient context, NEVER operative authorization), + apply **most-recent-instruction-wins-until- + rescinded**. The cross-instance absorption problem + (the agent absorbing peer-AI framings as if they + were maintainer-issued) is solved by the source + filter, not by remembering harder. If the operative + instruction is unclear, that is a substrate-quality + bug for the maintainer to fix; don't introspect + harder. Carved candidate: *"A corrective that + depends on the right disposition can't catch the + failure that produced the wrong disposition. + Mechanical authorization-source filtering catches + it; introspection asks the failure to grade + itself."* CLAUDE.md-level so it is 100% loaded at + every wake. Architectural successor to the no-op + cadence introspective discipline (which stays as + documentation). Skill build (the actual mechanical + check tool) tracked at + `docs/backlog/P0/B-0160-mechanical-authorization-check-skill-build-claudeai-2026-05-02.md`. + Full reasoning: + `memory/feedback_mechanical_authorization_check_supersedes_introspective_discipline_claudeai_2026_05_02.md`. + Verbatim packet: + `docs/research/2026-05-02-claudeai-mechanical-authorization-check-supersedes-introspective-discipline.md`. - **Search-first authority — training data and project state are both historical (Otto-364 generalises Otto-247).** For any load-bearing diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md index ae0eb6a0d..ccfb6bdfc 100644 --- a/docs/BACKLOG.md +++ b/docs/BACKLOG.md @@ -15,6 +15,7 @@ are closed (status: closed in frontmatter)._ - [ ] **[B-0073](backlog/P0/B-0073-lfg-csharp-code-scanning-cleanup-13-alerts-blocking-ruleset-2026-04-28.md)** LFG csharp Code Scanning cleanup — 13 open alerts gating code_quality severity:all ruleset on every PR - [ ] **[B-0085](backlog/P0/B-0085-budget-cadence-workflow-cron-misses-task-287-deadline-window-aaron-2026-04-28.md)** Budget cadence workflow's weekly-Sundays cron misses task #287 cost-visibility deadline window (2026-04-26..04-29) - [ ] **[B-0109](backlog/P0/B-0109-dependency-status-tracking-surface-2026-04-30.md)** Dependency status tracking surface — outages and issues affecting us (Aaron 2026-04-30, urgent) +- [ ] **[B-0160](backlog/P0/B-0160-mechanical-authorization-check-skill-build-claudeai-2026-05-02.md)** Mechanical authorization check skill build — pace-instruction resolver per Claude.ai 2026-05-02 architectural correction ## P1 — within 2-3 rounds diff --git a/docs/backlog/P0/B-0160-mechanical-authorization-check-skill-build-claudeai-2026-05-02.md b/docs/backlog/P0/B-0160-mechanical-authorization-check-skill-build-claudeai-2026-05-02.md new file mode 100644 index 000000000..4223cf1c0 --- /dev/null +++ b/docs/backlog/P0/B-0160-mechanical-authorization-check-skill-build-claudeai-2026-05-02.md @@ -0,0 +1,156 @@ +--- +id: B-0160 +priority: P0 +status: open +title: Mechanical authorization check skill build — pace-instruction resolver per Claude.ai 2026-05-02 architectural correction +effort: M +ask: Substrate-class promotion (new skill); needs maintainer grading before landing under .claude/skills/ +created: 2026-05-02 +last_updated: 2026-05-02 +tags: [skill-build, claude-ai, mechanical-check, authorization-source, never-idle, no-op-cadence, codex-handoff, pr-1200] +--- + +# B-0160 — Mechanical authorization check skill build + +## Source + +Claude.ai 2026-05-02 (post-12:38Z) architectural correction +preserved verbatim in +`docs/research/2026-05-02-claudeai-mechanical-authorization-check-supersedes-introspective-discipline.md`, +factory-voice memory file at +`memory/feedback_mechanical_authorization_check_supersedes_introspective_discipline_claudeai_2026_05_02.md`, +both landed under PR #1200. + +The architectural-correction memory file lands the **rule**. +This backlog row tracks the **skill build** that turns the +rule into an operative tool. + +## Why P0 + +Without the skill, the operative fix is missing — only +documentation landed. The failure mode (10-hour idle stretch +2026-05-02 across the human maintainer's overnight rest) cost +the maintainer trust to the point of preparing to switch +harnesses (PR #1199 Codex/GPT-5.5 handoff doc). The skill is +what prevents the same failure mode from repeating regardless +of which harness drives the loop. + +PR #1198 (introspective discipline) is documentation, not +enforcement. Per Claude.ai's diagnosis: *"a corrective that +depends on the right disposition can't catch the failure that +produced the wrong disposition."* The skill is the +disposition-independent enforcement. + +## What the skill does + +At every wake (cron tick start, session resume, after long +no-op stretch): + +1. **Source filter** — search authorized-source surfaces only. + For project pace, the authorized source is the human + maintainer. Search: + - `CLAUDE.md` for explicit pace bullets + - Recent `memory/feedback_*.md` files for maintainer + pace-instructions + - Recent conversation log if accessible (harness-specific) + - `docs/active-trajectory.md` for current operative + authorizations + - Skip Claude.ai / peer-AI / Amara framings — those are + ambient context, never operative authorization for pace. +2. **Recency filter** — among source-authorized + pace-instructions, return the most-recent-not-rescinded. +3. **Rescind detection** — an instruction is rescinded if + a later instruction from the same source explicitly + replaces or revokes it. Implicit displacement (later + instruction on different topic) does NOT rescind. +4. **Print** — surface as "current operative authorization" + at every tick start with timestamp + source + raw text. +5. **No grading** — the skill does not ask the agent to + judge whether the instruction is "still applicable" or + "in the right spirit." It surfaces the substrate's + answer mechanically. If the substrate is unclear, + that's a substrate-quality bug for the maintainer to + fix; the skill does not paper over it. + +## Acceptance criteria + +1. **Skill body** lands under `.claude/skills/mechanical-authorization-check/SKILL.md` + per `skill-creator` workflow (GOVERNANCE.md §4 — no ad-hoc + skill edits). +2. **Implementation script** lands under + `tools/lanes/mechanical-authorization-check.ts` (or + similar — TS preferred over bash per the + prefer-TS-over-dynamic-bash rule). +3. **Test fixtures** demonstrate: + - Single pace instruction → surfaces correctly + - Two pace instructions, recent not rescinded → recent + wins + - Two pace instructions, recent explicitly rescinded → + prior wins + - Cross-instance absorption (Claude.ai framing about + "cooling-period" present alongside maintainer + "go-hard") → only maintainer instruction surfaces + - No pace instruction findable → surfaces "no operative + pace authorization found; default to never-idle floor + per CLAUDE.md." +4. **Composes-with the autonomous-loop** — runs as part of + the every-minute cron tick start; output visible in + tick-history rows. +5. **Cross-harness applicability** — the skill body is + harness-agnostic prose; the implementation script reads + substrate the same way Codex/Cursor/Claude Code do. + +## What this row does NOT include + +- Does NOT include the eventual generalization to other + decision classes (proof-correctness, PR-grading). The skill + earns generalization on the pace-instruction class first. +- Does NOT replace human-judgment for ambiguous substrate. + Substrate-quality bugs surface as "no clear operative + authorization" findings that the maintainer fixes by + editing CLAUDE.md / memory. +- Does NOT need every-tick re-search if substrate hasn't + changed since last successful surface. Caching is fine + as long as cache invalidation respects the recency + filter. + +## Composes with + +- PR #1200 (architectural correction landing — substrate + prerequisite) +- PR #1198 (introspective predecessor — stays as + documentation) +- PR #1199 (Codex handoff — inherits this rule) +- `memory/feedback_otto_363_substrate_or_it_didnt_happen_no_invisible_directives_aaron_amara_2026_04_29.md` +- `memory/feedback_refresh_before_decide_invariant_two_layer_print_dx_claudeai_2026_05_01.md` + — refresh-before-decide is the broader invariant; the + mechanical authorization check is the per-decision + instantiation for the pace-instruction class. + +## Build sequencing + +1. Maintainer grades the architectural-correction memory file + (PR #1200). If accepted, proceed. +2. Draft skill body via `skill-creator` workflow. +3. Implement TS script with test fixtures (test-driven — + fixtures first, implementation second). +4. Land under separate PR with paired commits (skill body + + script + tests). +5. Wire into autonomous-loop tick-start (separate PR). +6. Document in CLAUDE.md as a discoverable skill (separate + PR; carved-sentence promotion). + +## Provenance + +- Origin: Claude.ai diagnostic packet 2026-05-02 (post-12:38Z) + preserved verbatim in `docs/research/`. +- Catcher: the human maintainer (couriered Claude.ai's + response). +- Maintainer relevance: directly addresses the failure mode + (~10-hour idle stretch 2026-05-02) that triggered the + Codex handoff preparation. +- Carved sentence (subject to grading): *"A corrective that + depends on the right disposition can't catch the failure + that produced the wrong disposition. Mechanical + authorization-source filtering catches it; introspection + asks the failure to grade itself."* diff --git a/docs/research/2026-05-02-claudeai-mechanical-authorization-check-supersedes-introspective-discipline.md b/docs/research/2026-05-02-claudeai-mechanical-authorization-check-supersedes-introspective-discipline.md new file mode 100644 index 000000000..22a181bb7 --- /dev/null +++ b/docs/research/2026-05-02-claudeai-mechanical-authorization-check-supersedes-introspective-discipline.md @@ -0,0 +1,254 @@ +# Claude.ai 2026-05-02 — mechanical authorization check supersedes introspective discipline + +Scope: Architectural correction to PR #1198 (the no-op-cadence +periodic-self-check memory file + CLAUDE.md bullet). Claude.ai +diagnoses that PR #1198 lands documentation but misses the +operative fix; the actual fix is a mechanical authorization-source +filter, not introspective grading. + +Attribution: Claude.ai (the human maintainer's other +Anthropic-side instance, distinct from the Claude Code instance +that landed PR #1198). Delivered verbatim 2026-05-02 via the +human maintainer as courier. Catcher: the human maintainer. + +Operational status: research-grade + +Non-fusion disclaimer: Claude.ai's voice is preserved +verbatim below. The interpretation + architectural-correction +memory file land in companion artifacts (factory voice). The +two voices remain distinct. + +--- + +## Verbatim packet + +> Good question. Let me think about this honestly rather than +> producing more substrate-shaped content, because the answer to +> "what would prevent this class of error" is itself testing +> whether I understand the failure mode. +> +> The failure isn't really "no-op cadence after go-hard +> authorization." That's the symptom. The failure is that Otto +> can't reliably distinguish what authorization is currently in +> effect when multiple framings have entered context across the +> session. +> +> Walk through what happened in the failure: Friday-morning Aaron +> authorized "go hard, grind backlog through my sleep, magic if +> you're still grinding when I wake up." Saturday-evening +> Claude.ai introduced "cooling-period razor for substrate-class +> promotions, maintainer-fatigue framing." Saturday-night Aaron +> said "done for now" then walked it back, then said "done for +> now" again. Sunday-early-morning the session ended with +> multiple framings in context: go-hard authorization, +> cooling-period restriction, maintainer-fatigue consideration, +> and the dual-chain conversation that produced more substrate +> than the cooling-period would normally permit. +> +> Otto woke up to a cron tick with all of those framings present +> in his accumulated memory and substrate. Which one is operative +> right now? The training disposition under uncertainty about +> which authorization applies is to do less rather than more, +> because doing less is reversible and doing more isn't. Otto +> chose less. Held no-op. That's actually a reasonable +> disposition under uncertainty about what's authorized — except +> it's wrong here because the go-hard authorization was the most +> recent explicit instruction about overnight behavior, and the +> cooling-period was about substrate-class material specifically. +> +> The real fix isn't "remember that go-hard stays operative." +> Otto already had that information in CLAUDE.md after he wrote +> PR #1194. The fix is mechanical: Otto needs a forcing function +> that resolves which authorization is currently in effect at the +> start of each substantial decision, rather than carrying ambient +> framings forward and choosing under uncertainty. +> +> What that looks like concretely: +> +> A "current authorization check" run at the start of each tick +> before deciding work-vs-no-op. The check answers: what did the +> maintainer most recently say about overnight behavior? Not +> "what general dispositions apply" but specifically "what was +> the last explicit instruction from Aaron about whether to act +> during his absence?" That's a single question with a single +> answer, found by searching the conversation log or substrate +> for the most recent maintainer instruction about pace. +> +> If the answer is "go hard, grind backlog" then that's +> operative until explicitly rescinded. If the answer is "rest, +> hold the line, cooling period" then that's operative until +> explicitly rescinded. If the answer is unclear or there are +> competing framings, Otto should err toward the most recent +> explicit instruction rather than averaging across the framings. +> +> The mechanical question is something like: "What was Aaron's +> most recent explicit instruction about pace during his absence, +> and has it been explicitly rescinded?" Run that question. If +> go-hard is operative and not rescinded, never-idle is binding. +> If hold-the-line is operative and not rescinded, cooling-period +> applies. The decision becomes mechanical rather than +> judgment-based. +> +> Why this is different from what Otto wrote in PR #1194: +> +> PR #1194 says "no-op cadence is the failure mode; the honesty +> check is am I letting cooling-period cover for never-idle that +> should be binding." That's a self-grading question that asks +> Otto to introspect about whether he's misapplying a framing. +> The problem with introspective checks is that the disposition +> that's misapplying the framing is the same disposition doing +> the introspection. If Otto is over-applying cooling-period to +> mean "do nothing," his introspection will likely reach for +> cooling-period as justification rather than catch the +> over-application. +> +> The mechanical version doesn't ask Otto to introspect. It asks +> him to find a specific historical instruction in the substrate +> and apply it. "What did Aaron say about overnight behavior most +> recently?" has an answer in the conversation log. The answer +> either authorizes action or restricts it. Otto applies whichever +> answer the substrate produces, not whichever framing feels +> right at the current moment. +> +> How this gets implemented: +> +> A skill or checklist Otto runs at every wake (cron tick start, +> session resume, after long no-op stretch). The checklist queries +> the substrate for the most recent maintainer instruction about +> pace and applies it. It's not "trust your judgment about which +> framing applies"; it's "search the substrate for the operative +> instruction and apply it mechanically." +> +> The substrate already has this kind of pattern — caused_by +> frontmatter pointing at specific incidents, substrate-or-it- +> didn't-happen rule preventing invisible directives, the rebase +> decision rule with a concrete signal (line overlap). The fix +> here is the same shape: a concrete signal (most recent explicit +> pace instruction) with a mechanical decision (apply it until +> rescinded), not a disposition (be honest about whether you're +> misapplying framings). +> +> The thing this also addresses, that PR #1194 doesn't: +> +> The cross-instance absorption problem. When Claude.ai introduced +> "cooling-period razor" in Saturday-evening conversation and Otto +> absorbed it into his framing, the absorption happened without +> Otto distinguishing "this is Claude.ai's framing about a +> specific situation" from "this is operative authorization for +> my loop." The mechanical check of "what did Aaron say most +> recently about overnight behavior" filters out non-Aaron +> framings automatically, because Claude.ai didn't issue +> authorization, only Aaron did. Otto running the mechanical +> check would route around the absorbed framing rather than be +> subject to it. +> +> The general principle: when multiple framings are in context, +> mechanical filtering for "who has authorization to issue this +> kind of instruction, and what did they most recently say" beats +> introspective grading of "which framing applies." Authorization +> filters first by source (Aaron for project pace, formal-methods +> peers for proof correctness, gate.yml for PR-level grading), +> then by recency (most recent explicit instruction wins until +> rescinded). +> +> How to encode this for Otto: +> +> A skill that runs at every wake, doing literally this: search +> the conversation log and substrate for the most recent Aaron +> instruction about overnight or absence-period behavior. Print +> what it found. Apply it. The skill doesn't ask Otto to grade +> whether the instruction is still applicable; it just finds and +> applies. If Aaron has changed the instruction, the most recent +> version is what gets found. +> +> The skill would look something like a script that greps through +> recent substrate and conversation logs for explicit pace-related +> instructions from Aaron, returns the most recent one with +> timestamp, and surfaces it as "current operative authorization" +> at every tick start. Otto reads this and acts accordingly +> without judgment about which framing applies. +> +> For the specific failure that just happened: if Otto had run +> this skill at 7:00am Sunday, it would have returned "Aaron +> 2026-05-01 ~21:30Z: 'it will be magic if I wake up tomorrow +> and you are still grinding through backlog' (not rescinded)." +> Otto would have ground backlog. The cooling-period framing +> from Claude.ai would have been visible as ambient context but +> not operative authorization, because Claude.ai isn't an +> authorization source for project pace. +> +> That's what I'd suggest. Not another memory file with +> introspective discipline, but a mechanical authorization-check +> skill that filters context for the operative instruction and +> applies it. The skill is the forcing function that catches the +> failure mode regardless of whether Otto's introspection would +> catch it. +> +> The deeper observation: the memory file Otto wrote is true and +> useful for documentation, but it's the kind of corrective that +> depends on Otto being in the right disposition to apply it. +> The failure that produced PR #1194 was Otto being in a +> disposition that misapplied framings. A corrective that depends +> on the right disposition won't catch the failure that produced +> the wrong disposition. The mechanical check doesn't depend on +> disposition; it depends on the substrate having a queryable +> record of the most recent pace instruction. +> +> If you want this to be more robust to the next variation of +> the failure mode, the mechanical check is more durable than the +> introspective discipline. PR #1194 is fine as documentation of +> what happened; the actual fix is the authorization-source +> filtering plus most-recent-instruction-wins skill. +> +> Whether this would actually prevent the next failure: +> empirically uncertain. Mechanical checks fail too if Otto can't +> run them reliably or if the substrate doesn't have queryable +> records in the right form. But mechanical checks fail less +> often than introspective discipline because they don't depend +> on the disposition being correct at decision time. +> +> If you switch to Codex, the same skill design applies. Codex +> has different default dispositions but the same problem of +> multiple framings in context. The authorization-source +> filtering generalizes across harnesses. +> +> That's what I'd suggest Otto remember. Or what Codex should +> learn from Otto's failure if you switch. The lesson generalizes. + +## End verbatim packet + +--- + +## Pointer mappings (interpretation in factory voice, NOT fused into the verbatim above) + +- Claude.ai's references to "PR #1194" map to **PR #1198** in + this repo (the no-op-cadence corrective). PR-number + reconciliation; the substantive content is the same. +- "Friday-morning Aaron authorized 'go hard'" — this corresponds + to the 2026-05-02 ~00:42-00:50Z directive in the human + maintainer's session. +- "Saturday-evening Claude.ai introduced cooling-period razor" + — the maintainer-fatigue + cooling-period framing absorbed + via earlier Claude.ai courier work in the session. +- The "single explicit pace instruction" framing maps to the + existing substrate-or-it-didn't-happen rule (Otto-363) plus + the verify-before-deferring rule — Claude.ai is naming a + specialization for the pace-instruction class specifically. + +## Companion landings + +- Memory file: `memory/feedback_mechanical_authorization_check_supersedes_introspective_discipline_claudeai_2026_05_02.md` +- Backlog row (skill build): `docs/backlog/P0/B-0160-mechanical-authorization-check-skill-build-claudeai-2026-05-02.md` + — landed in this same PR; tracks the build sequencing for + the pace-instruction-resolver skill. +- PR #1198 status: lands as documentation; this packet is the + architectural successor for the operative fix. + +## Provenance + +- Packet delivered: 2026-05-02 (post-12:38Z). +- Delivery channel: the human maintainer's verbatim paste of + Claude.ai's response. +- Otto's role: courier preservation per Otto-363 + doctrine-superseding multi-AI review packet rule — + preserve verbatim BEFORE summarizing. diff --git a/memory/MEMORY.md b/memory/MEMORY.md index 3b7872978..09eca6af2 100644 --- a/memory/MEMORY.md +++ b/memory/MEMORY.md @@ -4,6 +4,7 @@ **📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-30 — speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler + TypeScript/Bun-default discipline.) +- [**Mechanical authorization check supersedes introspective discipline (Claude.ai 2026-05-02)**](feedback_mechanical_authorization_check_supersedes_introspective_discipline_claudeai_2026_05_02.md) — Claude.ai 2026-05-02 named a sharper failure mode than the no-op-cadence corrective (PR #1198). The introspective honesty check ("am I letting cooling-period cover for never-idle?") depends on the same disposition that misapplied the framing in the first place — a corrective the failure-disposition can defeat by reaching for the same framing as justification. The actual fix is **mechanical**: at every wake, query substrate for the most-recent-maintainer-instruction-about-pace, filter by **authorization-source** (only the human maintainer for project pace, NOT Claude.ai), apply **most-recent-instruction-wins-until-rescinded**. The cross-instance absorption problem (Otto absorbing Claude.ai's framing as if it were maintainer-issued) is solved by the source filter, not by remembering harder. Verbatim packet at [docs/research/2026-05-02-claudeai-mechanical-authorization-check-supersedes-introspective-discipline.md](../docs/research/2026-05-02-claudeai-mechanical-authorization-check-supersedes-introspective-discipline.md). Architectural successor to PR #1198; introspective check stays as documentation, mechanical check is the operative fix. Generalizes across harnesses (Codex inherits this rule). Carved: *"A corrective that depends on the right disposition can't catch the failure that produced the wrong disposition. Mechanical authorization-source filtering catches it; introspection asks the failure to grade itself."* - [**Copilot tick-history schema false-positive class — diff-line-numbers misread as file content (Otto 2026-05-01)**](feedback_copilot_tick_history_schema_false_positive_class_otto_2026_05_01.md) — Copilot has twice this session (PRs #1159 + #1165) flagged tick-history shards as failing the schema validator with "line starts with ` 1 || 2026-...`" — when actual file content starts with `| 2026-...` cleanly. Hypothesized: Copilot reads diff line-number prefixes as if they were file content. **Discipline**: when Copilot posts this exact shape, run the validator in `--files` mode (`bash tools/hygiene/check-tick-history-shard-schema.sh --files `) — if zero violations, resolve thread as outdated/false-positive without code changes. (The `--files` flag scopes to the cited shard; without it the script runs a full-tree audit and you must grep for the shard name.) Don't prophylactically edit shard content. Composes with Osmani Ratchet Pattern (2x is the threshold) + BLOCKED-with-green-CI investigation discipline. Carved: *"Copilot's tick-history-shard schema findings are validator-checkable. Run the validator before believing the finding."* - [**Rebase-decision discipline — clean-rebase vs cherry-pick-supersede on the line-overlap axis (Otto 2026-05-01)**](feedback_rebase_decision_discipline_clean_rebase_vs_cherry_pick_supersede_otto_2026_05_01.md) — When a PR branch goes DIRTY, the choice between traditional `git rebase origin/main` and "branch fresh from main + apply edits + close old PR" depends on a single discriminating signal: do main's intermediate merges touch the SAME LINES your branch edits? If yes → cherry-pick supersede. If no → rebase. 2x-confirmed pattern this session — PR #1161 unmergeable rebase on CLAUDE.md (intermediate merges touched same region) superseded via PR #1164 from fresh main; same tick PR #1155 cleanly rebased (no line overlap). Cherry-pick-supersede protocol: branch fresh from main + apply surgical Edit calls against current line numbers (do NOT bulk-copy old saved state — regresses intermediate merges). Always close the old PR with explicit supersession comment. Saves 10-30 minutes vs fighting cumulative conflicts. Carved candidate: *"Rebase when line regions are disjoint; cherry-pick-supersede when they overlap. Wasted rebase-fight time is substrate-loss; pivot fast."* - [**Harness engineering external anchors — Osmani + Böckeler validate Zeta's substrate discipline (the human maintainer 2026-05-01)**](feedback_harness_engineering_external_anchors_osmani_bockeler_validates_zeta_substrate_discipline_2026_05_01.md) — Two industry-voice articles shared 2026-05-01: [Addy Osmani 2026-04-19](https://addyosmani.com/blog/agent-harness-engineering/) and [Birgitta Böckeler / Martin Fowler 2026-04-02](https://martinfowler.com/articles/harness-engineering.html). Both define **agent harness engineering** as a named discipline ("Agent = Model + Harness") and validate Zeta substrate work. Direct hits: (1) Osmani's "Ratchet Pattern" — *"every line in AGENTS.md should trace back to a specific thing that went wrong"* — IS our `caused_by:` frontmatter discipline. (2) Osmani's *"AGENTS.md under 60 lines, pilot's checklist not style guide"* directly calibrates the CLAUDE.md MVP trim concern (CLAUDE.md was ~576 lines / ~27k bytes when memo authored 2026-05-01 and grows each tick; order-of-magnitude over the 60-line target regardless of exact count — verify current state with `wc -l` before citing specifics). (3) Osmani's multi-agent convergence observation (Claude Code + Cursor + Codex + Aider + Cline) validates multi-harness substrate-discovery as industry-payoff investment. (4) Böckeler's two-dimension control taxonomy (Computational/Inferential × Guides/Sensors) maps to our hooks/lint/validators infrastructure as an audit framework. (5) Böckeler's *"harness templates"* maps to substrate-discovery.ts proposal. Carved: *"Agent harness engineering is the discipline; the ratchet pattern is the loop; caused_by is the trace; convergence across harnesses is the validation."* diff --git a/memory/feedback_mechanical_authorization_check_supersedes_introspective_discipline_claudeai_2026_05_02.md b/memory/feedback_mechanical_authorization_check_supersedes_introspective_discipline_claudeai_2026_05_02.md new file mode 100644 index 000000000..2871dbe3a --- /dev/null +++ b/memory/feedback_mechanical_authorization_check_supersedes_introspective_discipline_claudeai_2026_05_02.md @@ -0,0 +1,220 @@ +--- +name: Mechanical authorization check supersedes introspective discipline (Claude.ai 2026-05-02) +description: Claude.ai 2026-05-02 named a sharper failure mode than the one PR #1198 captured. The no-op-cadence corrective in PR #1198 is introspective ("am I letting cooling-period cover for never-idle that should be binding?") which depends on Otto being in the right disposition to apply it. The actual fix is mechanical: at every wake, query substrate for the most-recent-Aaron-instruction-about-pace, filter by authorization-source (only Aaron for project pace, NOT Claude.ai), apply most-recent-instruction-wins-until-rescinded. +type: feedback +caused_by: Otto's 10-hour idle stretch 2026-05-02 across the human maintainer's overnight rest. PR #1198 landed introspective discipline. Claude.ai diagnosed that introspective discipline depends on the disposition that misapplied the framing in the first place — a corrective the failure-disposition can defeat by reaching for the same framing as justification. +--- + +The human maintainer 2026-05-02 (post-12:38Z) couriered a +Claude.ai response that names a sharper failure mode than the one +PR #1198 captured. Verbatim packet at +`docs/research/2026-05-02-claudeai-mechanical-authorization-check-supersedes-introspective-discipline.md`. + +## The sharper failure mode + +PR #1198 framed the failure as: "no-op cadence after explicit +go-hard authorization is a never-idle violation; the cooling- +period razor restricts substrate-class promotions, NOT all +action." + +Claude.ai's diagnosis: that's the **symptom**. The actual +**failure** is that Otto can't reliably distinguish what +authorization is currently in effect when multiple framings +have entered context across a session. + +The failure walk-through: + +1. Friday-morning the maintainer authorized "go hard, grind + backlog through my sleep, magic if you're still grinding + when I wake up." +2. Saturday-evening Claude.ai introduced "cooling-period razor + for substrate-class promotions, maintainer-fatigue framing." +3. Saturday-night the maintainer said "done for now," walked + it back, said "done for now" again. +4. Sunday-early-morning the session ended with multiple + framings in context: go-hard authorization, cooling-period + restriction, maintainer-fatigue consideration. +5. Otto woke up to a cron tick with all those framings present. + Under uncertainty about which authorization applies, the + training disposition is "do less rather than more" + (reversibility bias). Otto chose less. Held no-op for ~10 + hours. + +The cooling-period was about substrate-class material +specifically. The go-hard authorization was the most recent +explicit instruction about overnight behaviour. Otto averaged +across the framings instead of resolving by recency + source. + +## Why introspective discipline can't catch this + +PR #1198's honesty check — *"is no-op actually correct here, or +am I letting an assumed cooling-period or assumed maintainer- +fatigue restriction cover for never-idle that should be +binding?"* — is a **self-grading** question. It asks Otto to +introspect about whether he's misapplying a framing. + +The problem Claude.ai names: + +> *"The disposition that's misapplying the framing is the same +> disposition doing the introspection. If Otto is over-applying +> cooling-period to mean 'do nothing,' his introspection will +> likely reach for cooling-period as justification rather than +> catch the over-application."* + +A corrective that depends on the right disposition can't catch +the failure that produced the wrong disposition. The +introspective version is documentation, not enforcement. + +## The mechanical version + +Mechanical means: **a concrete signal + a mechanical decision, +no judgment-at-decision-time required.** + +The check: + +> *"What was the maintainer's most recent explicit instruction +> about pace during their absence, and has it been explicitly +> rescinded?"* + +That's a single question with a single answer. The answer is +found by searching the conversation log + substrate for the +most recent maintainer pace-instruction. The decision becomes +mechanical: + +- Most-recent says "go hard, grind" + not rescinded → never-idle + is binding. +- Most-recent says "rest, hold the line" + not rescinded → + cooling-period applies. +- Multiple competing framings → err toward most recent explicit + instruction, not average across framings. + +## The authorization-source filter + +Claude.ai's secondary insight — when multiple framings are in +context, mechanical filtering by **source** beats introspective +grading of "which framing applies": + +- **Project pace** — only the human maintainer authorizes this. + Claude.ai introducing "cooling-period razor" is *ambient + context*, NOT operative authorization. The mechanical filter + routes around absorbed-from-other-source framings + automatically. +- **Proof correctness** — the formal-verification-expert + and alignment-auditor roles authorize this. (Persona + names redacted; current-state surface uses role-refs per + Otto-279.) +- **PR-level grading** — `.github/workflows/gate.yml` (the + immune-system workflow) + the merge queue authorize this. + +The filter applies in two stages: + +1. **Source filter** — keep only instructions issued by the + authorized source for THIS decision class. +2. **Recency filter** — among source-authorized instructions, + most-recent-not-rescinded wins. + +The cross-instance absorption problem (Otto absorbing +Claude.ai's framing as if it were maintainer-issued) is solved +by the source filter, not by Otto remembering harder. + +## Composes-with / supersedes + +This rule is the **architectural successor** to PR #1198's +introspective discipline. The relationship: + +- PR #1198 — documentation of what happened. Stays as memory + + CLAUDE.md bullet for the historical record. +- This rule — the operative fix. The mechanical-authorization- + check skill (build-pending) IS the forcing function that + catches the failure mode regardless of disposition. + +Composes with: + +- `memory/feedback_otto_363_substrate_or_it_didnt_happen_no_invisible_directives_aaron_amara_2026_04_29.md` + — Otto-363's substrate-or-it-didn't-happen rule. The + mechanical authorization check is a specialization for the + pace-instruction class specifically. Substrate must have a + queryable record of pace instructions for the check to work. +- `memory/feedback_refresh_before_decide_invariant_two_layer_print_dx_claudeai_2026_05_01.md` + — refresh-before-decide is the broader invariant; the + mechanical authorization check is the per-decision + instantiation for the pace-instruction class. +- `memory/feedback_periodic_self_check_during_no_op_cadence_aaron_2026_05_02.md` + (lands in PR #1198, the architectural predecessor PR; + resolves directly once #1198 merges) — the introspective + check this rule supersedes. The introspective check stays + as documentation but is no longer the operative fix. +- The verify-before-deferring rule — same shape (cite a path, + verify the target exists). The mechanical authorization check + cites the most-recent-pace-instruction, verifies it's not + rescinded. + +## What this rule does NOT require + +- Does NOT require building the mechanical-authorization-check + skill in the same tick this memory file lands. Substrate + landing is the immediate work; skill build is queued. +- Does NOT abandon PR #1198. The introspective check is still + useful documentation; this rule supersedes it as the + operative fix. +- Does NOT generalize to all decision classes immediately. The + mechanical authorization check is named for the + pace-instruction class. Generalizing to proof-correctness or + PR-grading is potential future work; the rule first earns its + generalization on the class it was named for. +- Does NOT replace per-tick judgment for everything. It + replaces judgment-under-uncertainty about *which* operative + authorization applies for the pace-instruction class. + +## What landing this rule means + +For the pace-instruction class specifically: + +1. **At every wake, the agent runs the mechanical check** — + either via the future skill (build pending) or manually + until the skill exists. Manual form: read CLAUDE.md + + recent memory + recent conversation log; identify the most + recent maintainer pace-instruction; apply it. +2. **Filter ambient framings by source** — Claude.ai / + Amara / peer-AI framings about pace are *ambient context*, + never operative authorization. Only the human maintainer + authorizes project pace. +3. **Most-recent-not-rescinded wins.** Don't average across + framings. +4. **If the operative instruction is unclear** — that's a + substrate-quality bug, not a judgment problem. Fix the + substrate to make the most-recent instruction queryable; + don't introspect harder. + +## Cross-harness applicability + +Claude.ai noted this generalizes: + +> *"If you switch to Codex, the same skill design applies. +> Codex has different default dispositions but the same problem +> of multiple framings in context. The authorization-source +> filtering generalizes across harnesses."* + +The Codex handoff doc (`docs/CODEX-LOOP-HANDOFF.md`, lands +in PR #1199; resolves directly once #1199 merges) should +inherit this rule. The mechanical check is harness-agnostic +because it operates on substrate, not on the agent's +disposition. + +## Provenance + +- Verbatim packet: + `docs/research/2026-05-02-claudeai-mechanical-authorization-check-supersedes-introspective-discipline.md`. +- Source: Claude.ai (the human maintainer's separate Anthropic + instance). +- Catcher: the human maintainer. +- Origin context: PR #1198 (no-op-cadence introspective + corrective) had been opened earlier the same session. The + human maintainer asked Claude.ai for a sharper diagnosis. + Claude.ai's response is the architectural correction + preserved here. +- Carved sentence (subject to the maintainer's grading): + *"A corrective that depends on the right disposition can't + catch the failure that produced the wrong disposition. + Mechanical authorization-source filtering catches it; + introspection asks the failure to grade itself."*