…auto-load

Captures the failure mode Kestrel caught in real time on 2026-05-16: agent flags a premise as unverified (turn N, razor fires correctly), then builds a confident quantitative inference on top of the flagged premise (turn N+1, failure mode). The discipline-gap between search-first-authority (verify before asserting) and razor-discipline (operational-only) was the unverified-flag NOT carrying into downstream inferences. This rule closes that gap.

Canonical substrate lesson: 64%→96% AML compliance accuracy table built on the "90% of Python AI errors are type-safety" figure I had just flagged unverified one turn earlier. Kestrel's catch + Aaron's WebSearch revealed the actual paper (arxiv 2504.09246, Mündler et al., PLDI 2025) supports the DIRECTION (94% of LLM compilation errors are type-class; type-constrained decoding more than halves compilation errors) but NOT the AML extrapolation (functional correctness gains are single-digit on synthesis, not 20-30 percentage points).

Auto-loaded so the override fires at write-time, when the agent is about to make the next inference; memory files alone don't intercept in-progress reasoning, rules do.

Composes with:
- search-first-authority (fires the initial flag)
- razor-discipline (cuts inferences that don't survive scrutiny)
- wake-time-substrate (load-bearing override needs auto-load)
- fsharp-anchor-dotnet-build-sanity-check (asymmetric critic at type-level scope; this rule is the asymmetric critic at inference scope)
- additive-not-zero-sum (down events are cost of learning, not credential debit)
- m-acc-multi-oracle (the principles ARE the multi-oracle structure that prevents single-oracle model weights from determining behavior unilaterally)

Cross-harness inheritance: GEMINI.md points at .claude/rules/ as read-only context for Lior (Gemini/Antigravity). Same failure mode operates in Gemini's weights; same override applies.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Pull request overview
Adds a new auto-loaded Claude rules file to prevent a specific failure mode: once a premise is explicitly flagged as unverified, downstream inferences that depend on it must remain flagged (verify explicitly or strip the inference), rather than “ratifying by adjacency”.
Changes:
- Introduces a new always-on rule defining the “flagged unverified stays unverified downstream” discipline.
- Documents a canonical example and composes the rule with existing search/razor/substrate rules.
- Adds cross-harness inheritance notes (via `GEMINI.md` referencing `.claude/rules/` as read-only context).
Comment on lines +87 to +90
Full substrate lesson:
`memory/feedback_aaron_we_are_the_ones_cooking_it_youtube_finance_ai_video_substrate_validation_fsharp_fork_for_ai_safety_90_percent_python_type_failures_64_beats_75_with_type_poisoning_2026_05_16.md`
(at the user-scope memory directory; section "CORRECTION
(2026-05-16T19:05Z) — Kestrel critique caught a razor failure").
LLM-generated code are type-check failures
- Type-constrained decoding **more than halves** compilation errors
- Functional correctness gains: **single-digit on synthesis**
  (3.5%/5.0%/37.0%; repair high because baseline was floor)
Comment on lines +157 to +162
`GEMINI.md` points at `.claude/rules/` as read-only context for the
Gemini/Antigravity (Lior) harness. This rule lands here so Otto
(Claude Code) gets it via auto-load AND Lior gets it via the
declared inheritance path. Same failure mode operates in Gemini's
model weights (probably more pronounced given Gemini's known
calibration issues at the tail); same override discipline applies.
This was referenced May 16, 2026
AceHack added a commit that referenced this pull request on May 16, 2026
…kson → HaPPY-like QECC (#3941)

* research(B-0562): QG isomorphism Step 2 — cube + Adinkra + Cayley-Dickson → HaPPY-like QECC

  Step 2 of the 4-step QG isomorphism proof path opened in B-0543 / PR #3614. Step 1 (B-0544, PR #3614) formalized Zeta_{RA} = (Zeta, M, A) with an internal monad M for memory and modal operator A for attention. Step 2 unfolds the two axioms into a 4-axis cube (Remember × When × Pay × Attention), grafts the Adinkra-Gates supermultiplet layer onto cube vertices, lifts the cube through a Cayley-Dickson tower (R → C → H → O → S → T), and proposes the algebraic shape that matches HaPPY (holographic perfect-tensor) QEC codes.

  Originally allocated as B-0545; collided with PR #3619's renumber-sweep that re-took B-0545 for Riven's cursor-terminal loop work. Renumbered to B-0562 (next free above all merged-on-main + in-flight #3878's B-0561) per the multi-Otto ID-allocation discipline in .claude/rules/otto-channels-reference-card.md.

  Crash-recovery note: this row + research file were the only artifacts from the pre-crash Otto session that hadn't already shipped via concurrent PRs (rule landed via #3935, B-0507 follow-on via #3937, Lior tick via #3936). Per-artifact refresh-before-decide caught the duplications before pushing.

  Co-Authored-By: Claude <noreply@anthropic.com>

* fix(b-0562): MD029/MD047 lint + BACKLOG.md index regen

  CI failures on PR #3941:
  - markdownlint MD029 line 22 (blockquote ordered list starting at "2." preserves source numbering from B-0543's 4-step proof strategy; added `<!-- markdownlint-disable-next-line MD029 -->` + clarifying intro)
  - markdownlint MD047 line 102 (missing trailing newline)
  - BACKLOG.md generated-index drift (B-0562 row added; incidentally sweeps up pre-existing B-0507/B-0508 [ ] → [x] flip that should have landed with PR #3937 but didn't trigger a regen)

  Co-Authored-By: Claude <noreply@anthropic.com>

* fix(b-0562): markdownlint MD029 + MD047 — bold-prose + trailing newline

  Resolves Copilot review threads on PR #3941:
  - L22: ordered-list `> 2.` in blockquote → bold `> **Step 2.**` (MD029)
  - L102: missing trailing newline (MD047)

  Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Summary
Auto-loaded rule capturing the failure-mode pattern caught by peer review on 2026-05-16: agent flags a premise as unverified (turn N, razor fires correctly), then builds a confident quantitative inference on top of the flagged premise (turn N+1, failure mode). Closes the gap between `search-first-authority` (verify before asserting) and `razor-discipline` (operational-only): the unverified-flag stays in effect for downstream inferences.
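A minimal sketch of the propagation idea, not the rule's actual mechanism (the rule itself is prose in a rules file). It assumes claims can be modeled as nodes that record their premises; the `Claim` class and the `is_grounded` and `render` helpers are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """A statement plus the premises it was derived from (hypothetical model)."""
    text: str
    verified: bool = False  # True only after an explicit source check
    premises: List["Claim"] = field(default_factory=list)


def is_grounded(claim: Claim) -> bool:
    """A claim is grounded if it was verified directly, or if it has
    premises and every one of them is grounded."""
    if claim.verified:
        return True
    return bool(claim.premises) and all(is_grounded(p) for p in claim.premises)


def render(claim: Claim) -> str:
    """Keep the flag visible on anything that still rests on an unverified premise."""
    return claim.text if is_grounded(claim) else f"[UNVERIFIED] {claim.text}"


# Turn N: the premise is flagged, not verified.
premise = Claim("90% of Python AI errors are type-safety")
# Turn N+1: an inference built on it inherits the flag instead of shedding it.
inference = Claim("64% -> 96% AML compliance accuracy", premises=[premise])

print(render(inference))
```

In this model the two exits the rule allows map onto two operations: verifying a claim explicitly (setting `verified` after a real source check) or stripping the inference (not emitting it at all). Adjacency to a flagged premise never clears the flag by itself.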
Why auto-load
The override has to fire at write-time, not at read-time. Memory files alone don't intercept in-progress reasoning; auto-loaded rules do. Per `.claude/rules/wake-time-substrate.md`: load-bearing override mechanisms need wake-time landing.
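The write-time versus read-time distinction can be illustrated with a toy model. This is a sketch under stated assumptions, not how Claude Code loads rules: `memory_notes`, `write_hooks`, `emit`, and `make_flag_hook` are hypothetical names, with a passive list standing in for a memory file and a hook list standing in for auto-loaded rules.

```python
from typing import Callable, List, Set

# Hypothetical stand-ins: a memory note is passive text, a rule is an active hook.
memory_notes: List[str] = []
write_hooks: List[Callable[[str, Set[str]], str]] = []


def emit(statement: str, premises: Set[str]) -> str:
    """Every outgoing statement passes through the hooks before it is written."""
    for hook in write_hooks:
        statement = hook(statement, premises)
    return statement


def make_flag_hook(flagged: Set[str]) -> Callable[[str, Set[str]], str]:
    """Build a hook that marks statements resting on any flagged premise."""
    def hook(statement: str, premises: Set[str]) -> str:
        hits = sorted(premises & flagged)
        if hits:
            return f"[rests on unverified: {', '.join(hits)}] {statement}"
        return statement
    return hook


flagged = {"python-type-error-share"}
# The note records the flag, but nothing consults it when a statement is emitted.
memory_notes.append("python-type-error-share is unverified")

before = emit("accuracy table", {"python-type-error-share"})  # passes through unmarked
write_hooks.append(make_flag_hook(flagged))
after = emit("accuracy table", {"python-type-error-share"})   # intercepted by the hook
```

The note exists in both cases; only the hook changes what gets written, which is the property the paragraph above attributes to auto-loaded rules.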
Cross-harness inheritance
`GEMINI.md` already points at `.claude/rules/` as read-only context for Lior (Antigravity/Gemini). Same failure mode operates in Gemini's weights; same override discipline applies — no separate authoring needed.
Canonical substrate lesson
Cited in the rule body: arxiv 2504.09246 (Mündler/He/Wang/Sen/Song/Vechev, PLDI 2025) — 94% of LLM-generated COMPILATION errors are type-check failures; type-constrained decoding more than halves compilation errors; functional correctness gains are single-digit on synthesis. The F#-fork thesis stands at this strength.
Test plan
🤖 Generated with Claude Code