Conversation
…-ferry §B + §F + corrections #2 #7 #9 Research-grade design doc for the Stage-2 rung of Amara's corrected promotion ladder. Specifies: (a) placement under src/Experimental/CartelLab/ (not src/Core/ — that's Stage 4); (b) MetricVector type with PLV magnitude AND offset split (correction #6); (c) INullModelGenerator interface + Preserves/Avoids table columns; (d) IAttackInjector forward-looking interface (Stage 3); (e) Wilson-interval reporting contract with {successes, trials, lowerBound, upperBound} schema (correction #2 — no more "~95% CI ±5%" handwave); (f) RobustZScoreMode with Hybrid fallback (correction #7 — percentile-rank when MAD < epsilon); (g) explicit artifact-output layout under artifacts/ coordination-risk/ with five files + run-manifest.json (correction #9). 6-stage promotion path (0 doc / 1 ADR / 2.a skeleton / 2.b full null-models + first attack / 3 attack suite / 4 Core/NetworkIntegrity / 5 Aurora-KSK) matches Amara's corrected ladder and Otto-105 cadence. Doc-only change; no code, no tests, no workflow, no BACKLOG tail touch (avoids positional-conflict pattern that cost #334 → #341 re-file this session). This is the 7th of 10 18th-ferry operationalizations: - #1/#10 test-classification (#339) - #2 Wilson-interval design specified (this doc) - #6 PLV phase-offset shipped (#340) - #7 MAD=0 Hybrid mode specified (this doc) - #9 artifact layout specified (this doc) - #4 exclusivity already shipped (#331) - #5 modularity relational already shipped (#324) Remaining: Wilson-interval IMPLEMENTATION (waits on #323 + Stage 2.a), MAD=0 Hybrid IMPLEMENTATION (waits on #333 + Stage 2.a), conductance-sign doc (waits on #331), Stage-2.a skeleton itself. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Pull request overview
Adds a research-grade design document specifying the planned Stage-2 “calibration harness” for coordination-risk / cartel detection work, with pre-committed conventions for metrics, confidence-interval reporting, robust z-score fallback, and artifact outputs.
Changes:
- Introduces a Stage-2 harness design covering module placement (
src/Experimental/CartelLab/), core types/interfaces, and invocation contract. - Specifies statistical reporting discipline (Wilson intervals) and robust z-score modes (including MAD=0 fallback via percentile rank / hybrid).
- Defines a fixed artifact output schema under
artifacts/coordination-risk/for downstream calibration/ROC/PR tooling.
Comment on lines
+444
to
+447
| - **`docs/definitions/KSK.md`** (PR #336) — KSK's Oracle | ||
| layer consumes the harness's per-run Wilson-bounded | ||
| detection rate. Oracle trust posture depends on the | ||
| interval width, not just the point estimate. |
Comment on lines
+491
to
+493
| - Amara 18th ferry — `docs/aurora/2026-04-24-amara- | ||
| calibration-ci-hardening-deep-research-plus-5-5- | ||
| corrections-18th-ferry.md`. |
Comment on lines
+3
to
+9
| **Status:** research-grade proposal (pre-v1). Origin: Amara | ||
| 18th courier ferry, Part 1 §B ("Statistical Calibration Plan"), | ||
| §F PR #2 ("CoordinationRisk calibration harness"), and Part 2 | ||
| corrections #2 (Wilson intervals), #7 (MAD=0 fallback), and #9 | ||
| (explicit artifact output). This doc specifies the Stage-2 | ||
| rung of the corrected promotion ladder. Author: architect | ||
| review. Scope: design-only; no code, no tests, no workflow |
| WilsonInterval.fs ← Wilson score CL helper | ||
| tests/Tests.FSharp/CartelLab/ | ||
| CalibrationHarness.Tests.fs ← seeded smoke tests | ||
| artifacts/coordination-risk/ ← .gitignored; output of runs |
| val run : config: HarnessConfig -> Async<unit> | ||
| ``` | ||
|
|
||
| The runner emits all five artifact files on completion. |
3 tasks
AceHack
added a commit
that referenced
this pull request
Apr 24, 2026
…rections (#344) Dedicated absorb of Amara's 19th courier ferry per CC-002 close-on-existing discipline. Scheduled Otto-164 → executed Otto-165, following 7-ferry precedent (PRs #196 / #211 / #219 / #221 / #235 / #245 / #259 / #330 / #337). Two-part ferry: Part 1 deep-research DST audit (12 sections: rulebook, 12-row entropy scan, dependency audit, 7-row simulation-surface coverage, retry audit, CI determinism, seed discipline, Cartel-Lab DST readiness, KSK/Aurora DST readiness, state-of-the-art comparison, 10-row PR roadmap, what-not-to-claim caveats; Mermaid CI diagram + Gantt timeline). Part 2 Amara's own 5.5-Thinking correction pass (7 required corrections, per-area grade table with B- overall, revised 6-PR roadmap with titles locked, DST-held + FoundationDB-grade acceptance criteria, copy-paste Kenji summary). Key findings: - DST grade: B- (strong architecture, partial impl) - Blockers: DiskBackingStore bypasses simulation (D-grade filesystem simulation), no ISimulationDriver, Task.Run ambient ThreadPool risk, no seed artifacts / no swarm harness - 4 of 12 Part-1 sections already align with shipped substrate: - §6 test classification → PR #339 - §7 artifact layout → PR #342 design - §8 Cartel-Lab stage discipline → PRs #330/#337/#342 - §9 KSK advisory-only → PR #336 + Otto-140..145 memory 6-PR revised roadmap queued as graduation candidates: 1. DST scanner + accepted-boundary registry (new tool + policy docs + workflow) 2. Seed protocol + CI artifacts 3. Sharder reproduction (NOT widen) — reinforces 18th #10 4. ISimulationDriver + VTS promotion to core 5. Simulated filesystem (DiskBackingStore rewrite) 6. Cartel-Lab DST calibration (aligns with #342 design) Plus: push-with-retry.sh retry-audit finding; DST-held + FDB-grade criteria lock. GOVERNANCE §33 four-field header (Scope / Attribution / Operational status / Non-fusion disclaimer). Amara verdict preserved: "strong draft / not canonical yet." Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack
added a commit
that referenced
this pull request
Apr 24, 2026
…mara 19th-ferry correction #6) (#346) Research-grade criteria doc locking two acceptance bars: 1. DST-held — minimum: 6 items (seeds committed, failing tests emit seed+params, bit-for-bit local-vs-CI reproducibility, broad sweeps nightly-not-gating, zero unreviewed entropy hits in main-path, boundaries either simulated or explicitly accepted). 2. FoundationDB-grade DST candidate — aspirational: 8 surfaces (simulated FS, simulated network, deterministic task scheduler, fault injection/buggify, swarm runner, replay artifact storage, failure minimization/shrinking, end-to-end scenario from one seed). Maps 19th-ferry revised-roadmap PRs to which criteria items each addresses. Captures Amara's per-area grade table (overall B-) as "Amara's assessment, not factory- certified." Explicit promotion path: doc stays research-grade until PR 1 of the 19th-ferry revised roadmap lands an ADR promoting the DST-held bar to factory discipline; at that point criteria migrate to docs/DST-COMPLIANCE.md top-level. No graduation claims DST-held today; graduations reference this doc as target without self-certification. Composes with test-classification.md (PR #339; supports items 1+2+4), calibration-harness-stage2-design.md (PR #342; artifact schema supports item 2), Amara 19th ferry (PR #344 absorb; source of criteria). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Research-grade design doc for Stage-2 of Amara's corrected promotion ladder. Specifies the next-rung deliverable (calibration harness) so when implementation starts, conventions are pre-committed.
Key design decisions
src/Experimental/CartelLab/(notsrc/Core/— that's Stage 4 per Amara ladder).{successes, trials, lowerBound, upperBound}— no more "~95% CI ±5%" handwave (correction Round 26 — rename tail, §18 memory clarification, three dispatches #2).artifacts/coordination-risk/(correction Round 33 Track D — static analysis: shellcheck + actionlint + markdownlint + editor config + cspell + vscode #9).INullModelGenerator+IAttackInjectorinterfaces withPreserves/Avoids/ExpectedSignalsmachine-readable annotations.Scope
18th-ferry operationalization status
7 of 10 18th-ferry corrections now have either shipped code or committed design.
Test plan
🤖 Generated with Claude Code