docs: test-classification taxonomy — Amara 18th-ferry §C operationalized by AceHack · Pull Request #339 · Lucent-Financial-Group/Zeta

AceHack · 2026-04-24T08:55:59Z

Summary

Research-grade proposal formalizing the 5-category test taxonomy from Amara 18th-ferry Part 1 §C + Part 2 correction #10. Sixth of the ten 18th-ferry corrections — specifically the CI-test-classification correction.

Five categories

Deterministic unit tests (PR gate; no randomness)
Seeded property tests (PR gate; fixed-seed replay)
Statistical smoke tests (nightly/extended; do NOT gate PRs)
Formal / model tests (PR gate or separate track)
Quarantined / known-flaky (not gated; migration path required)
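
As a rough illustration of the list above, the five categories can be modeled as an enumeration mapped to the lane each one runs in. This is a minimal sketch in Python, not the repo's F# code; `Category`, `LANE`, and `blocks_pr` are hypothetical names, and treating category 4 as non-blocking by default is an assumption, since the proposal leaves it as "PR gate or separate track".

```python
from enum import Enum


class Category(Enum):
    DETERMINISTIC_UNIT = 1
    SEEDED_PROPERTY = 2
    STATISTICAL_SMOKE = 3
    FORMAL_MODEL = 4
    QUARANTINED = 5


# Lane labels are hypothetical; they mirror the parenthetical notes
# attached to each category in the PR summary.
LANE = {
    Category.DETERMINISTIC_UNIT: "pr-gate",
    Category.SEEDED_PROPERTY: "pr-gate",
    Category.STATISTICAL_SMOKE: "nightly",
    Category.FORMAL_MODEL: "pr-gate-or-separate-track",
    Category.QUARANTINED: "not-gated",
}


def blocks_pr(category):
    """True only for categories that unambiguously gate a pull request."""
    return LANE[category] == "pr-gate"
```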

Sharder flake as worked example

BACKLOG #327 sharder flake used as the running worked example. Remedy order: measure variance → seed-lock → widen if data justifies → nightly only if stochastic is essential. Do NOT blind-widen or blind-quarantine.
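
The first step of the remedy order, measuring variance before touching any threshold, could look roughly like the sketch below. It is a toy stand-in written in Python: `max_shard_skew` is a hypothetical statistic (largest shard load relative to the ideal uniform load), not the actual sharder test, and the key count, shard count, and threshold are illustrative assumptions.

```python
import random
import statistics


def max_shard_skew(seed, keys=1000, shards=8):
    """Toy statistic: largest shard load divided by the ideal uniform load."""
    rng = random.Random(seed)  # explicit seed, so each trial is replayable
    counts = [0] * shards
    for _ in range(keys):
        counts[rng.randrange(shards)] += 1
    return max(counts) / (keys / shards)


def measure(seeds, threshold):
    """Summarize the statistic across seeds before deciding on a remedy."""
    seeds = list(seeds)
    values = [max_shard_skew(s) for s in seeds]
    failing = [s for s, v in zip(seeds, values) if v > threshold]
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "max": max(values),
        "failure_rate": len(failing) / len(seeds),
        "failing_seeds": failing,
    }
```

A summary like `measure(range(100), threshold=1.2)` gives the observed spread and the exact failing seeds, which is the evidence the remedy order asks for before widening a bound or moving a test to nightly.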

CI split proposed (advisory)

PR-gate workflow (deterministic-only)
Nightly-sweep workflow (100+-seed tests; emits seed-results.csv, failing-seeds.txt, distributions.json)
Quarantined workflow (weekly verbose logging)
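
To make the nightly-sweep artifacts concrete, here is a minimal sketch of a sweep that writes the three files named above. Only the file names come from the proposal; the column names, the JSON fields, and the `run_case` stand-in test are assumptions made for illustration.

```python
import csv
import json
import random
import statistics
from pathlib import Path


def run_case(seed):
    """Stand-in statistical test: returns (statistic, passed)."""
    rng = random.Random(seed)
    stat = statistics.mean([rng.random() for _ in range(200)])
    return stat, abs(stat - 0.5) < 0.05


def nightly_sweep(out_dir, seeds):
    """Run every seed and emit the three artifacts; returns failing seeds."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = [(seed, *run_case(seed)) for seed in seeds]

    # Per-seed results (hypothetical column layout).
    with open(out / "seed-results.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["seed", "statistic", "passed"])
        writer.writerows(rows)

    # One failing seed per line, ready for local replay.
    failing = [seed for seed, _, passed in rows if not passed]
    (out / "failing-seeds.txt").write_text("".join(f"{s}\n" for s in failing))

    # Summary of the observed distribution (hypothetical fields).
    stats = [stat for _, stat, _ in rows]
    summary = {
        "trials": len(rows),
        "failures": len(failing),
        "mean": statistics.mean(stats),
        "min": min(stats),
        "max": max(stats),
    }
    (out / "distributions.json").write_text(json.dumps(summary, indent=2))
    return failing
```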

Scope

Research-grade only. Promotion to factory discipline requires ADR.
No code changes. No workflow changes. No test migrations.
Composes with docs/research/test-organization.md (layout) + docs/definitions/KSK.md (Oracle trusts CI-backed stats).

Test plan

Markdownlint clean locally.
Single new file; no surface impact.
Markdownlint passes on CI.

🤖 Generated with Claude Code

Research-grade proposal formalizing the 5-category test taxonomy from Amara 18th-ferry Part 1 §C ("CI Testing & Governance Policy") + Part 2 correction #10 (sharder — measure before widen).

Five categories:

1. Deterministic unit tests (PR gate; no randomness)
2. Seeded property tests (PR gate; fixed-seed replay)
3. Statistical smoke tests (nightly/extended; assert statistical properties; do NOT gate PRs)
4. Formal / model tests (PR gate or separate track)
5. Quarantined / known-flaky (not gated; migration path required)

Sharder flake (BACKLOG #327) used as the running worked example — it is a category-3 statistical test masquerading as category-1 deterministic. Remedy order: measure observed variance → seed-lock if intent allows → widen threshold if data justifies → move to nightly only if stochastic is essential. Do NOT blind-widen or blind-quarantine.

CI split proposed (advisory, not yet implemented):

- PR-gate workflow (deterministic-only, excludes [<Statistical>] and tests/Quarantine/)
- Nightly-sweep workflow (100+-seed statistical tests; emits seed-results.csv, failing-seeds.txt, distributions.json artifacts)
- Quarantined workflow (weekly, verbose logging, issues opened on tests that start passing)

Sixth queued correction from the 18th-ferry operationalization list; remains research-grade until ADR promotes. Composes with docs/research/test-organization.md (layout), BACKLOG #327 (sharder), docs/definitions/KSK.md (Oracle trusts statistical evidence with CIs), Otto-105 graduation cadence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a research-grade documentation proposal defining a 5-category test-classification taxonomy intended to clarify CI gating discipline (PR-gate deterministic vs. scheduled statistical/quarantined), using the sharder flake as the motivating example.

Changes:

Introduces a new docs/research/test-classification.md proposal defining five test categories and migration rules.
Documents an advisory CI workflow split (PR gate vs nightly sweep vs optional quarantine run) and proposed attribute/directory conventions.
Adds cross-references intended to connect this taxonomy to existing governance, backlog, and verification docs.

Copilot · 2026-04-24T09:02:14Z

+  the migration path.
+- **Discovery hint.** Custom `[<Quarantined>]`
+  attribute, or `[<Fact(Skip="reason")>]`, or the
+  `tests/Quarantine/` directory (new convention).


P2: The quarantine-directory path is inconsistent: here it says tests/Quarantine/, but later the doc proposes tests/Tests.FSharp/Quarantine/*.Tests.fs. Please pick one convention and use it consistently throughout to avoid confusion during migration.

Suggested change

-  `tests/Quarantine/` directory (new convention).
+  `tests/Tests.FSharp/Quarantine/` directory (new convention).

Copilot · 2026-04-24T09:02:15Z

+  bounds (e.g. `CartelToy.Tests.fs` ≥90% detection over
+  100 seeds).


P1: CartelToy.Tests.fs is used as an example of an existing statistical test, but there is currently no tests/Tests.FSharp/Simulation/CartelToy.Tests.fs (and no Simulation/ folder) in the repo. Either update the example to point at an in-tree test file, or label it as a planned/future test rather than an existing one.

Suggested change

-  bounds (e.g. `CartelToy.Tests.fs` ≥90% detection over
-  100 seeds).
+  bounds (e.g. a planned/future `CartelToy.Tests.fs`
+  asserting ≥90% detection over 100 seeds).

Copilot · 2026-04-24T09:02:15Z

+  (`tools/alloy/`, `tools/lean4/`, `tools/formal/`)
+  separate from F# test directories.


P2: This discovery hint lists tools/formal/, but that directory doesn’t exist in the current repo (formal tooling lives under e.g. tools/tla/, tools/Z3Verify/, tools/alloy/, tools/lean4/). Consider updating the path list to match the actual tree so the hint is actionable.

Suggested change

-  (`tools/alloy/`, `tools/lean4/`, `tools/formal/`)
-  separate from F# test directories.
+  (`tools/tla/`, `tools/Z3Verify/`, `tools/alloy/`,
+  `tools/lean4/`) separate from F# test directories.

Copilot · 2026-04-24T09:02:15Z

+- Property tests without seed-locking that assert
+  statistical properties (e.g.
+  `SharderInfoTheoreticTests.Uniform` — the flake
+  tracked in BACKLOG #327).


P1: This characterizes SharderInfoTheoreticTests.Uniform as “without seed-locking”, but the current test implementation uses an explicit Random 42 / fixed seeds. If the test is still non-deterministic in practice, the doc should call out the actual nondeterminism source (e.g., process-randomized hashing like HashCode.Combine) rather than implying the RNG seed is missing.
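
The distinction raised here, a fixed RNG seed that still leaves a process-level hash as a hidden entropy source, can be illustrated with a small Python analogy. Python's built-in `hash()` on strings is salted per process unless `PYTHONHASHSEED` is set, which is similar in spirit to the process-randomized hashing the comment mentions as a possibility. Whether that is the actual cause of the sharder flake is not established here; `stable_shard` and `process_dependent_shard` are hypothetical helpers.

```python
import hashlib


def process_dependent_shard(key: str, shards: int) -> int:
    """Shard choice that can differ between processes.

    Built-in hash() for str is salted per interpreter process unless
    PYTHONHASHSEED is fixed, so a seeded RNG alone does not make a test
    that routes keys through this function reproducible.
    """
    return hash(key) % shards


def stable_shard(key: str, shards: int) -> int:
    """Shard choice derived from a content hash, identical in every process."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shards
```

Running the same seeded test in two separate interpreter processes and comparing the shard assignments is one way to confirm or rule out this kind of source before relabeling the test.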

Suggested change

-- Property tests without seed-locking that assert
-  statistical properties (e.g.
-  `SharderInfoTheoreticTests.Uniform` — the flake
-  tracked in BACKLOG #327).
+- Property tests with fixed RNG seeds but another
+  nondeterminism source that still assert statistical
+  properties (e.g. `SharderInfoTheoreticTests.Uniform`
+  — the flake tracked in BACKLOG #327; if the variance
+  is from process-level hashing or similar, document
+  that source explicitly rather than calling it
+  "without seed-locking").

Copilot · 2026-04-24T09:02:16Z

+  ferry.md`.
+- `docs/research/test-organization.md` — layout
+  discipline (28-files-flat → folder grouping).
+- `docs/BACKLOG.md` — PR #327 sharder flake row.


P1: This cross-reference claims docs/BACKLOG.md contains a “PR #327 sharder flake row”, but docs/BACKLOG.md currently has no matching entry. Please update this link target to something that exists (or add the referenced row) so readers can follow the trail.

Suggested change

-- `docs/BACKLOG.md` — PR #327 sharder flake row.
+- `docs/BACKLOG.md` — migration / misclassification
+  tracking rows referenced by this proposal.

Copilot · 2026-04-24T09:02:16Z

+**Status:** research-grade proposal (pre-v1). Origin: Amara
+18th courier ferry, Part 1 §C ("CI Testing & Governance
+Policy") + Part 2 correction #1 (precision wording) +
+correction #10 (sharder — measure before widen). Author:
+architect review. Scope: formalizes a 5-category test
+taxonomy and the "PR gate = deterministic-only" discipline.


P1: This doc uses direct contributor names in the body (e.g., “Amara”, “Aaron”). Per the operational standing rule in docs/AGENT-BEST-PRACTICES.md (“No name attribution in code, docs, or skills”), docs should use role references (e.g., “human maintainer”, “architect”) except in the allowed locations. Please rewrite the doc to use role-refs and keep names confined to the permitted areas.

Copilot · 2026-04-24T09:02:16Z

+- Amara 18th ferry — Part 1 §C + Part 2 #1 + #10.
+  `docs/aurora/2026-04-24-amara-calibration-ci-
+  hardening-deep-research-plus-5-5-corrections-18th-
+  ferry.md`.


P1: The cross-reference to docs/aurora/2026-04-24-amara-calibration-ci-hardening-deep-research-plus-5-5-corrections-18th-ferry.md appears to be broken (no such file currently exists under docs/aurora/). Either update this to the correct in-tree document, or mark it as a future/pending artifact and link to the tracking issue/PR instead.

Suggested change

-- Amara 18th ferry — Part 1 §C + Part 2 #1 + #10.
-  `docs/aurora/2026-04-24-amara-calibration-ci-
-  hardening-deep-research-plus-5-5-corrections-18th-
-  ferry.md`.
+- Amara 18th ferry — Part 1 §C + Part 2 #1 + #10;
+  source memorandum / pending artifact, not currently
+  checked into `docs/aurora/`.

Copilot · 2026-04-24T09:02:16Z

+- **`docs/definitions/KSK.md`.** KSK's advisory flow
+  (Detection → Oracle → KSK → Action) benefits from
+  category-3 statistical evidence for "Detection" —
+  the Oracle and KSK layers trust statistical smoke
+  output with confidence intervals, not single-seed
+  point estimates.


P1: docs/definitions/KSK.md is referenced as if it exists, but there is no docs/definitions/ directory in the repo right now. Consider linking to the existing KSK material that’s actually in-tree, or explicitly labeling this as a planned doc and linking to the backlog item that tracks creating it.

…-ferry §B + §F + corrections #2 #7 #9 (#342)

Research-grade design doc for the Stage-2 rung of Amara's corrected promotion ladder. Specifies:

(a) placement under src/Experimental/CartelLab/ (not src/Core/ — that's Stage 4);
(b) MetricVector type with PLV magnitude AND offset split (correction #6);
(c) INullModelGenerator interface + Preserves/Avoids table columns;
(d) IAttackInjector forward-looking interface (Stage 3);
(e) Wilson-interval reporting contract with {successes, trials, lowerBound, upperBound} schema (correction #2 — no more "~95% CI ±5%" handwave);
(f) RobustZScoreMode with Hybrid fallback (correction #7 — percentile-rank when MAD < epsilon);
(g) explicit artifact-output layout under artifacts/coordination-risk/ with five files + run-manifest.json (correction #9).

6-stage promotion path (0 doc / 1 ADR / 2.a skeleton / 2.b full null-models + first attack / 3 attack suite / 4 Core/NetworkIntegrity / 5 Aurora-KSK) matches Amara's corrected ladder and Otto-105 cadence.

Doc-only change; no code, no tests, no workflow, no BACKLOG tail touch (avoids positional-conflict pattern that cost #334 → #341 re-file this session).

This is the 7th of 10 18th-ferry operationalizations:

- #1/#10 test-classification (#339)
- #2 Wilson-interval design specified (this doc)
- #6 PLV phase-offset shipped (#340)
- #7 MAD=0 Hybrid mode specified (this doc)
- #9 artifact layout specified (this doc)
- #4 exclusivity already shipped (#331)
- #5 modularity relational already shipped (#324)

Remaining: Wilson-interval IMPLEMENTATION (waits on #323 + Stage 2.a), MAD=0 Hybrid IMPLEMENTATION (waits on #333 + Stage 2.a), conductance-sign doc (waits on #331), Stage-2.a skeleton itself.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
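
The Wilson-interval reporting contract in item (e) can be sketched with the standard Wilson score formula. This is an illustrative Python version, not the design doc's implementation: the four field names come from the commit message, while the function name, the default z = 1.96 (roughly 95% confidence), and the clamping to [0, 1] are assumptions.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    Returns a dict shaped like the {successes, trials, lowerBound,
    upperBound} schema named in the commit message.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z2 / (4.0 * trials**2)) / denom
    return {
        "successes": successes,
        "trials": trials,
        # Clamp to guard against tiny floating-point overshoot at 0 or 1.
        "lowerBound": max(0.0, center - half),
        "upperBound": min(1.0, center + half),
    }
```

For 90 detections in 100 seeds, this gives an interval of roughly 0.83 to 0.94, which conveys more than a bare "90%" point estimate.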

…rections (#344)

Dedicated absorb of Amara's 19th courier ferry per CC-002 close-on-existing discipline. Scheduled Otto-164 → executed Otto-165, following 7-ferry precedent (PRs #196 / #211 / #219 / #221 / #235 / #245 / #259 / #330 / #337).

Two-part ferry:

- Part 1 deep-research DST audit (12 sections: rulebook, 12-row entropy scan, dependency audit, 7-row simulation-surface coverage, retry audit, CI determinism, seed discipline, Cartel-Lab DST readiness, KSK/Aurora DST readiness, state-of-the-art comparison, 10-row PR roadmap, what-not-to-claim caveats; Mermaid CI diagram + Gantt timeline).
- Part 2 Amara's own 5.5-Thinking correction pass (7 required corrections, per-area grade table with B- overall, revised 6-PR roadmap with titles locked, DST-held + FoundationDB-grade acceptance criteria, copy-paste Kenji summary).

Key findings:

- DST grade: B- (strong architecture, partial impl)
- Blockers: DiskBackingStore bypasses simulation (D-grade filesystem simulation), no ISimulationDriver, Task.Run ambient ThreadPool risk, no seed artifacts / no swarm harness
- 4 of 12 Part-1 sections already align with shipped substrate:
  - §6 test classification → PR #339
  - §7 artifact layout → PR #342 design
  - §8 Cartel-Lab stage discipline → PRs #330/#337/#342
  - §9 KSK advisory-only → PR #336 + Otto-140..145 memory

6-PR revised roadmap queued as graduation candidates:

1. DST scanner + accepted-boundary registry (new tool + policy docs + workflow)
2. Seed protocol + CI artifacts
3. Sharder reproduction (NOT widen) — reinforces 18th #10
4. ISimulationDriver + VTS promotion to core
5. Simulated filesystem (DiskBackingStore rewrite)
6. Cartel-Lab DST calibration (aligns with #342 design)

Plus: push-with-retry.sh retry-audit finding; DST-held + FDB-grade criteria lock.

GOVERNANCE §33 four-field header (Scope / Attribution / Operational status / Non-fusion disclaimer). Amara verdict preserved: "strong draft / not canonical yet."

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…mara 19th-ferry correction #6) (#346)

Research-grade criteria doc locking two acceptance bars:

1. DST-held — minimum: 6 items (seeds committed, failing tests emit seed+params, bit-for-bit local-vs-CI reproducibility, broad sweeps nightly-not-gating, zero unreviewed entropy hits in main-path, boundaries either simulated or explicitly accepted).
2. FoundationDB-grade DST candidate — aspirational: 8 surfaces (simulated FS, simulated network, deterministic task scheduler, fault injection/buggify, swarm runner, replay artifact storage, failure minimization/shrinking, end-to-end scenario from one seed).

Maps 19th-ferry revised-roadmap PRs to which criteria items each addresses. Captures Amara's per-area grade table (overall B-) as "Amara's assessment, not factory-certified."

Explicit promotion path: doc stays research-grade until PR 1 of the 19th-ferry revised roadmap lands an ADR promoting the DST-held bar to factory discipline; at that point criteria migrate to docs/DST-COMPLIANCE.md top-level.

No graduation claims DST-held today; graduations reference this doc as target without self-certification.

Composes with test-classification.md (PR #339; supports items 1+2+4), calibration-harness-stage2-design.md (PR #342; artifact schema supports item 2), Amara 19th ferry (PR #344 absorb; source of criteria).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…340)

* core: PLV mean phase offset — 19th graduation (Amara 18th-ferry #6)

Addresses Amara 18th-ferry correction #6: PLV = 1 can mean anti-phase locking, not same-time synchronization. Downstream detectors that rely on "PLV = 1 => synchronized" misread anti-phase coordinators as same-time coordinators.

Two new functions in `TemporalCoordinationDetection`:

- `meanPhaseOffset phasesA phasesB : double option` Returns the argument (angle) of the mean complex phase-difference vector whose magnitude is the PLV. Returns None when series are empty, mismatched-length, or when the mean vector has effectively zero magnitude (1e-12 floor) — in which case direction is mathematically undefined.
- `phaseLockingWithOffset phasesA phasesB : struct (double * double) option` Returns both magnitude and offset in one sequence pass. Zero-magnitude case: magnitude near 0, offset = nan; near-zero magnitude is the caller's reliable "offset is undefined" signal.

Existing `phaseLockingValue` contract unchanged; new primitives are additive. Downstream `Graph.coordinationRiskScore*` and any other detector consuming PLV can now add a separate offset-based term instead of collapsing both into one scalar (Amara's explicit recommendation in correction #6).

8 new xUnit tests covering:

- Identical series (offset = 0)
- Constant pi/4 offset (observed = -pi/4, a-minus-b convention)
- Anti-phase series (magnitude 1, offset = pi) — the correction #6 regression test, contrasted against in-phase (offset 0) with identical magnitude
- Uniformly-distributed differences (zero-magnitude => None)
- Empty / mismatched-length / single-element edge cases
- phaseLockingWithOffset magnitude matches phaseLockingValue (consistency property preventing silent detector divergence)
- phaseLockingWithOffset zero-magnitude returns (near-zero, nan)
- phaseLockingWithOffset returns None on empty/mismatched

All 37 TemporalCoordinationDetection tests pass locally. 0 Warnings / 0 Errors build.

6th of the 10 18th-ferry corrections operationalized this week (after test-classification doc in #339, parser-tech in #338). Remaining: Wilson CIs in CartelToy tests (needs #323 landed), MAD=0 percentile-rank fallback (needs #333 landed), conductance-sign doc (needs #331 landed), artifact-output layout (Stage-2 with calibration harness).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#340): refactor shared accumulation + 5 review-thread fixes (Otto-216)

Active PR-resolve-loop on #340 (PLV mean phase offset).

1. Sentinel-default in test (thread 59WGi9): replaced Option.defaultValue -1.0 pattern in the phaseLockingWithOffset-magnitude-matches-phaseLockingValue consistency test with explicit pattern-match + fail on None. Sentinel form would silently pass the equality assertion if BOTH primitives returned None, masking regressions.
2. Broken ferry cross-reference path (thread 59WGjn): doc comment referenced docs/aurora/2026-04-24-amara-calibration-ci-hardening-deep-research-plus-5-5-corrections-18th-ferry.md which doesn't exist on main (only 7th / 17th / 19th ferries landed as standalone docs). Rewrote provenance to describe the ferry topically + cross-reference the related 19th-ferry DST audit that IS in the repo.
3. Misleading "same PLV-magnitude floor" wording (thread 59WGj4): doc said meanPhaseOffset's zero-magnitude check uses "the same PLV-magnitude floor" — phaseLockingValue has NO floor (returns values arbitrarily close to 0). Fixed: clarified that the phasePairEpsilon floor applies ONLY to the offset-undefined decision; phaseLockingValue returns magnitude without threshold.
4. Name-attribution in doc comment (thread 59WGkP): "Aaron + Amara 11th ferry" replaced with "the 11th ferry" per factory role-reference convention. Audit-trail surfaces (commit messages, tick-history, memory) retain direct attribution; code/doc comments use role references.
5. Duplicate sin/cos accumulation across 3 functions (thread 59WGkn): extracted private helpers phasePairEpsilon + meanPhaseDiffVector. All three functions (phaseLockingValue, meanPhaseOffset, phaseLockingWithOffset) now route through the shared accumulator. Eliminates drift risk — one function can no longer silently diverge from the others on accumulation or threshold.

Build: 0 Warning(s) / 0 Error(s). All 37 TemporalCoordinationDetection tests pass. All 5 threads replied via GraphQL next step.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#340): 2 review threads (stale ferry path + atan2 range)

Thread 59Yqkl (P1) — stale provenance reference: The doc cited `docs/aurora/2026-04-24-amara-temporal-coordination-detection-cartel-graph-influence-surface-11th-ferry.md`, but the 11th ferry has not yet landed under `docs/aurora/` (it's queued in the Otto-105 operationalize cadence; PR #296 is its pending absorb). Replaced with the intent-preserving form: role references ("external AI collaborator's 11th courier ferry") plus a pointer at the MEMORY.md queue entry, so the provenance survives regardless of when the file-path question resolves. Also dropped the direct first-name so this factory-produced doc-comment tracks the name-attribution discipline.

Thread 59YqlC (P2) — atan2 range correction: Doc said `(-pi, pi]` but `System.Math.Atan2` is documented as `[-pi, pi]` (both endpoints reachable under IEEE-754 signed-zero semantics: atan2(0, -1) = +pi, atan2(-0, -1) = -pi). Updated the doc to match the implementation. Behaviour unchanged.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
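
The magnitude-plus-offset idea described in these commits can be sketched in a few lines of Python. This is an illustration of the math only, not the F# code in `TemporalCoordinationDetection`: the function name `phase_locking_with_offset` and the tuple return shape are assumptions, while the a-minus-b convention and the 1e-12 floor follow the commit message.

```python
import cmath
import math

OFFSET_EPSILON = 1e-12  # below this magnitude the direction is treated as undefined


def phase_locking_with_offset(phases_a, phases_b):
    """Return (magnitude, offset) of the mean phase-difference vector.

    Returns None for empty or mismatched-length input. The offset is
    math.nan when the mean vector is effectively zero.
    """
    if len(phases_a) == 0 or len(phases_a) != len(phases_b):
        return None
    # Mean of unit vectors exp(i * (a - b)), using the a-minus-b convention.
    mean = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    mean /= len(phases_a)
    magnitude = abs(mean)
    offset = cmath.phase(mean) if magnitude >= OFFSET_EPSILON else math.nan
    return magnitude, offset
```

In-phase and anti-phase series both produce a magnitude near 1; only the offset (near 0 versus near pi in absolute value) tells them apart, which is the point of correction #6. Because of floating-point rounding, the anti-phase offset may come out as either +pi or -pi, so a check on its absolute value is the safer comparison in this sketch.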

…-161 docs ambiguity (#345)

* docs: nightly cross-platform workflow design — third path around Otto-161 docs ambiguity

Design-only proposal per Otto-165 offer. Aaron Otto-161 macOS-everywhere directive + Otto-164 pricing-docs ambiguity (macos-14 is standard-runner-type per about-github-hosted-runners; billing page lists it at $0.062/min in the same table as Linux/Windows without marking public-only).

Instead of resolving the ambiguity (can't — docs genuinely contradict each other), propose a THIRD PATH that works in either interpretation:

- PR gate stays ubuntu-22.04 only (unambiguously free on public repos).
- New nightly-cross-platform.yml runs matrix [ubuntu-22.04, windows-2022, macos-14] on cron '0 9 * * *' (09:00 UTC, off-the-hour to avoid scheduler stampede).
- Cost model: worst case ~$28/month/repo if macOS is billed; $0 if free. Either way, cadence caps exposure.
- Fork-scoping: `if: github.repository == canonical OR workflow_dispatch OR pull_request-to-this-file` prevents scheduled trigger firing on contributor forks (would burn fork-owner's personal-account minutes).
- No-alerting first cut (observation-only); issue-opening on red is a later enhancement.

Phased rollout:

- Phase 0 (now): this design doc, no YAML.
- Phase 1: Aaron signs off on cost tradeoff.
- Phase 2: land workflow on Zeta.
- Phase 3: observe 7 nightly runs for signal.
- Phase 4 (30 days): parallel lucent-ksk landing per Otto-140 rewrite authority, OR drop macOS if no signal + worst-case billing, OR expand matrix if best-case confirmed.

Rollback: delete macos-14 from matrix (one-line diff) or delete workflow file entirely. No impact on gate.yml.

Composes with FACTORY-HYGIENE row #51 (unblocks enforcement mode), docs/BACKLOG.md row ~2471 (Otto-161 declined + this as alternative), docs/research/test-classification.md (PR #339; category-3 nightly pattern).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#345): 6 review threads — name attribution + cron + YAML + fork-scheduling + BACKLOG ref

- thread Wkcz (line 327): removed broken `memory/feedback_ksk_naming_...` reference (factory-personal memories live in `~/.claude/projects/<slug>/memory/`, not in-repo); paraphrased the rewrite-authority rule in §10 without promising an in-repo path.
- thread WkdI (line 7): purged name-attribution tokens per Otto-220 code-comments-not-history + doc-comment-history-audit lint (PR #363). All "Aaron" / "Otto-NN" / "Amara" / "Max" references rewritten to role references ("human maintainer", "prior-contributor", "autonomous loop", "initial-starting-point contributor").
- thread WkdX (line 163): cron changed `0 9 * * *` → `7 9 * * *` (09:07 UTC) so it matches the "off the hour" comment; note now calls out alignment with the sibling scheduled workflow `github-settings-drift.yml` (`17 14 * * 1`).
- thread Wkdk (line 146): YAML sketch rewritten to match the actual `.github/workflows/gate.yml` installer pattern — three-way-parity `./tools/setup/install.sh` invocation plus the same cache-key shape (dotnet / mise / nuget). Added explicit note that Windows matrix leg depends on `tools/setup/install.sh` growing Windows support first per the existing BACKLOG row.
- thread Wkdz (line 248): corrected the fork-scheduling claim. GitHub disables scheduled workflows on forks by default — the repo's own `github-settings-drift.yml` runs without fork-scoping and proves this. The `if: github.repository ==` guard is kept as optional hygiene for the rare opt-in-fork case, not as a cost-safety requirement.
- thread WkeB (line 316): replaced the wrong `docs/BACKLOG.md` line-number reference (~2471 is actually the mise-activate / HLL-flakiness neighborhood) with stable grep anchors ("Windows matrix in CI" + "Parity swap: CI's `actions/setup-dotnet`").

Markdownlint passes on the edited file.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 24, 2026 08:56

AceHack enabled auto-merge (squash) April 24, 2026 08:56

Copilot started reviewing on behalf of AceHack April 24, 2026 08:56 View session

AceHack mentioned this pull request Apr 24, 2026

core: PLV mean phase offset — 19th graduation (Amara 18th-ferry #6) #340

Merged

3 tasks

Merge branch 'main' into docs/test-classification-amara-18th-ferry

1272a68

AceHack merged commit 05a381a into main Apr 24, 2026
10 checks passed

AceHack deleted the docs/test-classification-amara-18th-ferry branch April 24, 2026 09:02

Copilot AI reviewed Apr 24, 2026

View reviewed changes

AceHack mentioned this pull request Apr 24, 2026

docs: calibration-harness Stage-2 design — Amara 18th-ferry §B/§F + corrections #2/#7/#9 #342

Merged

2 tasks

AceHack mentioned this pull request Apr 24, 2026

ferry: Amara 19th absorb — DST Audit + 5.5 Corrections (10 tracked; 4 aligned with shipped; 7 queued) #344

Merged

3 tasks

AceHack merged 2 commits into main from docs/test-classification-amara-18th-ferry
