docs: test-classification taxonomy — Amara 18th-ferry §C operationalized #339
Research-grade proposal formalizing the 5-category test
taxonomy from Amara 18th-ferry Part 1 §C ("CI Testing &
Governance Policy") + Part 2 correction #10 (sharder —
measure before widen).
Five categories:
1. Deterministic unit tests (PR gate; no randomness)
2. Seeded property tests (PR gate; fixed-seed replay)
3. Statistical smoke tests (nightly/extended; assert
statistical properties; do NOT gate PRs)
4. Formal / model tests (PR gate or separate track)
5. Quarantined / known-flaky (not gated; migration path
required)
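As a rough illustration of the gating rule in the list above, the mapping from category to "blocks a PR" can be sketched as follows. The Python names (`Category`, `PR_GATE`, `gates_pr`) are hypothetical and are not part of the proposal. Category 4 is listed as "PR gate or separate track", and this sketch assumes the PR-gate option.

```python
from enum import Enum


class Category(Enum):
    """The five proposed test categories (names are illustrative)."""
    DETERMINISTIC_UNIT = 1
    SEEDED_PROPERTY = 2
    STATISTICAL_SMOKE = 3
    FORMAL_MODEL = 4
    QUARANTINED = 5


# Categories that block a pull request in this sketch.
# Assumption: FORMAL_MODEL takes the "PR gate" option rather than the
# "separate track" option that the proposal also allows.
PR_GATE = {
    Category.DETERMINISTIC_UNIT,
    Category.SEEDED_PROPERTY,
    Category.FORMAL_MODEL,
}


def gates_pr(category: Category) -> bool:
    """Return True when a failing test in this category should block a PR."""
    return category in PR_GATE
```

Under this mapping, statistical smoke tests and quarantined tests never block a PR, which matches "do NOT gate PRs" and "not gated" in the list.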
Sharder flake (BACKLOG #327) used as the running worked
example — it is a category-3 statistical test masquerading
as category-1 deterministic. Remedy order: measure
observed variance → seed-lock if intent allows → widen
threshold if data justifies → move to nightly only if
stochastic is essential. Do NOT blind-widen or blind-
quarantine.
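The remedy order above can be read as a small decision procedure. The sketch below is one possible reading, not the proposal's own rule. The function name, the return labels, and the k-sigma test for "data justifies" are assumptions made for illustration. It treats the asserted quantity as "metric must stay at or below `limit`".

```python
from statistics import mean, pstdev


def choose_remedy(samples, limit, seed_lock_ok, stochastic_essential, k=3.0):
    """Walk the remedy order: measure, seed-lock, widen, nightly.

    samples: metric values observed across repeated runs (measured first).
    limit: the threshold the test currently asserts (metric <= limit).
    seed_lock_ok: True if fixing the seed still tests what was intended.
    stochastic_essential: True if the test only makes sense when random.
    k: hypothetical multiplier for the "variance explains the flake" check.
    """
    if len(samples) < 2:
        raise ValueError("measure first: need at least two observed runs")
    mu, sigma = mean(samples), pstdev(samples)
    if seed_lock_ok:
        return ("seed-lock", mu, sigma)
    # Widen only when the mean satisfies the limit and ordinary variance
    # (mean + k * sigma) is what pushes individual runs over it.
    if mu <= limit < mu + k * sigma:
        return ("widen-threshold", mu, sigma)
    if stochastic_essential:
        return ("move-to-nightly", mu, sigma)
    # Neither variance nor essential randomness explains the failures.
    return ("investigate", mu, sigma)
```

The point of the shape is that no branch is reachable without measured samples, which is the "do NOT blind-widen or blind-quarantine" constraint.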
CI split proposed (advisory, not yet implemented):
- PR-gate workflow (deterministic-only, excludes
[<Statistical>] and tests/Quarantine/)
- Nightly-sweep workflow (100+-seed statistical tests;
emits seed-results.csv, failing-seeds.txt,
distributions.json artifacts)
- Quarantined workflow (weekly, verbose logging, issues
opened on tests that start passing)
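A minimal sketch of what a nightly seed sweep might produce is shown below. The proposal names the artifacts (seed-results.csv, failing-seeds.txt, distributions.json) but this page does not give their schemas, so the column names, the 0.90 threshold, and the `trial` stand-in are all assumptions. distributions.json is left out for the same reason.

```python
import csv
import io
import random


def trial(seed: int) -> float:
    """Hypothetical stand-in for one statistical test run under a seed."""
    rng = random.Random(seed)
    hits = sum(rng.random() < 0.95 for _ in range(200))
    return hits / 200


def sweep(seeds, threshold: float = 0.90):
    """Run every seed; return (seed-results CSV text, failing-seeds text)."""
    rows = [(seed, trial(seed)) for seed in seeds]
    failing = [seed for seed, value in rows if value < threshold]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["seed", "value", "passed"])  # assumed column names
    for seed, value in rows:
        writer.writerow([seed, f"{value:.4f}", value >= threshold])
    return buf.getvalue(), "\n".join(str(seed) for seed in failing)
```

Because each run is driven only by its seed, any seed listed as failing can be replayed locally with `trial(seed)`.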
Sixth queued correction from the 18th-ferry operationalization
list; remains research-grade until ADR promotes. Composes with
docs/research/test-organization.md (layout), BACKLOG #327
(sharder), docs/definitions/KSK.md (Oracle trusts statistical
evidence with CIs), Otto-105 graduation cadence.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
Adds a research-grade documentation proposal defining a 5-category test-classification taxonomy intended to clarify CI gating discipline (PR-gate deterministic vs. scheduled statistical/quarantined), using the sharder flake as the motivating example.
Changes:
- Introduces a new docs/research/test-classification.md proposal defining five test categories and migration rules.
- Documents an advisory CI workflow split (PR gate vs nightly sweep vs optional quarantine run) and proposed attribute/directory conventions.
- Adds cross-references intended to connect this taxonomy to existing governance, backlog, and verification docs.
| the migration path. | ||
| - **Discovery hint.** Custom `[<Quarantined>]` | ||
| attribute, or `[<Fact(Skip="reason")>]`, or the | ||
| `tests/Quarantine/` directory (new convention). |
P2: The quarantine-directory path is inconsistent: here it says tests/Quarantine/, but later the doc proposes tests/Tests.FSharp/Quarantine/*.Tests.fs. Please pick one convention and use it consistently throughout to avoid confusion during migration.
| `tests/Quarantine/` directory (new convention). | |
| `tests/Tests.FSharp/Quarantine/` directory (new convention). |
| bounds (e.g. `CartelToy.Tests.fs` ≥90% detection over | ||
| 100 seeds). |
P1: CartelToy.Tests.fs is used as an example of an existing statistical test, but there is currently no tests/Tests.FSharp/Simulation/CartelToy.Tests.fs (and no Simulation/ folder) in the repo. Either update the example to point at an in-tree test file, or label it as a planned/future test rather than an existing one.
| bounds (e.g. `CartelToy.Tests.fs` ≥90% detection over | |
| 100 seeds). | |
| bounds (e.g. a planned/future `CartelToy.Tests.fs` | |
| asserting ≥90% detection over 100 seeds). |
| (`tools/alloy/`, `tools/lean4/`, `tools/formal/`) | ||
| separate from F# test directories. |
P2: This discovery hint lists tools/formal/, but that directory doesn’t exist in the current repo (formal tooling lives under e.g. tools/tla/, tools/Z3Verify/, tools/alloy/, tools/lean4/). Consider updating the path list to match the actual tree so the hint is actionable.
| (`tools/alloy/`, `tools/lean4/`, `tools/formal/`) | |
| separate from F# test directories. | |
| (`tools/tla/`, `tools/Z3Verify/`, `tools/alloy/`, | |
| `tools/lean4/`) separate from F# test directories. |
| - Property tests without seed-locking that assert | ||
| statistical properties (e.g. | ||
| `SharderInfoTheoreticTests.Uniform` — the flake | ||
| tracked in BACKLOG #327). |
P1: This characterizes SharderInfoTheoreticTests.Uniform as “without seed-locking”, but the current test implementation uses an explicit Random 42 / fixed seeds. If the test is still non-deterministic in practice, the doc should call out the actual nondeterminism source (e.g., process-randomized hashing like HashCode.Combine) rather than implying the RNG seed is missing.
| - Property tests without seed-locking that assert | |
| statistical properties (e.g. | |
| `SharderInfoTheoreticTests.Uniform` — the flake | |
| tracked in BACKLOG #327). | |
| - Property tests with fixed RNG seeds but another | |
| nondeterminism source that still assert statistical | |
| properties (e.g. `SharderInfoTheoreticTests.Uniform` | |
| — the flake tracked in BACKLOG #327; if the variance | |
| is from process-level hashing or similar, document | |
| that source explicitly rather than calling it | |
| "without seed-locking"). |
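To illustrate the kind of nondeterminism the review comment hypothesizes, here is a sketch in Python rather than the repo's F#. Python's built-in `hash()` for strings is randomized per process by default, which is loosely analogous to the process-randomized hashing the comment mentions. Whether that is the actual source of variance in the sharder test is not established on this page. The function names are hypothetical. The sketch shows that a fixed RNG seed plus a content-based hash gives repeatable shard counts.

```python
import hashlib
import random


def stable_shard(key: str, shards: int) -> int:
    """Assign a shard from a content hash, independent of process state."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shards


def shard_counts(seed: int, n_keys: int, shards: int):
    """Generate keys from a fixed seed and count keys per shard."""
    rng = random.Random(seed)
    counts = [0] * shards
    for _ in range(n_keys):
        key = f"key-{rng.getrandbits(64)}"
        # Using the built-in hash(key) here instead would reintroduce
        # per-process variation even though the RNG seed is fixed.
        counts[stable_shard(key, shards)] += 1
    return counts
```

With both the seed and the hash pinned, a uniformity assertion over `shard_counts(42, 1000, 8)` sees the same counts on every run and in every process.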
| ferry.md`. | ||
| - `docs/research/test-organization.md` — layout | ||
| discipline (28-files-flat → folder grouping). | ||
| - `docs/BACKLOG.md` — PR #327 sharder flake row. |
P1: This cross-reference claims docs/BACKLOG.md contains a “PR #327 sharder flake row”, but docs/BACKLOG.md currently has no matching entry. Please update this link target to something that exists (or add the referenced row) so readers can follow the trail.
| - `docs/BACKLOG.md` — PR #327 sharder flake row. | |
| - `docs/BACKLOG.md` — migration / misclassification | |
| tracking rows referenced by this proposal. |
| **Status:** research-grade proposal (pre-v1). Origin: Amara | ||
| 18th courier ferry, Part 1 §C ("CI Testing & Governance | ||
| Policy") + Part 2 correction #1 (precision wording) + | ||
| correction #10 (sharder — measure before widen). Author: | ||
| architect review. Scope: formalizes a 5-category test | ||
| taxonomy and the "PR gate = deterministic-only" discipline. |
P1: This doc uses direct contributor names in the body (e.g., “Amara”, “Aaron”). Per the operational standing rule in docs/AGENT-BEST-PRACTICES.md (“No name attribution in code, docs, or skills”), docs should use role references (e.g., “human maintainer”, “architect”) except in the allowed locations. Please rewrite the doc to use role-refs and keep names confined to the permitted areas.
| - Amara 18th ferry — Part 1 §C + Part 2 #1 + #10. | ||
| `docs/aurora/2026-04-24-amara-calibration-ci- | ||
| hardening-deep-research-plus-5-5-corrections-18th- | ||
| ferry.md`. |
P1: The cross-reference to docs/aurora/2026-04-24-amara-calibration-ci-hardening-deep-research-plus-5-5-corrections-18th-ferry.md appears to be broken (no such file currently exists under docs/aurora/). Either update this to the correct in-tree document, or mark it as a future/pending artifact and link to the tracking issue/PR instead.
| - Amara 18th ferry — Part 1 §C + Part 2 #1 + #10. | |
| `docs/aurora/2026-04-24-amara-calibration-ci- | |
| hardening-deep-research-plus-5-5-corrections-18th- | |
| ferry.md`. | |
| - Amara 18th ferry — Part 1 §C + Part 2 #1 + #10; | |
| source memorandum / pending artifact, not currently | |
| checked into `docs/aurora/`. |
| - **`docs/definitions/KSK.md`.** KSK's advisory flow | ||
| (Detection → Oracle → KSK → Action) benefits from | ||
| category-3 statistical evidence for "Detection" — | ||
| the Oracle and KSK layers trust statistical smoke | ||
| output with confidence intervals, not single-seed | ||
| point estimates. |
P1: docs/definitions/KSK.md is referenced as if it exists, but there is no docs/definitions/ directory in the repo right now. Consider linking to the existing KSK material that’s actually in-tree, or explicitly labeling this as a planned doc and linking to the backlog item that tracks creating it.
…-ferry §B + §F + corrections #2 #7 #9 (#342) Research-grade design doc for the Stage-2 rung of Amara's corrected promotion ladder. Specifies: (a) placement under src/Experimental/CartelLab/ (not src/Core/ — that's Stage 4); (b) MetricVector type with PLV magnitude AND offset split (correction #6); (c) INullModelGenerator interface + Preserves/Avoids table columns; (d) IAttackInjector forward-looking interface (Stage 3); (e) Wilson-interval reporting contract with {successes, trials, lowerBound, upperBound} schema (correction #2 — no more "~95% CI ±5%" handwave); (f) RobustZScoreMode with Hybrid fallback (correction #7 — percentile-rank when MAD < epsilon); (g) explicit artifact-output layout under artifacts/ coordination-risk/ with five files + run-manifest.json (correction #9). 6-stage promotion path (0 doc / 1 ADR / 2.a skeleton / 2.b full null-models + first attack / 3 attack suite / 4 Core/NetworkIntegrity / 5 Aurora-KSK) matches Amara's corrected ladder and Otto-105 cadence. Doc-only change; no code, no tests, no workflow, no BACKLOG tail touch (avoids positional-conflict pattern that cost #334 → #341 re-file this session). This is the 7th of 10 18th-ferry operationalizations: - #1/#10 test-classification (#339) - #2 Wilson-interval design specified (this doc) - #6 PLV phase-offset shipped (#340) - #7 MAD=0 Hybrid mode specified (this doc) - #9 artifact layout specified (this doc) - #4 exclusivity already shipped (#331) - #5 modularity relational already shipped (#324) Remaining: Wilson-interval IMPLEMENTATION (waits on #323 + Stage 2.a), MAD=0 Hybrid IMPLEMENTATION (waits on #333 + Stage 2.a), conductance-sign doc (waits on #331), Stage-2.a skeleton itself. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
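For readers unfamiliar with the Wilson-interval reporting contract mentioned in item (e) above, a minimal sketch follows. It uses the standard Wilson score interval formula and the four field names quoted in the commit message. The function name and the default z value (about 1.96, for a 95% interval) are assumptions, and this is not the design doc's implementation.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion.

    Returns a dict using the {successes, trials, lowerBound, upperBound}
    field names quoted in the commit message.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("require trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z2 / (4.0 * trials * trials)) / denom
    return {
        "successes": successes,
        "trials": trials,
        "lowerBound": max(0.0, center - half),
        "upperBound": min(1.0, center + half),
    }
```

For 90 detections in 100 seeded runs this gives an interval of roughly 0.83 to 0.94, which is wider and less symmetric than a "±5%" shorthand suggests.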
…rections (#344) Dedicated absorb of Amara's 19th courier ferry per CC-002 close-on-existing discipline. Scheduled Otto-164 → executed Otto-165, following 7-ferry precedent (PRs #196 / #211 / #219 / #221 / #235 / #245 / #259 / #330 / #337). Two-part ferry: Part 1 deep-research DST audit (12 sections: rulebook, 12-row entropy scan, dependency audit, 7-row simulation-surface coverage, retry audit, CI determinism, seed discipline, Cartel-Lab DST readiness, KSK/Aurora DST readiness, state-of-the-art comparison, 10-row PR roadmap, what-not-to-claim caveats; Mermaid CI diagram + Gantt timeline). Part 2 Amara's own 5.5-Thinking correction pass (7 required corrections, per-area grade table with B- overall, revised 6-PR roadmap with titles locked, DST-held + FoundationDB-grade acceptance criteria, copy-paste Kenji summary). Key findings: - DST grade: B- (strong architecture, partial impl) - Blockers: DiskBackingStore bypasses simulation (D-grade filesystem simulation), no ISimulationDriver, Task.Run ambient ThreadPool risk, no seed artifacts / no swarm harness - 4 of 12 Part-1 sections already align with shipped substrate: - §6 test classification → PR #339 - §7 artifact layout → PR #342 design - §8 Cartel-Lab stage discipline → PRs #330/#337/#342 - §9 KSK advisory-only → PR #336 + Otto-140..145 memory 6-PR revised roadmap queued as graduation candidates: 1. DST scanner + accepted-boundary registry (new tool + policy docs + workflow) 2. Seed protocol + CI artifacts 3. Sharder reproduction (NOT widen) — reinforces 18th #10 4. ISimulationDriver + VTS promotion to core 5. Simulated filesystem (DiskBackingStore rewrite) 6. Cartel-Lab DST calibration (aligns with #342 design) Plus: push-with-retry.sh retry-audit finding; DST-held + FDB-grade criteria lock. GOVERNANCE §33 four-field header (Scope / Attribution / Operational status / Non-fusion disclaimer). Amara verdict preserved: "strong draft / not canonical yet." Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…mara 19th-ferry correction #6) (#346) Research-grade criteria doc locking two acceptance bars: 1. DST-held — minimum: 6 items (seeds committed, failing tests emit seed+params, bit-for-bit local-vs-CI reproducibility, broad sweeps nightly-not-gating, zero unreviewed entropy hits in main-path, boundaries either simulated or explicitly accepted). 2. FoundationDB-grade DST candidate — aspirational: 8 surfaces (simulated FS, simulated network, deterministic task scheduler, fault injection/buggify, swarm runner, replay artifact storage, failure minimization/shrinking, end-to-end scenario from one seed). Maps 19th-ferry revised-roadmap PRs to which criteria items each addresses. Captures Amara's per-area grade table (overall B-) as "Amara's assessment, not factory- certified." Explicit promotion path: doc stays research-grade until PR 1 of the 19th-ferry revised roadmap lands an ADR promoting the DST-held bar to factory discipline; at that point criteria migrate to docs/DST-COMPLIANCE.md top-level. No graduation claims DST-held today; graduations reference this doc as target without self-certification. Composes with test-classification.md (PR #339; supports items 1+2+4), calibration-harness-stage2-design.md (PR #342; artifact schema supports item 2), Amara 19th ferry (PR #344 absorb; source of criteria). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Addresses Amara 18th-ferry correction #6: PLV = 1 can mean anti-phase locking, not same-time synchronization. Downstream detectors that rely on "PLV = 1 => synchronized" misread anti-phase coordinators as same-time coordinators. Two new functions in `TemporalCoordinationDetection`: - `meanPhaseOffset phasesA phasesB : double option` Returns the argument (angle) of the mean complex phase- difference vector whose magnitude is the PLV. Returns None when series are empty, mismatched-length, or when the mean vector has effectively zero magnitude (1e-12 floor) — in which case direction is mathematically undefined. - `phaseLockingWithOffset phasesA phasesB : struct (double * double) option` Returns both magnitude and offset in one sequence pass. Zero-magnitude case: magnitude near 0, offset = nan; near-zero magnitude is the caller's reliable "offset is undefined" signal. Existing `phaseLockingValue` contract unchanged; new primitives are additive. Downstream `Graph.coordinationRiskScore*` and any other detector consuming PLV can now add a separate offset- based term instead of collapsing both into one scalar (Amara's explicit recommendation in correction #6). 8 new xUnit tests covering: - Identical series (offset = 0) - Constant pi/4 offset (observed = -pi/4, a-minus-b convention) - Anti-phase series (magnitude 1, offset = pi) — the correction #6 regression test, contrasted against in-phase (offset 0) with identical magnitude - Uniformly-distributed differences (zero-magnitude => None) - Empty / mismatched-length / single-element edge cases - phaseLockingWithOffset magnitude matches phaseLockingValue (consistency property preventing silent detector divergence) - phaseLockingWithOffset zero-magnitude returns (near-zero, nan) - phaseLockingWithOffset returns None on empty/mismatched All 37 TemporalCoordinationDetection tests pass locally. 0 Warnings / 0 Errors build. 
6th of the 10 18th-ferry corrections operationalized this week (after test-classification doc in #339, parser-tech in #338). Remaining: Wilson CIs in CartelToy tests (needs #323 landed), MAD=0 percentile-rank fallback (needs #333 landed), conductance-sign doc (needs #331 landed), artifact-output layout (Stage-2 with calibration harness). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
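The magnitude and offset split described in the commit message above can be sketched in a few lines. This is a Python illustration, not the F# code in `TemporalCoordinationDetection`. The function name and the return shape are assumptions, while the a-minus-b convention and the 1e-12 floor follow the commit text.

```python
import cmath
import math


def phase_locking_with_offset(phases_a, phases_b, eps=1e-12):
    """Return (magnitude, offset) of the mean phase-difference vector.

    Returns None for empty or mismatched-length inputs. The offset is
    nan when the magnitude is below eps, since direction is undefined.
    """
    if len(phases_a) == 0 or len(phases_a) != len(phases_b):
        return None
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    mean_vec = total / len(phases_a)
    magnitude = abs(mean_vec)
    offset = cmath.phase(mean_vec) if magnitude >= eps else math.nan
    return magnitude, offset
```

In-phase and anti-phase series both yield a magnitude of 1. Only the offset (0 versus pi in absolute value) tells them apart, which is the misreading the correction targets. Near the anti-phase boundary the sign of the offset can come out as either +pi or -pi depending on floating-point rounding.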
…340) * core: PLV mean phase offset — 19th graduation (Amara 18th-ferry #6) Addresses Amara 18th-ferry correction #6: PLV = 1 can mean anti-phase locking, not same-time synchronization. Downstream detectors that rely on "PLV = 1 => synchronized" misread anti-phase coordinators as same-time coordinators. Two new functions in `TemporalCoordinationDetection`: - `meanPhaseOffset phasesA phasesB : double option` Returns the argument (angle) of the mean complex phase- difference vector whose magnitude is the PLV. Returns None when series are empty, mismatched-length, or when the mean vector has effectively zero magnitude (1e-12 floor) — in which case direction is mathematically undefined. - `phaseLockingWithOffset phasesA phasesB : struct (double * double) option` Returns both magnitude and offset in one sequence pass. Zero-magnitude case: magnitude near 0, offset = nan; near-zero magnitude is the caller's reliable "offset is undefined" signal. Existing `phaseLockingValue` contract unchanged; new primitives are additive. Downstream `Graph.coordinationRiskScore*` and any other detector consuming PLV can now add a separate offset- based term instead of collapsing both into one scalar (Amara's explicit recommendation in correction #6). 8 new xUnit tests covering: - Identical series (offset = 0) - Constant pi/4 offset (observed = -pi/4, a-minus-b convention) - Anti-phase series (magnitude 1, offset = pi) — the correction #6 regression test, contrasted against in-phase (offset 0) with identical magnitude - Uniformly-distributed differences (zero-magnitude => None) - Empty / mismatched-length / single-element edge cases - phaseLockingWithOffset magnitude matches phaseLockingValue (consistency property preventing silent detector divergence) - phaseLockingWithOffset zero-magnitude returns (near-zero, nan) - phaseLockingWithOffset returns None on empty/mismatched All 37 TemporalCoordinationDetection tests pass locally. 0 Warnings / 0 Errors build. 
6th of the 10 18th-ferry corrections operationalized this week (after test-classification doc in #339, parser-tech in #338). Remaining: Wilson CIs in CartelToy tests (needs #323 landed), MAD=0 percentile-rank fallback (needs #333 landed), conductance-sign doc (needs #331 landed), artifact-output layout (Stage-2 with calibration harness). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(#340): refactor shared accumulation + 5 review-thread fixes (Otto-216) Active PR-resolve-loop on #340 (PLV mean phase offset). 1. Sentinel-default in test (thread 59WGi9): replaced Option.defaultValue -1.0 pattern in the phaseLockingWithOffset-magnitude-matches-phaseLockingValue consistency test with explicit pattern-match + fail on None. Sentinel form would silently pass the equality assertion if BOTH primitives returned None, masking regressions. 2. Broken ferry cross-reference path (thread 59WGjn): doc comment referenced docs/aurora/2026-04-24-amara- calibration-ci-hardening-deep-research-plus-5-5- corrections-18th-ferry.md which doesn't exist on main (only 7th / 17th / 19th ferries landed as standalone docs). Rewrote provenance to describe the ferry topically + cross-reference the related 19th- ferry DST audit that IS in the repo. 3. Misleading "same PLV-magnitude floor" wording (thread 59WGj4): doc said meanPhaseOffset's zero-magnitude check uses "the same PLV-magnitude floor" — phaseLockingValue has NO floor (returns values arbitrarily close to 0). Fixed: clarified that the phasePairEpsilon floor applies ONLY to the offset-undefined decision; phaseLockingValue returns magnitude without threshold. 4. Name-attribution in doc comment (thread 59WGkP): "Aaron + Amara 11th ferry" replaced with "the 11th ferry" per factory role-reference convention. Audit- trail surfaces (commit messages, tick-history, memory) retain direct attribution; code/doc comments use role references. 5. 
Duplicate sin/cos accumulation across 3 functions (thread 59WGkn): extracted private helpers phasePairEpsilon + meanPhaseDiffVector. All three functions (phaseLockingValue, meanPhaseOffset, phaseLockingWithOffset) now route through the shared accumulator. Eliminates drift risk — one function can no longer silently diverge from the others on accumulation or threshold. Build: 0 Warning(s) / 0 Error(s). All 37 TemporalCoordinationDetection tests pass. All 5 threads replied via GraphQL next step. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(#340): 2 review threads (stale ferry path + atan2 range) Thread 59Yqkl (P1) — stale provenance reference: The doc cited `docs/aurora/2026-04-24-amara-temporal- coordination-detection-cartel-graph-influence-surface- 11th-ferry.md`, but the 11th ferry has not yet landed under `docs/aurora/` (it's queued in the Otto-105 operationalize cadence; PR #296 is its pending absorb). Replaced with the intent-preserving form: role references ("external AI collaborator's 11th courier ferry") plus a pointer at the MEMORY.md queue entry, so the provenance survives regardless of when the file-path question resolves. Also dropped the direct first-name so this factory-produced doc-comment tracks the name-attribution discipline. Thread 59YqlC (P2) — atan2 range correction: Doc said `(-pi, pi]` but `System.Math.Atan2` is documented as `[-pi, pi]` (both endpoints reachable under IEEE-754 signed-zero semantics: atan2(0, -1) = +pi, atan2(-0, -1) = -pi). Updated the doc to match the implementation. Behaviour unchanged. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…-161 docs ambiguity Design-only proposal per Otto-165 offer. Aaron Otto-161 macOS-everywhere directive + Otto-164 pricing-docs ambiguity (macos-14 is standard-runner-type per about-github-hosted- runners; billing page lists it at $0.062/min in the same table as Linux/Windows without marking public-only). Instead of resolving the ambiguity (can't — docs genuinely contradict each other), propose a THIRD PATH that works in either interpretation: - PR gate stays ubuntu-22.04 only (unambiguously free on public repos). - New nightly-cross-platform.yml runs matrix [ubuntu-22.04, windows-2022, macos-14] on cron '0 9 * * *' (09:00 UTC, off-the-hour to avoid scheduler stampede). - Cost model: worst case ~$28/month/repo if macOS is billed; $0 if free. Either way, cadence caps exposure. - Fork-scoping: `if: github.repository == canonical OR workflow_dispatch OR pull_request-to-this-file` prevents scheduled trigger firing on contributor forks (would burn fork-owner's personal-account minutes). - No-alerting first cut (observation-only); issue-opening on red is a later enhancement. Phased rollout: - Phase 0 (now): this design doc, no YAML. - Phase 1: Aaron signs off on cost tradeoff. - Phase 2: land workflow on Zeta. - Phase 3: observe 7 nightly runs for signal. - Phase 4 (30 days): parallel lucent-ksk landing per Otto-140 rewrite authority, OR drop macOS if no signal + worst-case billing, OR expand matrix if best-case confirmed. Rollback: delete macos-14 from matrix (one-line diff) or delete workflow file entirely. No impact on gate.yml. Composes with FACTORY-HYGIENE row #51 (unblocks enforcement mode), docs/BACKLOG.md row ~2471 (Otto-161 declined + this as alternative), docs/research/test-classification.md (PR #339; category-3 nightly pattern). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-161 docs ambiguity (#345) * docs: nightly cross-platform workflow design — third path around Otto-161 docs ambiguity Design-only proposal per Otto-165 offer. Aaron Otto-161 macOS-everywhere directive + Otto-164 pricing-docs ambiguity (macos-14 is standard-runner-type per about-github-hosted- runners; billing page lists it at $0.062/min in the same table as Linux/Windows without marking public-only). Instead of resolving the ambiguity (can't — docs genuinely contradict each other), propose a THIRD PATH that works in either interpretation: - PR gate stays ubuntu-22.04 only (unambiguously free on public repos). - New nightly-cross-platform.yml runs matrix [ubuntu-22.04, windows-2022, macos-14] on cron '0 9 * * *' (09:00 UTC, off-the-hour to avoid scheduler stampede). - Cost model: worst case ~$28/month/repo if macOS is billed; $0 if free. Either way, cadence caps exposure. - Fork-scoping: `if: github.repository == canonical OR workflow_dispatch OR pull_request-to-this-file` prevents scheduled trigger firing on contributor forks (would burn fork-owner's personal-account minutes). - No-alerting first cut (observation-only); issue-opening on red is a later enhancement. Phased rollout: - Phase 0 (now): this design doc, no YAML. - Phase 1: Aaron signs off on cost tradeoff. - Phase 2: land workflow on Zeta. - Phase 3: observe 7 nightly runs for signal. - Phase 4 (30 days): parallel lucent-ksk landing per Otto-140 rewrite authority, OR drop macOS if no signal + worst-case billing, OR expand matrix if best-case confirmed. Rollback: delete macos-14 from matrix (one-line diff) or delete workflow file entirely. No impact on gate.yml. Composes with FACTORY-HYGIENE row #51 (unblocks enforcement mode), docs/BACKLOG.md row ~2471 (Otto-161 declined + this as alternative), docs/research/test-classification.md (PR #339; category-3 nightly pattern). 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(#345): 6 review threads — name attribution + cron + YAML + fork-scheduling + BACKLOG ref - thread Wkcz (line 327): removed broken `memory/feedback_ksk_naming_...` reference (factory-personal memories live in `~/.claude/projects/<slug>/memory/`, not in-repo); paraphrased the rewrite-authority rule in §10 without promising an in-repo path. - thread WkdI (line 7): purged name-attribution tokens per Otto-220 code-comments-not-history + doc-comment-history-audit lint (PR #363). All "Aaron" / "Otto-NN" / "Amara" / "Max" references rewritten to role references ("human maintainer", "prior-contributor", "autonomous loop", "initial-starting-point contributor"). - thread WkdX (line 163): cron changed `0 9 * * *` → `7 9 * * *` (09:07 UTC) so it matches the "off the hour" comment; note now calls out alignment with the sibling scheduled workflow `github-settings-drift.yml` (`17 14 * * 1`). - thread Wkdk (line 146): YAML sketch rewritten to match the actual `.github/workflows/gate.yml` installer pattern — three-way-parity `./tools/setup/install.sh` invocation plus the same cache-key shape (dotnet / mise / nuget). Added explicit note that Windows matrix leg depends on `tools/setup/install.sh` growing Windows support first per the existing BACKLOG row. - thread Wkdz (line 248): corrected the fork-scheduling claim. GitHub disables scheduled workflows on forks by default — the repo's own `github-settings-drift.yml` runs without fork-scoping and proves this. The `if: github.repository ==` guard is kept as optional hygiene for the rare opt-in-fork case, not as a cost- safety requirement. - thread WkeB (line 316): replaced the wrong `docs/BACKLOG.md` line-number reference (~2471 is actually the mise-activate / HLL-flakiness neighborhood) with stable grep anchors ("Windows matrix in CI" + "Parity swap: CI's `actions/setup-dotnet`"). Markdownlint passes on the edited file. 
--------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Research-grade proposal formalizing the 5-category test taxonomy from Amara 18th-ferry Part 1 §C + Part 2 correction #10. Sixth of the ten 18th-ferry corrections — specifically the CI-test-classification correction.
Five categories
Sharder flake as worked example
BACKLOG #327 sharder flake used as the running worked example. Remedy order: measure variance → seed-lock → widen if data justifies → nightly only if stochastic is essential. Do NOT blind-widen or blind-quarantine.
CI split proposed (advisory)
Scope
docs/research/test-organization.md (layout) + docs/definitions/KSK.md (Oracle trusts CI-backed stats).
Test plan
🤖 Generated with Claude Code