core: RobustStats — first Amara-graduation (10th-ferry median+MAD robustAggregate) #295
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,102 @@ | ||||||
| namespace Zeta.Core | ||||||
|
|
||||||
| open System | ||||||
|
|
||||||
|
|
||||||
| /// **Robust statistical aggregation** — median plus median-absolute- | ||||||
| /// deviation (MAD) with an outlier filter. The canonical operational | ||||||
| /// shape for numeric-oracle aggregation proposed in Amara's 10th | ||||||
| /// courier ferry (`docs/aurora/2026-04-23-amara-aurora-deep-research- | ||||||
| /// report-10th-ferry.md`) — first graduation from the Amara- | ||||||
| /// absorb-to-ship cadence (see the Otto-105 feedback memory | ||||||
| /// `feedback_amara_contributions_must_operationalize_*_2026-04-24`). | ||||||
| /// | ||||||
| /// **Why this shape** — the arithmetic mean inherits everything bad | ||||||
| /// about every sample, including the ones that are wrong. The | ||||||
| /// median survives half its inputs being adversarial. MAD is to the | ||||||
| /// median what standard deviation is to the mean: a scale estimate | ||||||
| /// that also survives outliers. The 3-sigma-equivalent filter | ||||||
| /// (`|x - median| <= 3 * max(MAD, epsilon)`) is the classical robust- | ||||||
| /// aggregation move; `epsilon` is a degenerate-input floor that | ||||||
| /// stops the filter from collapsing to "median only" when the | ||||||
| /// sample is perfectly uniform and MAD = 0. | ||||||
| /// | ||||||
| /// **Relation to Zeta substrate** — this is a pure-function helper | ||||||
| /// for downstream oracle / bullshit-detector / reputation-aggregation | ||||||
| /// code; it does not depend on the Z-set algebra or the operator | ||||||
| /// graph and does not need a streaming/incremental variant at this | ||||||
| /// scale. If incremental-median is needed later, that's a separate | ||||||
| /// module (t-digest / p-squared / HdrHistogram territory). | ||||||
| /// | ||||||
| /// **Anti-consensus framing** — the implementation follows Amara's | ||||||
| /// explicit rationale: *"agreement alone is not proof; what matters | ||||||
| /// is independent, bounded, falsifiable convergence."* The robust | ||||||
| /// aggregate reduces one mechanical failure mode — "a few loud | ||||||
| /// outliers pull the mean" — without claiming it resolves | ||||||
| /// independence-of-sources (that's `antiConsensusGate` territory, | ||||||
| /// a separate graduation). | ||||||
| [<AutoOpen>] | ||||||
Suggested change
| - [<AutoOpen>] |
| + [<RequireQualifiedAccess>] |
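The doc comment above describes the filter `|x - median| <= 3 * max(MAD, epsilon)` followed by a median over the surviving samples. The diff view does not show the implementation, so the following is only a minimal sketch of that shape, not the PR's code: the function bodies, the `madFloor` name, and its value `1e-9` are assumptions made for illustration.

```fsharp
// Illustrative sketch only; bodies, names, and the floor value are assumptions.

/// Assumed degenerate-input floor (the "epsilon" in the doc comment).
let madFloor = 1e-9

/// Median of a sample; None when the sample is empty.
let median (xs: seq<float>) : float option =
    let arr = xs |> Seq.toArray |> Array.sort
    let n = arr.Length
    if n = 0 then None
    elif n % 2 = 1 then Some arr.[n / 2]
    // Halve before adding so two large same-sign values cannot overflow.
    else Some (arr.[n / 2 - 1] / 2.0 + arr.[n / 2] / 2.0)

/// Median absolute deviation around the sample median.
let mad (xs: seq<float>) : float option =
    let arr = Seq.toArray xs
    median arr
    |> Option.bind (fun m -> arr |> Array.map (fun x -> abs (x - m)) |> median)

/// Drop samples further than 3 * max(MAD, floor) from the median,
/// then return the median of what is left.
let robustAggregate (xs: seq<float>) : float option =
    let arr = Seq.toArray xs
    match median arr, mad arr with
    | Some m, Some d ->
        let threshold = 3.0 * max d madFloor
        arr |> Array.filter (fun x -> abs (x - m) <= threshold) |> median
    | _ -> None
```

For the PR's own test input `[ 1.0; 2.0; 3.0; 4.0; 5.0; 1000.0 ]`, this sketch computes median 3.5, MAD 1.5, and threshold 4.5, drops `1000.0`, and returns `Some 3.0`.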
Compute even-length median without overflow
The even-length branch in `median` uses `(arr.[n / 2 - 1] + arr.[n / 2]) / 2.0`, which overflows for large but finite same-sign inputs (for example, `[1e308; 1e308]` yields `Some Infinity` instead of a finite median). Because `mad` and `robustAggregate` reuse `median`, this can propagate into incorrect outlier thresholds and aggregates; use an overflow-safe midpoint formula like `a + (b - a) / 2.0`.
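A minimal sketch of the overflow-safe midpoint this comment asks for, assuming a hypothetical helper named `midpoint` that is not part of the PR. One caveat: `a + (b - a) / 2.0` is only safe when both values share a sign. With opposite signs the difference itself can overflow, so the sketch falls back to the plain average in that case, where the sum cannot overflow.

```fsharp
/// Hypothetical helper: midpoint of two finite floats without intermediate overflow.
let midpoint (a: float) (b: float) : float =
    if (a < 0.0) <> (b < 0.0) then
        // Opposite signs: |a + b| <= max(|a|, |b|), so the sum stays finite.
        (a + b) / 2.0
    else
        // Same sign: |b - a| <= max(|a|, |b|), so the difference stays finite.
        a + (b - a) / 2.0

/// Median that uses the safe midpoint for even-length samples.
let median (xs: seq<float>) : float option =
    let arr = xs |> Seq.toArray |> Array.sort
    let n = arr.Length
    if n = 0 then None
    elif n % 2 = 1 then Some arr.[n / 2]
    else Some (midpoint arr.[n / 2 - 1] arr.[n / 2])
```

With this version, `median [ 1e308; 1e308 ]` stays finite, and a sample such as `[ -1e308; 1e308 ]` does not overflow either.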
Preserve non-empty semantics when samples contain NaN
`robustAggregate` documents `None` for empty input, but with non-empty data containing enough `NaN` values (e.g., `[NaN; NaN; 1.0]`), `m` becomes `NaN`, every `abs (x - m) <= threshold` check is false, and `median kept` returns `None`. That silently turns “invalid numeric samples” into “no samples,” which can cause downstream logic to skip updates instead of handling bad input explicitly.
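One way to keep the two cases apart is to check for `NaN` before aggregating and report it as a distinct outcome. The following is a sketch under stated assumptions, not the PR's API: the `AggregateOutcome` type and the `aggregateChecked` wrapper are hypothetical names, and the aggregator is passed in as a parameter so the block stays self-contained.

```fsharp
open System

/// Hypothetical result type that keeps "no samples" apart from "bad samples".
type AggregateOutcome =
    | NoSamples
    | InvalidSamples of nanCount: int
    | Aggregated of float

/// Hypothetical wrapper: rejects NaN-bearing input before calling the aggregator.
let aggregateChecked
    (aggregate: seq<float> -> float option)
    (xs: seq<float>)
    : AggregateOutcome =
    let arr = Seq.toArray xs
    let nanCount =
        arr |> Array.filter (fun x -> Double.IsNaN x) |> Array.length
    if arr.Length = 0 then NoSamples
    elif nanCount > 0 then InvalidSamples nanCount
    else
        match aggregate (Array.toSeq arr) with
        | Some v -> Aggregated v
        | None -> NoSamples
```

Assuming `RobustStats.robustAggregate` accepts a sequence of floats, it could be passed as the `aggregate` argument. The input `[ nan; nan; 1.0 ]` would then surface as `InvalidSamples 2` rather than being indistinguishable from an empty sample.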
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
| module Zeta.Tests.Algebra.RobustStatsTests | ||
|
|
||
| open FsUnit.Xunit | ||
| open global.Xunit | ||
| open Zeta.Core | ||
|
|
||
|
|
||
| // ─── Core: median on odd / even / empty ───────── | ||
|
|
||
| [<Fact>] | ||
| let ``median of empty sequence is None`` () = | ||
| RobustStats.median [] |> should equal (None: double option) | ||
|
|
||
| [<Fact>] | ||
| let ``median of single element returns that element`` () = | ||
| RobustStats.median [ 42.0 ] |> should equal (Some 42.0) | ||
|
|
||
| [<Fact>] | ||
| let ``median of odd-length sample picks middle element after sort`` () = | ||
| RobustStats.median [ 3.0; 1.0; 2.0 ] |> should equal (Some 2.0) | ||
|
|
||
| [<Fact>] | ||
| let ``median of even-length sample averages two centre elements`` () = | ||
| RobustStats.median [ 4.0; 2.0; 1.0; 3.0 ] |> should equal (Some 2.5) | ||
|
|
||
|
|
||
| // ─── MAD properties ───────── | ||
|
|
||
| [<Fact>] | ||
| let ``mad of empty sequence is None`` () = | ||
| RobustStats.mad [] |> should equal (None: double option) | ||
|
|
||
| [<Fact>] | ||
| let ``mad of constant sample is zero`` () = | ||
| RobustStats.mad [ 5.0; 5.0; 5.0; 5.0 ] |> should equal (Some 0.0) | ||
|
|
||
| [<Fact>] | ||
| let ``mad of 1 2 3 4 5 equals 1`` () = | ||
| // median = 3, deviations = 2,1,0,1,2, median of devs = 1. | ||
| RobustStats.mad [ 1.0; 2.0; 3.0; 4.0; 5.0 ] |> should equal (Some 1.0) | ||
|
|
||
|
|
||
| // ─── robustAggregate: the load-bearing behaviour ───────── | ||
|
|
||
| [<Fact>] | ||
| let ``robustAggregate of empty sequence is None`` () = | ||
| RobustStats.robustAggregate [] |> should equal (None: double option) | ||
|
|
||
| [<Fact>] | ||
| let ``robustAggregate of single element returns that element`` () = | ||
| RobustStats.robustAggregate [ 7.0 ] |> should equal (Some 7.0) | ||
|
|
||
| [<Fact>] | ||
| let ``robustAggregate of constant sample returns the constant`` () = | ||
| // MAD = 0 here; MadFloor prevents the filter from collapsing. | ||
| RobustStats.robustAggregate [ 5.0; 5.0; 5.0; 5.0; 5.0 ] |> should equal (Some 5.0) | ||
|
|
||
| [<Fact>] | ||
| let ``robustAggregate survives a single extreme outlier`` () = | ||
| // The mean of [1;2;3;4;5;1000] is 169.2 — a single adversarial | ||
| // sample has moved the answer beyond any legitimate reading. The | ||
| // robust aggregate discards the outlier and returns the median | ||
| // of the kept set. | ||
| let xs = [ 1.0; 2.0; 3.0; 4.0; 5.0; 1000.0 ] | ||
| let result = RobustStats.robustAggregate xs | ||
| // median = 3.5; MAD ≈ 1.5; threshold = 4.5; 1000 is dropped; | ||
| // kept = [1;2;3;4;5]; median of kept = 3. | ||
| result |> should equal (Some 3.0) | ||
|
|
||
| [<Fact>] | ||
| let ``robustAggregate keeps values within three MAD of the median`` () = | ||
| // median = 3, MAD = 1, threshold = 3. Values 1..5 all satisfy | ||
| // |x - 3| <= 3; no outlier to drop. Kept-median = 3. | ||
| RobustStats.robustAggregate [ 1.0; 2.0; 3.0; 4.0; 5.0 ] |> should equal (Some 3.0) | ||
|
|
||
| [<Fact>] | ||
| let ``robustAggregate is unaffected by adding a mirrored outlier pair`` () = | ||
| // Symmetric extreme pair on both sides of the sample. | ||
| let baseline = RobustStats.robustAggregate [ 1.0; 2.0; 3.0; 4.0; 5.0 ] | ||
| let withOutliers = RobustStats.robustAggregate [ -1000.0; 1.0; 2.0; 3.0; 4.0; 5.0; 1000.0 ] | ||
| withOutliers |> should equal baseline |
P1: The XML doc comment references `docs/aurora/2026-04-23-amara-aurora-deep-research-report-10th-ferry.md` and `feedback_amara_contributions_must_operationalize_*_2026-04-24`, but those paths don’t exist in the repo (and the doc path is also split across lines with a trailing hyphen, making it uncopyable). Please update to the actual in-repo document/memory filenames (or remove the references) and keep file paths on a single line (e.g., inside `<c>...</c>`).