-
Notifications
You must be signed in to change notification settings - Fork 1
fix(audit): comprehensive HashCode.Combine sweep + SplitMix64 refactor (Otto-281 follow-up) #479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,148 @@ | ||
| --- | ||
| name: DST-EXEMPT IS DEFERRED-BUG — never a safe state, no comment that says "DST-exempt" should ship without a deadline; the SharderInfoTheoreticTests case: a previous tick marked the test "DST-exempt: uses HashCode.Combine which is process-randomized" + filed a BACKLOG row, treated as containment; in reality the exemption masked the determinism violation which compounded into 3 unrelated flakes this session (#454 / #458 / #473) before Aaron 2026-04-25 surfaced the rule "see how that one DST exception caused the flake, this is why DST is so important, when we violate, we introduce random failures"; ship the FIX (Otto-281: detHash via XxHash3), not the EXEMPTION; counterweight to Otto-272 DST-everywhere | ||
| description: Otto-281 counterweight memory. DST exemptions are not containment; they are deferred bugs. The SharderInfoTheoreticTests "DST-exempt" comment masked a flake that fired 3 times on unrelated PRs before getting fixed. Pattern: when tempted to write "DST-exempt", instead either fix the determinism or delete the test. Never ship a long-lived DST-exempt tag. | ||
| type: feedback | ||
| --- | ||
| ## The rule | ||
|
|
||
| **"DST-exempt" is not a safe state. It is a deferred bug.** | ||
|
|
||
| When a test or code path is marked "DST-exempt" with a comment | ||
| that says "uses X which is process-randomized" or "depends on | ||
| environment Y", that is NOT containment. It is *masking* the | ||
| determinism violation behind a label that sounds like an escape | ||
| hatch. | ||
|
|
||
| Aaron's verbatim framing 2026-04-25: | ||
|
|
||
| > *"see how that one DST exception caused the fake [flake], | ||
| > this is why DST is so important, when we violate, we | ||
| > introduce random failures."* | ||
|
|
||
| ## The case that triggered the rule | ||
|
|
||
| `tests/Tests.FSharp/Formal/Sharder.InfoTheoretic.Tests.fs` | ||
| contained three uses of `HashCode.Combine k` for hashing | ||
| integer keys: | ||
|
|
||
| ```fsharp | ||
| let h = uint64 (HashCode.Combine k) | ||
| let s = JumpConsistentHash.Pick(h, shards) | ||
| ``` | ||
|
|
||
| `System.HashCode.Combine` is **process-randomized by .NET | ||
| design** to deter hash-flooding attacks on dictionaries. So | ||
| the same int produces different hashes in different processes. | ||
| Jump-consistent-hash output depends on the input hash, so | ||
| shard assignments varied across CI runs. The | ||
| `< 1.2 max/avg ratio` assertion sometimes held, sometimes | ||
| didn't. | ||
|
|
||
| A previous tick (Aaron 2026-04-24 directive territory) added | ||
| this comment above one of the tests: | ||
|
|
||
| ```fsharp | ||
| // DST-exempt: uses `HashCode.Combine` which is process-randomized | ||
| // per-run. Flake analysis + fix pipeline tracked in docs/BACKLOG.md | ||
| // "SharderInfoTheoreticTests 'Uniform traffic' flake" row. DST | ||
| // marker convention + lint rule: docs/BACKLOG.md "DST-marker | ||
| // convention + lint rule" row (Aaron 2026-04-24 directive). | ||
| ``` | ||
|
|
||
| **That comment treated the issue as contained.** It identified | ||
| the cause (`HashCode.Combine` process-randomization), filed a | ||
| BACKLOG row, and added a "DST-marker convention" idea — all of | ||
| which are good housekeeping. But it left the determinism | ||
| violation in the test code. | ||
|
|
||
| ## What the masking cost | ||
|
|
||
| The "Uniform traffic: consistent-hash is already near-optimal" | ||
| test flaked **three times in this session** on unrelated PRs: | ||
|
|
||
| - **#454** (FSharp.Core 10.1.202 → 10.1.203) — pure dep bump, | ||
| failed on a probabilistic sharder test. | ||
| - **#458** (System.Numerics.Tensors 10.0.6 → 10.0.7) — same. | ||
| - **#473** (Dependabot grouping config) — yaml-only change, | ||
| same flake. | ||
|
|
||
| Three unrelated PRs each had to be diagnosed, the flake ruled | ||
| "unrelated to this change", and the job rerun. Three rerun | ||
| cycles of compute. Three opportunities for an autonomous-loop | ||
| agent to misidentify a real bug as "the same flake — rerun" | ||
| and ship a regression. Three eyeball-time costs for the | ||
| maintainer. | ||
|
|
||
| **That's the compounding cost of a DST exemption left to live.** | ||
| The exemption didn't contain the cost — it spread the cost | ||
| across N PRs and N reruns instead of concentrating it on one | ||
| fix. | ||
|
|
||
| ## The correct response patterns | ||
|
|
||
| **When you encounter a flake that uses non-deterministic | ||
| primitives:** | ||
|
|
||
| 1. **Fix the determinism.** Replace the non-deterministic | ||
| primitive with a deterministic one of the same kind. | ||
| For `HashCode.Combine` → `XxHash3.HashToUInt64` (Otto-281). | ||
| For `Random` (no seed) → `Random seed`. For `DateTime.UtcNow` | ||
| inside a property check → fixed `DateTimeOffset` constant. | ||
|
|
||
| 2. **If the determinism cannot be fixed, delete the test.** A | ||
| test that is *probabilistic in CI* is not a test — it's a | ||
| coin flip that gets logged. Either commit to fixing the | ||
| determinism or delete the test entirely. Don't ship a | ||
| dual-state "sometimes-pass-sometimes-fail" thing under a | ||
| "DST-exempt" label. | ||
|
|
||
| 3. **If the determinism cannot be fixed AND the test cannot be | ||
| deleted (e.g., it tests an inherently-stochastic property | ||
| like a Monte Carlo bound), wrap the entire test body in a | ||
| loop with a fixed seed and assert the *aggregate* property | ||
| over N runs.** That converts the stochastic property into a | ||
| deterministic-meta-property over fixed seeds. | ||
|
|
||
| **Never** leave the test in CI with a "DST-exempt" comment. The | ||
| comment doesn't make the determinism violation safe — it just | ||
| defers the cost. | ||
|
|
||
| ## The fix shape (Otto-281) | ||
|
|
||
| ```fsharp | ||
| open System.IO.Hashing | ||
|
|
||
| let private detHash (k: int) : uint64 = | ||
| XxHash3.HashToUInt64 (ReadOnlySpan (BitConverter.GetBytes k)) | ||
|
|
||
| // All three call sites changed: | ||
| // let h = uint64 (HashCode.Combine k) | ||
| // becomes: | ||
| // let h = detHash k | ||
| ``` | ||
|
|
||
| Three iterations in separate processes — all pass with | ||
| identical output. Determinism restored. PR #478. | ||
|
|
||
| ## Composes with | ||
|
|
||
| - **Otto-272** *DST-ify the stabilization process* — counterweight | ||
| discipline must be deterministic. Same energy: don't carve | ||
| out exceptions; fix the root. | ||
| - **Otto-248** *never ignore flakes* — flakes ARE the determinism | ||
| violation, not "transient infra noise". Same shape applied | ||
| to test-side code. | ||
| - **Otto-264** *rule of balance* — every found mistake triggers | ||
| a counterweight. This memory IS the counterweight to the | ||
| "DST-exempt" mistake. | ||
| - **GOVERNANCE.md §section on DST** — DST-everywhere as the | ||
| default mode, not the special mode. | ||
|
|
||
| ## Pre-commit-lint candidate | ||
|
|
||
| A grep for `DST-exempt` / `DST exempt` / `dst-exempt` in | ||
| comments inside `tests/**` should fire as a warning at | ||
| pre-commit time. Each occurrence is a deferred bug that | ||
| needs a deadline. The lint comment can include the DST | ||
| discipline reminder: "DST-exempt is not containment — fix or | ||
| delete; don't ship dual-state tests." | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -73,10 +73,8 @@ type RendezvousHash(bucketSeeds: uint64 array) = | |
| [<MethodImpl(MethodImplOptions.AggressiveInlining)>] | ||
| static member private Mix(a: uint64, b: uint64) : uint64 = | ||
| // SplitMix64 of (a xor b) — good enough for HRW scoring. | ||
| let mutable z = (a ^^^ b) * 0x9E3779B97F4A7C15UL | ||
| z <- (z ^^^ (z >>> 30)) * 0xBF58476D1CE4E5B9UL | ||
| z <- (z ^^^ (z >>> 27)) * 0x94D049BB133111EBUL | ||
| z ^^^ (z >>> 31) | ||
| // See `src/Core/SplitMix64.fs` for the constant rationale. | ||
| SplitMix64.mix (a ^^^ b) | ||
|
Comment on lines
73
to
+77
|
||
|
|
||
| /// Pick a bucket by maximum-score-wins. O(bucketCount). | ||
| member _.Pick(key: uint64) : int = | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,6 +3,7 @@ namespace Zeta.Core | |
|
|
||
| open System | ||
| open System.Collections.Generic | ||
| open System.IO.Hashing | ||
| open System.Runtime.CompilerServices | ||
| open System.Runtime.InteropServices | ||
| open System.Threading | ||
|
|
@@ -51,6 +52,16 @@ type Shard = | |
| /// Shard a key using `HashCode.Combine` mixed with the per-process | ||
| /// salt. Good default for any ingest path that touches user-controlled | ||
| /// keys — rules out the HashDoS attack. | ||
| /// | ||
| /// **Intentionally non-deterministic across processes.** `HashCode.Combine` | ||
| /// is process-randomized by .NET design; combining with the per-process | ||
| /// `Shard.Salt` doubles down on that. Two processes will assign the same | ||
| /// key to different shards. That is the *point* — it's HashDoS defence: | ||
| /// an attacker cannot pre-compute a worst-case input set for our shard | ||
| /// distribution because they don't know our seed. | ||
| /// | ||
| /// If you need cross-process / cross-restart determinism, use | ||
| /// `OfFixed`, which uses `XxHash3` instead. | ||
| [<MethodImpl(MethodImplOptions.AggressiveInlining)>] | ||
| static member OfKey(key: 'K, shards: int) : int = | ||
| let h = uint32 (HashCode.Combine(key, Shard.Salt)) | ||
|
|
@@ -59,9 +70,51 @@ type Shard = | |
| /// Shard without the per-process salt — stable across restarts. | ||
| /// Prefer `OfKey` unless you *specifically* need cross-process | ||
| /// determinism (e.g. for Kafka key-to-partition consistency). | ||
| /// | ||
| /// **Determinism contract** (Otto-281 honesty audit): | ||
| /// | ||
| /// - For **value-type keys** (`int`, `int64`, `uint32`, `byte`, etc.): | ||
| /// fully deterministic across processes. `int.GetHashCode()` returns | ||
| /// `this` and the rest is deterministic mixing. | ||
| /// - For **string keys**: NOT deterministic across processes. | ||
| /// `string.GetHashCode()` is per-process-randomized in .NET Core+ | ||
| /// (anti-hash-flooding), and we cannot recover from that within a | ||
| /// generic `'K` API. String-key callers who need cross-process | ||
| /// consistency MUST hash their UTF-8 bytes themselves and call | ||
| /// `Shard.Of(uint32 hash, shards)` directly — for example with | ||
| /// `XxHash3.HashToUInt64(Encoding.UTF8.GetBytes(s))`. | ||
| /// - For **other reference types**: deterministic only if the type's | ||
| /// `GetHashCode()` is deterministic (most user-defined records are; | ||
| /// classes that depend on instance identity are not). | ||
| /// | ||
| /// Otto-281 fix replaced an earlier implementation that used | ||
| /// `HashCode.Combine key`, which is *also* process-randomized for | ||
| /// value types (because `HashCode.Combine`'s mixer is randomized | ||
| /// regardless of input). The new implementation is strictly better | ||
| /// for value types and no-worse for strings. | ||
| /// | ||
| /// String-key cross-process consistency is tracked as a separate | ||
| /// follow-up: typed overloads `OfFixedString(s: string, shards)` and | ||
| /// `OfFixedBytes(bytes: ReadOnlySpan<byte>, shards)`. | ||
|
Comment on lines
+96
to
+98
|
||
| [<MethodImpl(MethodImplOptions.AggressiveInlining)>] | ||
| static member OfFixed(key: 'K, shards: int) : int = | ||
| Shard.Of(uint32 (HashCode.Combine key), shards) | ||
| let intHash = key.GetHashCode() | ||
| let bytes = BitConverter.GetBytes intHash | ||
| let h64 = XxHash3.HashToUInt64 (ReadOnlySpan bytes) | ||
| Shard.Of(uint32 h64, shards) | ||
|
Comment on lines
+101
to
+104
|
||
|
|
||
| /// Shard a UTF-8 byte sequence without per-process salt — fully | ||
| /// deterministic across processes and machines. Use this for any | ||
| /// cross-process / cross-restart shard assignment where the key | ||
| /// has a canonical byte representation. | ||
| /// | ||
| /// String callers: `Shard.OfFixedBytes(Encoding.UTF8.GetBytes s, shards)` | ||
| /// is the cross-process-consistent shard for `s`. The plain | ||
| /// `Shard.OfFixed("...", shards)` is NOT (see `OfFixed` doc). | ||
| [<MethodImpl(MethodImplOptions.AggressiveInlining)>] | ||
| static member OfFixedBytes(bytes: ReadOnlySpan<byte>, shards: int) : int = | ||
| let h64 = XxHash3.HashToUInt64 bytes | ||
| Shard.Of(uint32 h64, shards) | ||
|
|
||
|
|
||
| /// Exchange operator — partitions a Z-set across `shards` sub-streams by | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -54,14 +54,26 @@ type HyperLogLog(logBuckets: int) = | |||||
| /// ≥ ~65 k the 32-bit floor begins to dominate; use `AddHash` | ||||||
| /// directly with a proper `XxHash3`/`XxHash64` on bytes if you need | ||||||
| /// billions-scale accuracy. | ||||||
| /// | ||||||
| /// **Process-randomization caveat (Otto-281 audit):** | ||||||
| /// `HashCode.Combine` re-seeds per-process by .NET design (anti- | ||||||
| /// hash-flooding). Two processes will produce different cardinality | ||||||
| /// estimates for the same input stream — the *bound* is correct | ||||||
| /// (within ~2% relative error per HLL's `alpha`), but the exact | ||||||
| /// estimate jitters. For tests requiring deterministic estimates | ||||||
| /// across runs, use `AddBytes` with a canonical byte representation; | ||||||
| /// see `tests/Tests.FSharp/Sketches/HyperLogLog.Tests.fs` for the | ||||||
| /// XxHash3 path. The jittery estimate is a *deliberate* trade-off | ||||||
| /// for hot-path performance: an earlier revision called | ||||||
| /// `XxHash3.HashToUInt64` on a 4-byte heap array per Add; for a 1 M | ||||||
| /// element stream that's 1 M Gen-0 allocations purely for HLL. | ||||||
| [<MethodImpl(MethodImplOptions.AggressiveInlining)>] | ||||||
| member this.Add(value: 'T) = | ||||||
| let h32 = HashCode.Combine value |> uint64 | ||||||
| // SplitMix64 finaliser — public-domain constants, 5 ops, no alloc. | ||||||
| let mutable z = h32 * 0x9E3779B97F4A7C15UL | ||||||
| z <- (z ^^^ (z >>> 30)) * 0xBF58476D1CE4E5B9UL | ||||||
| z <- (z ^^^ (z >>> 27)) * 0x94D049BB133111EBUL | ||||||
| this.AddHash (z ^^^ (z >>> 31)) | ||||||
| // SplitMix64 finaliser — see `src/Core/SplitMix64.fs` for the | ||||||
| // constant rationale (golden-ratio + Vigna's BigCrush-validated | ||||||
| // multipliers). 5 ops, no alloc, hot-path safe. | ||||||
|
||||||
| // multipliers). 5 ops, no alloc, hot-path safe. | |
| // multipliers). Allocation-free and hot-path safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P0: This PR adds a new top-level
memory/*.mdfile, but the repo has amemory-index-integrityworkflow that fails unlessmemory/MEMORY.mdis updated in the same PR whenevermemory/*.mdis added/modified. Please add a newest-first entry for this memory file tomemory/MEMORY.md(otherwise CI will block the merge).