From fcbc13b9b1b81b4bfa979475c0d742e1e5b8989e Mon Sep 17 00:00:00 2001
From: Aaron Stainback <aaron_bond@yahoo.com>
Date: Thu, 23 Apr 2026 13:54:38 -0400
Subject: [PATCH] backlog: P1 — HLL property-test flakiness (investigate before retry per DST)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Observed 2026-04-23 (auto-loop-88): the property test
Zeta.Tests.Properties.FuzzTests.fuzz "HLL estimate within
theoretical error bound" failed in CI on PR #159, a PR that
only touches memory/*.md files. The failure was inherited from
main at rebase time and was not caused by the PR's changes.

Per DST discipline (retries are a non-determinism smell;
investigate before retry), filing this for investigation:

1. Is the error bound formula correct? The HLL standard error
   is 1.04/sqrt(m) for m registers; the test bound should be
   k * 1.04/sqrt(m) for some confidence factor k.
2. Is the test seeded deterministically (FsCheck supports
   explicit seeds)?
3. Is it a real regression (bisect recent commits)?
4. Which seed fails, and with what input?
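
A rough way to see why a correct HLL can still trip this check
intermittently: if the property checks one estimate per generated
case and each case exceeds a k-sigma bound with probability q, one
run of N cases fails with probability 1 - (1 - q)^N. Assuming
roughly Gaussian error, q is about 0.27% at k = 3, so over FsCheck's
default of 100 cases a run would fail about 24% of the time. The
Python sketch below is a hypothetical stand-in, not the Zeta
implementation or its test: hll_estimate, error_bound,
run_failure_probability and run_property are illustrative names,
and the blake2b hash, p = 10 index bits, k = 3 and the input sizes
are all assumptions. It writes the bound as a multiple of the
standard error and derives every input from one seed, so a failing
seed replays exactly.

```python
import hashlib
import math
import random


def hll_estimate(items, p):
    """Estimate the distinct count of `items` with a minimal HyperLogLog.

    Sketch only (not Zeta's HLL). Assumes p >= 7 (m >= 128) so the
    alpha_m approximation below holds.
    """
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Deterministic 64-bit hash; Python's built-in hash() is salted per process.
        h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        idx = h >> (64 - p)                      # top p bits choose the register
        rest = h & ((1 << (64 - p)) - 1)         # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1  # 1-based position of leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)  # small-range (linear counting) correction
    return raw


def error_bound(p, k=3.0):
    """Relative-error bound k * 1.04 / sqrt(m): a multiple of the standard error."""
    return k * 1.04 / math.sqrt(1 << p)


def run_failure_probability(per_case, cases=100):
    """Chance that at least one of `cases` independent checks fails."""
    return 1 - (1 - per_case) ** cases


def run_property(seed, p=10, k=3.0, cases=100):
    """Hypothetical seeded stand-in for one property run; returns failing cases."""
    rng = random.Random(seed)  # every generated input derives from this seed
    failures = []
    for case in range(cases):
        n = rng.randint(1_000, 20_000)
        salt = rng.getrandbits(32)  # fresh item set per case
        est = hll_estimate([f"{salt}:{i}" for i in range(n)], p)
        rel_err = abs(est - n) / n
        if rel_err > error_bound(p, k):
            failures.append((case, n, rel_err))
    return failures
```

With p = 14 (m = 16384), error_bound(14, k=1.0) gives the 0.8125%
standard error, and run_failure_probability(0.0027) is about 0.237.
Calling run_property(seed=42) twice returns the same failures list;
that reproducibility is what the real test needs before a failing
seed can be captured and bisected.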

Deliverable: research note under
docs/research/hll-property-test-flakiness-YYYY-MM-DD.md
naming the cause and fix. The failure is currently blocking
session PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 docs/BACKLOG.md | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index 6e3daf0f..1cf94a66 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -2309,6 +2309,56 @@ within each priority tier.
 
 ## P1 — CI / DX follow-ups (after round-29 anchor)
 
+- [ ] **HLL property-test flakiness — investigate before
+  retry (DST discipline).** Observed 2026-04-23 (auto-loop-88):
+  `Zeta.Tests.Properties.FuzzTests.fuzz: HLL estimate
+  within theoretical error bound` failed in CI on PR #159
+  (gh run 24849954881 / build-and-test ubuntu-22.04 /
+  FsCheck.Xunit.PropertyFailedException). The failing PR
+  only touches `memory/*.md` files — unrelated to the
+  test. Failure is inherited from the main-branch state
+  at rebase time.
+
+  Per the DST discipline
+  (`memory/feedback_retries_are_non_determinism_smell_DST_holds_investigate_first_2026_04_23.md`
+  — per-user), retries are a non-determinism smell. A
+  flaky property test is genuine non-determinism; the
+  investigation should answer:
+
+  1. **Is the error bound formula correct?** HLL has a
+     known standard error of `1.04 / sqrt(m)`, where `m`
+     is the number of registers. The test bound should be
+     `k * 1.04 / sqrt(m)` for a confidence factor `k`.
+  2. **Is the test seeded deterministically?** FsCheck
+     supports explicit seeds; a property that flakes under
+     random seeds should be seed-pinned, and the failing
+     seed captured as a regression case.
+  3. **Is it actually a real regression?** The test
+     was passing recently (session PRs earlier today ran
+     CI green on this check). Bisect against recent
+     commits to identify when it started failing.
+  4. **What's the cost of re-running?** If the failure
+     is a genuine edge case at one seed in ten thousand,
+     a re-run will almost always succeed. But DST discipline
+     says investigate first: understand WHY this seed fails
+     before accepting "flaky = retry."
+
+  **Deliverable**: research note under
+  `docs/research/hll-property-test-flakiness-YYYY-MM-DD.md`
+  naming the cause and fix (correct the bound, pin the
+  seed, or fix the HLL implementation). No deadline, but
+  the test currently blocks session PRs from merging
+  until a re-run passes.
+
+  **Effort**: S if the bound formula is wrong (correct it
+  and rerun); M if it's a genuine implementation edge case
+  requiring investigation.
+
+  **Composes with**: the DST retry-is-smell memory; the
+  samples-readability-real-code-zero-alloc memory (HLL
+  is library-internal, so low-alloc + correctness are
+  library-scope).
+
 - [ ] **Declarative parity across dev-inner-loop / qa / dev / stage / prod — environment-parity research, time-budgeted (research-first, no implementation tonight).**
   Aaron (2026-04-20): *"also we want our dev innner loop, qa,
   dev, stage, prod to all have declarative pairty someting