
backlog: P1 — HLL property-test flakiness (investigate before retry per DST) #175

Merged
AceHack merged 1 commit into main from
backlog/hll-property-test-flakiness-investigation
Apr 23, 2026


AceHack commented Apr 23, 2026

Summary

Files a P1 BACKLOG row for the HLL property-test failure observed on PR #159 at the auto-loop-88 tick. The test Zeta.Tests.Properties.FuzzTests.fuzz: HLL estimate within theoretical error bound is failing on CI even though the PR only makes markdown edits under memory/. The failure is inherited from main at rebase time, not caused by the PR.

DST discipline says investigate before retry

Per memory/feedback_retries_are_non_determinism_smell_DST_holds_investigate_first_2026_04_23.md (per-user), retries are a non-determinism smell. A flaky property test IS genuine non-determinism; the investigation should answer:

  1. Is the error bound formula correct? HLL has a relative standard error of about 1.04 / sqrt(m) for m registers; the test bound should reflect that plus a confidence-interval multiplier.
  2. Is the test seeded deterministically? FsCheck supports explicit seeds; if the test is flaky, pin the seed and capture the failing one.
  3. Is it a real regression? The test was passing on session PRs earlier today. Bisect.
  4. What's the cost of re-running? Understand WHY this seed fails before accepting "flaky = retry."
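
The first question can be made concrete with a small sketch. The Python below is illustrative only and is not the Zeta test code: the function names, the `precision_bits` parameter, and the multiplier `z` are assumptions, and the false-failure estimate relies on a normal approximation that may not hold at small cardinalities.

```python
import math


def hll_relative_std_error(precision_bits: int) -> float:
    """Asymptotic relative standard error 1.04 / sqrt(m), with m = 2**precision_bits registers."""
    m = 1 << precision_bits
    return 1.04 / math.sqrt(m)


def hll_error_bound(true_cardinality: int, precision_bits: int, z: float = 3.0) -> float:
    """Absolute tolerance: z standard errors around the true cardinality (z is an assumed knob)."""
    return z * hll_relative_std_error(precision_bits) * true_cardinality


def within_bound(estimate: float, true_cardinality: int, precision_bits: int, z: float = 3.0) -> bool:
    """True when the estimate lies inside the z-sigma tolerance."""
    return abs(estimate - true_cardinality) <= hll_error_bound(true_cardinality, precision_bits, z)


def expected_false_failure_rate(z: float) -> float:
    """Two-sided tail probability P(|N(0, 1)| > z), assuming the estimator error is roughly normal."""
    return math.erfc(z / math.sqrt(2.0))
```

Under these assumptions, a bound of one standard error would be expected to reject roughly a third of correct estimates, while three standard errors would reject well under 1%. That gap is one plausible reason a property can pass on one seed and fail on another.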

Currently blocking

PR #159 (Overlay A migration — deletions-over-insertions). Until the HLL failure is understood, a re-run might pass by chance but doesn't close the DST concern.
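
Why a re-run can pass by chance, and why pinning the seed makes the failure reproducible, can be sketched with a toy property. This Python uses a Gaussian-noise stand-in rather than a real HLL, and every name in it (`noisy_estimate`, `property_holds`, `failing_seeds`) is hypothetical; FsCheck's own seed-replay mechanics are not shown.

```python
import random


def noisy_estimate(true_value: int, rel_sigma: float, rng: random.Random) -> float:
    """Stand-in for a sketch estimate: the true value with Gaussian relative noise."""
    return true_value * (1.0 + rng.gauss(0.0, rel_sigma))


def property_holds(seed: int, true_value: int = 100_000,
                   rel_sigma: float = 0.01, z: float = 1.0) -> bool:
    """Run the property once with an explicit seed; the same seed always gives the same verdict."""
    rng = random.Random(seed)  # all randomness flows from the pinned seed
    estimate = noisy_estimate(true_value, rel_sigma, rng)
    return abs(estimate - true_value) <= z * rel_sigma * true_value


def failing_seeds(seeds, **kwargs):
    """Explore a range of seeds and collect the ones that break the property."""
    return [s for s in seeds if not property_holds(s, **kwargs)]


if __name__ == "__main__":
    bad = failing_seeds(range(200))
    print(f"{len(bad)} of 200 seeds fail a deliberately tight 1-sigma bound")
    if bad:
        pinned = bad[0]
        # Replaying the pinned seed reproduces the failure on every run.
        print(f"seed {pinned} holds: {property_holds(pinned)}")
```

With an unpinned seed, a retry simply draws a different sample and may land inside the bound. Recording the failing seed and replaying it keeps the failure in view until its cause is understood, after which fresh seeds can be explored again.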

🤖 Generated with Claude Code

…er DST)

Observed 2026-04-23 (auto-loop-88): Zeta.Tests.Properties.
FuzzTests.fuzz "HLL estimate within theoretical error bound"
failed on CI for PR #159 — a PR that only touches
memory/*.md files. Failure inherited from main at rebase
time; not caused by the PR's changes.

Per DST discipline (retries are a non-determinism smell;
investigate before retry), file for investigation:

1. Is the error bound formula correct (1.04/sqrt(m) +
   confidence-interval factor)?
2. Is the test seeded deterministically (FsCheck supports
   explicit seeds)?
3. Is it actually a real regression (bisect recent commits)?
4. What specifically fails at which seed?

Deliverable: research note under
docs/research/hll-property-test-flakiness-YYYY-MM-DD.md
naming cause + fix. Blocking session PRs currently.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 23, 2026 17:54
AceHack enabled auto-merge (squash) April 23, 2026 17:54
AceHack added a commit that referenced this pull request Apr 23, 2026
…-hard-problems memory

PR #159 CI blocked by a real HLL FsCheck property test failure
inherited from main (not caused by the PR's memory-only edits).
Per DST retries-are-smell discipline: filed P1 BACKLOG row
(PR #175) for investigation-before-retry. Four questions
queued: formula correctness, seed determinism, bisect,
understand the failing seed.

Aaron future-framing: "when zeta ships its the backend and
libraries that solve all the hard problems so application/
demo code can be easier and not hhave to worry about so
much to still be performant." Per-user memory filed capturing
the long-term library-carries-cost-so-app-stays-simple goal
state. Composes with the earlier samples-readability-vs-
production-zero-alloc memory.

Both moves advance the queue without volume.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack merged commit d7717f0 into main Apr 23, 2026
11 of 12 checks passed
AceHack deleted the backlog/hll-property-test-flakiness-investigation branch April 23, 2026 17:56
AceHack added a commit that referenced this pull request Apr 23, 2026
Aaron: "yeah pinned seeds is from DST ... to make them
deterministic."

PR #175 updated: HLL BACKLOG row explicitly says pinned
seeds ARE the DST resolution (not "a thing to try"); retry-
until-green is the non-DST path and explicitly rejected.
Added FsCheck Replay attribute mechanics + pin-then-explore
idiomatic pattern.

Per-user memory filed capturing the DST→property-test
sharpening. Composes with parent DST retries-are-smell memory.

Aaron's confirmation validates the investigation-first
discipline — filing the BACKLOG row instead of retrying was
the right move AND adds a concrete DST mechanic (pinning).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Adds a new P1 backlog item to track and investigate a flaky HyperLogLog (HLL) FsCheck property test failure observed in CI, with an explicit “investigate before retry” (DST) framing.

Changes:

  • Added a P1 docs/BACKLOG.md row documenting the observed CI failure details (test name, run ID, environment, PR context).
  • Captured a concrete investigation checklist (bound correctness, deterministic seeding, regression/bisect, rerun economics).
  • Defined a deliverable target as a dated research note under docs/research/.

Comment thread docs/BACKLOG.md
Comment on lines +2322 to +2324
Per the DST discipline
(`memory/feedback_retries_are_non_determinism_smell_DST_holds_investigate_first_2026_04_23.md`
— per-user), retries are a non-determinism smell. A
Comment thread docs/BACKLOG.md
Comment on lines +2315 to +2316
within theoretical error bound` failed in CI on PR #159
(gh run 24849954881 / build-and-test ubuntu-22.04 /
AceHack added a commit that referenced this pull request Apr 23, 2026
…fix; whimsy-list extended

10 session PRs merged (+#160 +#175).

PR #159: Copilot caught a wrapped-path rodney/ reference my
prior sed missed (path spanned two lines). python replace
fixed. Thread resolved. Lesson: grep for terminal-path-
segment, not full path, to catch wrapped.

Aaron seed-whimsy list extension: "feel free to keep a
list of whimiscal numbers to choose from for seeds ... like
with 42 the meaning of life lol." Per-user memory extended
with current list (69 / 420 / 42) + candidate expansions
(9000 DBZ, 1337 leet, 314159 π, 271828 e, 1729 Hardy-
Ramanujan, others).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request Apr 23, 2026
…w-number fixes

PR #159 (Overlay A #3 deletions-over-insertions) MERGED at
18:02:47Z. 11 session PRs merged. HLL test passed on re-run
(different seed) — real-world data for the PR #175 BACKLOG
row on HLL flakiness; pin-then-explore is still the right
fix.

Aaron directive: "be PC when you write the 69 and 420
descriptions of whemsy we want this repo to be high school
curruclurm friendly so R rated is okay but only when
necessary for effect." PC-ified seed-whimsy memory
descriptions (69 → internet-meme-symmetrical-digit;
420 → counterculture-meme). Added PC-framing section
naming the high-school-curriculum-friendly standard.

PR #172 row-number misrefs fixed (#48 → #51 for cross-
platform parity; #44 → #47 for fire-history schema).
Third finding via lands-via-#150 reply.

Row-number misref is recurring; candidate for row #54
first cadenced fire.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>