chore: retro-review PR #102 — test(tariff): Hypothesis fuzz the five tariff_engine invariants (Phase 11 PR-18) by Artic0din · Pull Request #134 · Artic0din/ha-pricehawk

Artic0din · 2026-05-23T13:11:13Z

Synthetic retro-review of merged PR #102. Diff = original 0bc35b0..7e92212 (same as original PR). Do NOT merge; close after @claude review posts.

…e 11 PR-18) PR-18 — Final v2.0 GA plank. Property-based tests cover the five invariants from v2 research § 7.3 for tariff_engine pure functions (calc_stepped_cost, get_stepped_import_rate, get_current_tou_period). - 1. Monotonic stepped cost: cost(k1) <= cost(k2) for k1 <= k2. - 2. Threshold equality: cost(threshold) == threshold * step1_rate. - 3. Step composition above threshold: cost = step1_cost + (k - threshold) * step2_rate. - 4. Stepped rate dichotomy: returned rate is exactly step1 or step2. - 5. TOU period closure: returned period name is in the supplied dict OR "unknown"; rate matches the period when found; full-day coverage produces no "unknown". ≥ 200 fuzzed examples per invariant via hypothesis. 9 test classes. 1054 total pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Artic0din · 2026-05-23T13:11:15Z

@claude review

claude · 2026-05-23T13:13:57Z

Claude finished @Artic0din's task in 16m 51s —— View job

Code Review — PR #102 Retro: Hypothesis Fuzz Tests for Tariff Engine

Todo

Load tools and post initial comment
Read changed files (tests/test_tariff_engine_hypothesis.py, CHANGELOG.md)
Diff against base branch
Analyze test structure, correctness, and coverage
Post full review findings

Overall the test suite is well-structured and the invariant coverage is sensible. Two production-code issues and one test-design concern need attention before this can be considered a clean fuzz harness.

Bug: Boundary semantics mismatch between `calc_stepped_cost` and `get_stepped_import_rate`

This is the most important finding. The two functions disagree on what happens at k == threshold:

tariff_engine.py:83 — calc_stepped_cost

if total_kwh <= threshold:          # ≤  → AT threshold, charges step1_rate
    return total_kwh * tariff["step1_rate"]

tariff_engine.py:75 — get_stepped_import_rate

if daily_kwh < threshold:           # <  → AT threshold, returns step2_rate
    return tariff["step1_rate"]
return tariff["step2_rate"]

Invariant 2 (TestStepCostAtThreshold) correctly asserts that calc_stepped_cost(tariff, threshold) == threshold * step1. But TestSteppedRateDichotomy.test_at_or_above_threshold_returns_step2 simultaneously asserts that get_stepped_import_rate(tariff, threshold) returns step2. These encode contradictory boundary semantics as both correct. The TariffEngine.update() loop at tariff_engine.py:329 uses get_stepped_import_rate with the accumulated daily kWh, so a household that has imported exactly threshold kWh will be charged at step2_rate for their next unit — consistent with calc_stepped_cost above threshold — but the marginal rate sensor exposed via current_import_rate_c_kwh (line 385) will also show step2_rate. This is probably intentional ("you've crossed the threshold"), but the test file should document this explicitly rather than silently encoding both. Fix this →

Test design: `return` vs `assume()` causes silent under-coverage

tests/test_tariff_engine_hypothesis.py:109-110:

if k <= threshold:
    return  # Different invariant covers ≤ threshold.

Same pattern at lines 143-144 and 153-154.

Using return marks the example as passed rather than as a discard, so Hypothesis does not attempt to generate a replacement. With _kwh in [0, 200] and _threshold in [0.01, 50], roughly 75% of (k, threshold) pairs have k ≤ threshold, meaning TestStepCostAboveThreshold and the two split tests in TestSteppedRateDichotomy each effectively exercise around 50 genuinely constrained examples out of the 200 requested. The correct idiom is:

from hypothesis import assume
assume(k > threshold)

This causes Hypothesis to count only valid draws toward max_examples, so the 200-example budget is spent entirely on the interesting region. Fix this →

Minor: Floating-point tolerance inconsistency

Invariant 2 (test_cost_at_threshold_uses_only_step1, line 92) uses 1e-9. Invariant 3 (test_above_threshold_step_composition, line 114) uses 1e-6. Both are single multiplications of bounded floats (max ~200 × 200 = 40 000) so IEEE 754 round-trip error is at most a few ULPs (~4e-12), making 1e-9 safely tight. The inconsistency is harmless but worth aligning to a single standard (either 1e-9 everywhere or math.isclose(rel_tol=1e-9) for scale-independent comparison).

Positive notes

The deadline=None setting is correct — CI wall-clock variance would otherwise produce flaky failures on slow runners.
_threshold = st.floats(min_value=0.01, ...) with the explanatory comment is good defensive design; it correctly excludes the degenerate threshold=0 case and explains why.
_basic_tou_periods with an overnight midnight-crossing window ("23:00" → "07:00") directly exercises the start > end branch at tariff_engine.py:64-67 — good edge-case selection.
test_full_day_coverage_no_unknown as a fixture self-check is a nice pattern; it catches gaps in _basic_tou_periods before the real property tests run.
The docstring mapping each numbered invariant to the test class is clear and should stay.

Summary

Finding	Severity	File	Lines
Boundary `<` vs `≤` inconsistency silently encoded as correct	Medium	`tariff_engine.py`	75, 83
`return` instead of `assume()` undercuts effective example count	Low	`test_tariff_engine_hypothesis.py`	109, 143, 153
Tolerance inconsistency `1e-9` vs `1e-6`	Cosmetic	`test_tariff_engine_hypothesis.py`	92, 114

The boundary mismatch is the one worth following up on — whether it's intentional or a subtle bug in get_stepped_import_rate, the intent should be documented in a comment or resolved with a consistent operator. Everything else is clean.

claude · 2026-05-23T13:14:40Z

Claude encountered an error after 8m 17s —— View job

I'll analyze this and get back to you.

Artic0din · 2026-05-23T13:40:08Z

Retro-review complete. Findings (if any) tracked in fix/retro-review-batch (#143) or noted in .planning/retro-review-findings.md. Closing synthetic PR + cleaning branches.

Artic0din closed this May 23, 2026

Artic0din deleted the claude-retro/pr-102-head branch May 23, 2026 13:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: retro-review PR #102 — test(tariff): Hypothesis fuzz the five tariff_engine invariants (Phase 11 PR-18)#134

chore: retro-review PR #102 — test(tariff): Hypothesis fuzz the five tariff_engine invariants (Phase 11 PR-18)#134
Artic0din wants to merge 1 commit into
claude-retro/pr-102-basefrom
claude-retro/pr-102-head

Artic0din commented May 23, 2026

Uh oh!

Artic0din commented May 23, 2026

Uh oh!

claude Bot commented May 23, 2026 •

edited

Loading

Uh oh!

claude Bot commented May 23, 2026 •

edited

Loading

Uh oh!

Artic0din commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Artic0din commented May 23, 2026

Uh oh!

Artic0din commented May 23, 2026

Uh oh!

claude Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review — PR #102 Retro: Hypothesis Fuzz Tests for Tariff Engine

Bug: Boundary semantics mismatch between calc_stepped_cost and get_stepped_import_rate

Test design: return vs assume() causes silent under-coverage

Minor: Floating-point tolerance inconsistency

Positive notes

Summary

Uh oh!

claude Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Artic0din commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented May 23, 2026 •

edited

Loading

Bug: Boundary semantics mismatch between `calc_stepped_cost` and `get_stepped_import_rate`

Test design: `return` vs `assume()` causes silent under-coverage

claude Bot commented May 23, 2026 •

edited

Loading