From 4896d875be87cd6ea3ae43f64838ce478280a0a2 Mon Sep 17 00:00:00 2001
From: ArticOdin <ryanf@me.com>
Date: Sat, 23 May 2026 10:20:42 +1000
Subject: [PATCH] docs(spec): three quick wins from spec-completeness audit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Spec audit (5-layer framework) scored the repo 81/100 — Safe tier for
agent delegation — with L2 (interface) + L5 (cultural) flagged yellow.
The three quick wins close the highest-ROI gaps without forcing a
codebase rewrite.

Win 1 — pyproject.toml (L2 + L5 fix)
  Pin ruff + pyright + pytest config. Ruff ruleset matches what the
  codebase already passes (E + F), with a documented future-strict
  ladder (W, I, B, UP, S, BLE, RUF) and pre-counted violations per
  rule so each can be adopted incrementally. Pyright basic-mode +
  reportMissingImports=warning to tolerate HA libs being absent in
  the sandbox while still catching real type errors.

Win 2 — docs/development.md naming + reference implementations (L5)
  Append three sections:
  - Naming conventions for entity_id, provider_id, statistic_id,
    config-flow step IDs, service IDs, CONF_* constants. Pins the
    live-UAT 2026-05-23 lowercase-statistic_id contract in writing.
  - Reference implementations table — canonical file per pattern
    (external-SDK price source, public-endpoint price source,
    CDR-derived provider, composition wrapper, Energy-Dashboard
    sensor, reauth dispatcher, external-stats push).
  - Versioning policy — SemVer + HACS rules, beta tag handling,
    when version bumps happen in a stack.

Win 3 — agent orientation + codex findings in TODOS.md (L1 + L4)
  - CLAUDE.md gains an "Agent orientation" section pointing at the
    six artefacts an agent must read before non-trivial work
    (.paul/STATE.md, .paul/phases/<phase>/PLAN.md, DECISIONS.md,
    TODOS.md, docs/architecture.md, docs/development.md).
  - TODOS.md grows a "Codex full-repo review findings (2026-05-23)"
    section enumerating the P0/P1/P2 items the codex pass surfaced
    (dashboard ha_token in URL, DWT midnight rollover gap, legacy
    hass.data in options flow, per-entry service capture, monotonic
    sum violation, single-rate ranking gap, DWT from_dict ignores
    today, background task cleanup, MagicMock test drift, pycache
    secret-shaped strings). Each entry has file:line + fix + priority
    so a future agent doesn't silently re-discover them.

E741 cleanup in tests/test_blueprints.py — three list comprehensions
used `l` as the loop variable; renamed to `line` so the codebase
clears bare `ruff check` against the new pyproject ruleset.

1070 passing, ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CLAUDE.md                | 13 +++++++
 TODOS.md                 | 76 ++++++++++++++++++++++++++++++++++++++++
 docs/development.md      | 43 +++++++++++++++++++++++
 pyproject.toml           | 74 ++++++++++++++++++++++++++++++++++++++
 tests/test_blueprints.py | 18 +++++-----
 5 files changed, 215 insertions(+), 9 deletions(-)
 create mode 100644 pyproject.toml
diff --git a/CLAUDE.md b/CLAUDE.md
index 270a6e3..eb6abe0 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,6 +4,19 @@
 
 Compare real energy costs between [Amber Electric](https://www.amber.com.au) (wholesale spot pricing) and [GloBird Energy](https://www.globirdenergy.com.au) (time-of-use tariffs) using actual Home Assistant consumption data.
 
+## Agent orientation
+
+Before starting any non-trivial work, read these in order:
+
+1. **Current phase + state:** `.paul/STATE.md` — which phase is in flight (PLAN / EXECUTE / IDLE), what just shipped.
+2. **Phase acceptance criteria:** `.paul/phases/<phase>/PLAN.md` — the contract for the in-flight phase. Don't invent scope.
+3. **Architectural decisions:** `DECISIONS.md` — every `D-P*-N` entry. The reason a thing IS the way it is.
+4. **Deferred work:** `TODOS.md` — what we intentionally haven't done yet, with rationale. Don't "fix" anything here without confirming.
+5. **Module map + patterns:** `docs/architecture.md`.
+6. **Naming + reference implementations:** `docs/development.md` (Naming + Reference implementations sections).
+
+Tooling discipline (ruff + pyright + pytest) is pinned in `pyproject.toml`. Don't loosen rules — adjust the code.
+
 ## Project Context
 
 - **Target:** Home Assistant custom integration distributed via HACS
diff --git a/TODOS.md b/TODOS.md
index a467c71..2f82e11 100644
--- a/TODOS.md
+++ b/TODOS.md
@@ -150,3 +150,79 @@ See `~/.gstack/projects/Artic0din-ha-pricehawk/ceo-plans/2026-05-14-cdr-tariff-r
 **Effort:** human ~unknown (depends on HA API support); 1 day if hook exists, 1-2 months calendar if requires HA core PR.
 **Priority:** P3.
 **Depends on:** Research outcome. If hook exists: nothing blocking after v1.5.0. If not: HA core contribution.
+
+---
+
+## Codex full-repo review findings (2026-05-23) — open
+
+These items came out of the `codex exec` full-repo review on 2026-05-23 (log at `/tmp/codex-fullrepo-review.log`). They are NOT yet fixed. Listed here so a future agent doesn't silently re-discover them as new bugs. Each links a file:line + the codex rationale.
+
+### TODO-CODEX-P0-1: Legacy iframe puts `ha_token` in URL query
+
+**Where:** `custom_components/pricehawk/dashboard_config.py:115`
+**Why:** Tokens leak via browser history, referrers, screenshots, panel config. Redacting the log line doesn't fix exposure.
+**Fix:** Remove the iframe token path. Use only `panel_custom`/HA-session auth or a postMessage/session bootstrap that never serialises credentials into URLs.
+**Priority:** P0 (security).
+
+### TODO-CODEX-P0-2: Daily rollover doesn't reset DWT provider
+
+**Where:** `custom_components/pricehawk/coordinator.py:1021` + `providers/dynamic_wholesale_tariff.py:73`
+**Why:** "Today" sensors and external statistics corrupt across midnight on DWT entries — yesterday's accumulators bleed into today's count.
+**Fix:** Call `reset_daily()` on all providers during coordinator rollover, OR add provider-level date reset logic. Add a DWT midnight regression test.
+**Priority:** P0 (correctness).
+
+### TODO-CODEX-P1-1: Options-flow named-comparator reads legacy `hass.data`
+
+**Where:** `custom_components/pricehawk/config_flow.py:2734`
+**Why:** v3 stores coordinator on `entry.runtime_data`; the legacy `self.hass.data[DOMAIN][entry_id]` lookup returns None and named comparator setup aborts despite valid data.
+**Fix:** Read `self.config_entry.runtime_data.coordinator`. Add a real OptionsFlow test with populated runtime data.
+**Priority:** P1.
+
+### TODO-CODEX-P1-2: Services captured per-entry instead of resolved at call time
+
+**Where:** `custom_components/pricehawk/__init__.py:101`
+**Why:** Closures capture one `entry`; multi-entry installs route every service call to the last-registered entry.
+**Fix:** Register singleton services once; require/accept `entry_id` in the service call; resolve the target entry from active config entries at call time.
+**Priority:** P1.
+
+### TODO-CODEX-P1-3: External-stats `sum` adds net daily cost (can go negative)
+
+**Where:** `custom_components/pricehawk/statistics.py:65` + `coordinator.py:1053`
+**Why:** HA's `has_sum=True` statistics contract is monotonic. Export-heavy days make the cumulative sum decrease, violating the contract. The CHANGELOG claim "HA tolerates this for cost-style stats" contradicts codex's reading — verify against HA docs before fixing.
+**Fix:** Split import-cost and export-credit streams, OR keep net credit out of the monotonic `sum` and expose net via a separate stat. Negative-day regression test required.
+**Priority:** P1.
+
+### TODO-CODEX-P1-4: Cheap-rank only reads `timeOfUseRates`
+
+**Where:** `custom_components/pricehawk/cdr/ranking.py:153`
+**Why:** Flat / `singleRate` CDR plans are silently excluded from ranking. Alternatives list biases against simpler plans.
+**Fix:** Parse `singleRate.rates[].unitPrice` in `_extract_peak_rate_cents()`. Test against a genuine single-rate CDR plan.
+**Priority:** P1.
+
+### TODO-CODEX-P1-5: DWT `from_dict(today)` ignores the `today` arg
+
+**Where:** `custom_components/pricehawk/providers/dynamic_wholesale_tariff.py:222`
+**Why:** Validates version but restores daily counters unconditionally — yesterday's persisted DWT state becomes today's state after restart.
+**Fix:** Persist a state date alongside the counters; restore daily accumulators only when stored date equals the supplied HA-local date.
+**Priority:** P1.
+
+### TODO-CODEX-P1-6: Setup background tasks not retained / cancelled
+
+**Where:** `custom_components/pricehawk/__init__.py:57`
+**Why:** PR #107's `async_create_background_task` calls don't store the handles. Reload/unload races leave tasks mutating unloaded coordinator state. Pytest already emits unawaited-coroutine warnings on these paths.
+**Fix:** Store the task handles and register `entry.async_on_unload(task.cancel)` or a coordinator shutdown hook.
+**Priority:** P1.
+
+### TODO-CODEX-P2-1: MagicMock conftest + source-string Silver tests masking drift
+
+**Where:** `tests/conftest.py:16` + `tests/test_silver_checklist.py:142`
+**Why:** Broad `MagicMock` HA stubs let stale-`hass.data` regressions pass the Silver "no legacy hass.data" test (TODO-CODEX-P1-1 above).
+**Fix:** Replace key checklist greps with behaviour tests or AST checks, especially for config/options flow and runtime-data access paths.
+**Priority:** P2.
+
+### TODO-CODEX-P2-2: `__pycache__` artefacts contain dummy-secret-shaped strings
+
+**Where:** `custom_components/pricehawk/**/__pycache__/*` + `tests/**/__pycache__/*`
+**Why:** Local audit/secret scans can flag compiled dummy strings used in fixtures, even though git doesn't track them.
+**Fix:** Clean `__pycache__` before release/audit scans. Keep CI secret scans scoped to tracked files OR add an explicit clean step.
+**Priority:** P2.
diff --git a/docs/development.md b/docs/development.md
index 6bdc905..c660537 100644
--- a/docs/development.md
+++ b/docs/development.md
@@ -111,3 +111,46 @@ PriceHawk does **not** use:
 - Auto-merge on PRs
 
 CI runs ruff + pyright + pytest + coverage + gitleaks; that's the full picture.
+
+## Naming conventions
+
+Source of truth for every name the integration emits. Drift here breaks the Energy Dashboard cost picker, HA recorder validation, and config-flow translations.
+
+- **entity_id**: `sensor.pricehawk_<snake_case>`. No camelCase, no abbreviations (`amber_cost_today`, not `AmberCostT`). The `pricehawk_` prefix is mandatory — `sensor.py` matches AEGIS rule "Dashboard entity IDs MUST use the `pricehawk_` prefix matching `sensor.py`".
+- **provider_id**: snake_case slug declared as a literal `PROVIDER_*` constant in `const.py` (`PROVIDER_AMBER = "amber"`, `PROVIDER_DWT_OE = "dwt_openelectricity"`). Provider classes set `self.id = PROVIDER_*` — never compute it from `__class__.__name__`.
+- **statistic_id**: `pricehawk:cost_<entry_id[:8].lower()>_<provider_id>`. Must match `[a-z0-9_]+` per HA recorder contract. Lowercase the entry-id slice — HA's ULIDs are uppercase and recorder rejects raw slices (live UAT 2026-05-23).
+- **config-flow step IDs**: snake_case verbs (`dwt_credentials`, `reauth_amber`, `reconfigure_dwt_oe`). Mirror to `strings.json` `config.step.<step_id>` entries; the byte-identical translations check in CI catches drift.
+- **service IDs**: declared in `services.yaml` + `strings.json` `services.<name>`. snake_case verbs (`analyze_csv`, `backfill_history`, `rank_alternatives`).
+- **CONF_\* constants**: `CONF_<PROVIDER>_<FIELD>` (`CONF_DWT_OE_API_KEY`, `CONF_AMBER_NETWORK_DAILY_CHARGE`). One per config key, no shared keys across providers.
+
+## Reference implementations
+
+When extending the integration, mirror these as the canonical implementations:
+
+| Pattern | Reference file | Notes |
+|---------|----------------|-------|
+| External-SDK price source | `custom_components/pricehawk/providers/openelectricity.py` | API-key handling, 30s timeout, `ConfigEntryAuthFailed` mapping, attribution string, scrubbed `__repr__` |
+| Public-endpoint price source | `custom_components/pricehawk/providers/nemweb.py` | No-API-key path; shared `WholesalePrice` contract from openelectricity.py |
+| CDR-derived provider | `custom_components/pricehawk/providers/cdr_plan.py` | `from_dict` version check, `id = f"{brand}_{plan_id}"` |
+| Composition / wrapper | `custom_components/pricehawk/providers/dynamic_wholesale_tariff.py` | Wraps a `WholesalePriceSource` (OE or NEMWeb) behind one Provider |
+| Energy-Dashboard-pickable sensor | `sensor.py::PriceHawkTodayCostSensor` | `device_class=MONETARY`, `state_class=TOTAL`, `last_reset` at midnight, provider-INDEPENDENT `unique_id` |
+| Reauth dispatcher | `config_flow.py::async_step_reauth` | Reads `coordinator._reauth_provider_id` then falls back to `entry.data[CONF_CURRENT_PROVIDER]` for startup-auth-fail case |
+| External-statistics push | `statistics.py::async_push_daily_cost_to_statistics` | `statistic_id` contract, monotonic-`sum` discipline |
+
+When adding a new retailer:
+1. Add `PROVIDER_<NAME>` to `const.py`.
+2. Implement against the Provider Protocol — pick the closest reference from the table above.
+3. Wire into `coordinator._build_<name>_provider` (or `_current_plan_provider` if it's a primary).
+4. Add config-flow step + `strings.json` entries.
+5. Add a test mirror in `tests/` — copy the structure of the closest existing `test_<provider>.py`.
+
+## Versioning
+
+PriceHawk follows SemVer via `manifest.json["version"]`. HACS reads this field; users see it in the integration page.
+
+- **Patch (`1.6.0` → `1.6.1`)**: bug fixes, no user-visible behaviour change.
+- **Minor (`1.6.0` → `1.7.0`)**: new sensors, new comparator, new options. Backwards-compatible.
+- **Major (`1.x` → `2.0`)**: breaking changes — entity_id rename, removed config keys, anything that requires user action on upgrade.
+- **Beta tag** (`1.6.0-beta.1` → `1.6.0-beta.2`): pre-release for tester reports. Beta tags are NEVER promoted by drop-prefix; create a fresh `1.6.0` tag instead so HACS upgrades cleanly.
+
+Version bumps happen in the PR that completes a phase — not per intermediate PR in a stack. The CI version-drift workflow (`version-drift-guard.yml`) enforces this.
diff --git a/pyproject.toml b/pyproject.toml
new file mode 100644
index 0000000..ea4dd3e
--- /dev/null
+++ b/pyproject.toml
@@ -0,0 +1,74 @@
+# PriceHawk project tooling configuration.
+#
+# Not a build manifest — PriceHawk is distributed via HACS, not PyPI,
+# so there's no [project] / [build-system] block. This file exists to
+# pin ruff + pyright behaviour so agents (and humans) don't drift away
+# from project intent. See `docs/development.md` for naming + reference-
+# implementation conventions that complement these tooling rules.
+
+[tool.ruff]
+target-version = "py313"
+line-length = 100
+extend-exclude = [
+    "custom_components/pricehawk/__pycache__",
+    "tests/__pycache__",
+]
+
+[tool.ruff.lint]
+# Pinned to the ruleset the existing codebase already passes cleanly.
+# Selecting tighter rules without first paying down the resulting
+# violations would just create noise. The commented "future-strict"
+# block below is the target — adopt one rule at a time, paying down
+# each in its own PR before turning it on for everyone.
+select = [
+    "E",    # pycodestyle errors
+    "F",    # pyflakes (undefined names, unused imports)
+]
+ignore = [
+    "E501",  # line length — already enforced by `line-length = 100`
+]
+
+# Future-strict target — enable one at a time, fix the consequent
+# violations, then move the rule from this comment up into `select`:
+#   "W"    pycodestyle warnings — clean today, ship next
+#   "I"    isort (import ordering) — ~56 violations today, mostly auto-fixable
+#   "B"    bugbear (likely bugs, mutable defaults) — ~80 violations today
+#   "UP"   pyupgrade (modern Python idioms) — ~60 violations today
+#   "S"    bandit security — ~30 violations today (mostly test-fixture passwords)
+#   "BLE"  broad except — ~40 violations today (most are intentional boundary catches)
+#   "RUF"  ruff-specific (mutable class defaults, etc.) — ~30 violations today
+# Last counted 2026-05-23 via `ruff check --select <rule>`.
+
+[tool.ruff.lint.per-file-ignores]
+"tests/**/*.py" = [
+    "S105",  # hardcoded "test password" strings are intentional in fixtures
+    "S106",
+    "S107",
+]
+
+[tool.pyright]
+# Strict mode would flag every HA-import as "could not be resolved" in
+# this sandbox (HA libs only installed inside the HA runtime). Use
+# `basic` and suppress the import diagnostics that are noise in our
+# tooling environment — actual type errors still surface.
+typeCheckingMode = "basic"
+pythonVersion = "3.13"
+include = ["custom_components", "tests"]
+exclude = [
+    "**/__pycache__",
+    ".venv",
+]
+reportMissingImports = "warning"
+reportMissingTypeStubs = "warning"
+# Strict-typing intent (per global CLAUDE.md "no Any in Python"): the
+# project bans untyped function signatures and implicit-Any returns.
+# Pyright basic mode does not enforce these, but ruff's ANN ruleset
+# can be enabled per-file if drift starts appearing in review.
+
+[tool.pytest.ini_options]
+# Keep the existing 1070-passing layout. Hypothesis is required for
+# tests/test_tariff_engine_hypothesis.py; codex full-repo review
+# 2026-05-23 flagged that bare pytest fails collection without it.
+required_plugins = ["hypothesis"]
+testpaths = ["tests"]
+addopts = "-q --tb=short"
diff --git a/tests/test_blueprints.py b/tests/test_blueprints.py
index d75fd38..8397246 100644
--- a/tests/test_blueprints.py
+++ b/tests/test_blueprints.py
@@ -46,25 +46,25 @@ class _SafeLoader(yaml.SafeLoader):
             "blueprint": {
                 "name": (
                     [
-                        l.split("name:", 1)[1].strip()
-                        for l in raw.splitlines()
-                        if l.strip().startswith("name:")
+                        line.split("name:", 1)[1].strip()
+                        for line in raw.splitlines()
+                        if line.strip().startswith("name:")
                     ]
                     or ["?"]
                 )[0],
                 "domain": (
                     [
-                        l.split("domain:", 1)[1].strip()
-                        for l in raw.splitlines()
-                        if l.strip().startswith("domain:")
+                        line.split("domain:", 1)[1].strip()
+                        for line in raw.splitlines()
+                        if line.strip().startswith("domain:")
                     ]
                     or [None]
                 )[0],
                 "source_url": (
                     [
-                        l.split("source_url:", 1)[1].strip()
-                        for l in raw.splitlines()
-                        if l.strip().startswith("source_url:")
+                        line.split("source_url:", 1)[1].strip()
+                        for line in raw.splitlines()
+                        if line.strip().startswith("source_url:")
                     ]
                     or [None]
                 )[0],