From 4896d875be87cd6ea3ae43f64838ce478280a0a2 Mon Sep 17 00:00:00 2001 From: ArticOdin Date: Sat, 23 May 2026 10:20:42 +1000 Subject: [PATCH] docs(spec): three quick wins from spec-completeness audit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Spec audit (5-layer framework) scored the repo 81/100 — Safe tier for agent delegation — with L2 (interface) + L5 (cultural) flagged yellow. The three quick wins close the highest-ROI gaps without forcing a codebase rewrite. Win 1 — pyproject.toml (L2 + L5 fix) Pin ruff + pyright + pytest config. Ruff ruleset matches what the codebase already passes (E + F), with a documented future-strict ladder (W, I, B, UP, S, BLE, RUF) and pre-counted violations per rule so each can be adopted incrementally. Pyright basic-mode + reportMissingImports=warning to tolerate HA libs being absent in the sandbox while still catching real type errors. Win 2 — docs/development.md naming + reference implementations (L5) Append three sections: - Naming conventions for entity_id, provider_id, statistic_id, config-flow step IDs, service IDs, CONF_* constants. Pins the live-UAT 2026-05-23 lowercase-statistic_id contract in writing. - Reference implementations table — canonical file per pattern (external-SDK price source, public-endpoint price source, CDR-derived provider, composition wrapper, Energy-Dashboard sensor, reauth dispatcher, external-stats push). - Versioning policy — SemVer + HACS rules, beta tag handling, when version bumps happen in a stack. Win 3 — agent orientation + codex findings in TODOS.md (L1 + L4) - CLAUDE.md gains an "Agent orientation" section pointing at the six artefacts an agent must read before non-trivial work (.paul/STATE.md, .paul/phases//PLAN.md, DECISIONS.md, TODOS.md, docs/architecture.md, docs/development.md). - TODOS.md grows a "Codex full-repo review findings (2026-05-23)" section enumerating the P0/P1/P2 items the codex pass surfaced (dashboard ha_token in URL, DWT midnight rollover gap, legacy hass.data in options flow, per-entry service capture, monotonic sum violation, single-rate ranking gap, DWT from_dict ignores today, background task cleanup, MagicMock test drift, pycache secret-shaped strings). Each entry has file:line + fix + priority so a future agent doesn't silently re-discover them. E741 cleanup in tests/test_blueprints.py — three list comprehensions used `l` as the loop variable; renamed to `line` so the codebase clears bare `ruff check` against the new pyproject ruleset. 1070 passing, ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) --- CLAUDE.md | 13 +++++++ TODOS.md | 76 ++++++++++++++++++++++++++++++++++++++++ docs/development.md | 43 +++++++++++++++++++++++ pyproject.toml | 74 ++++++++++++++++++++++++++++++++++++++ tests/test_blueprints.py | 18 +++++----- 5 files changed, 215 insertions(+), 9 deletions(-) create mode 100644 pyproject.toml diff --git a/CLAUDE.md b/CLAUDE.md index 270a6e3..eb6abe0 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -4,6 +4,19 @@ Compare real energy costs between [Amber Electric](https://www.amber.com.au) (wholesale spot pricing) and [GloBird Energy](https://www.globirdenergy.com.au) (time-of-use tariffs) using actual Home Assistant consumption data. +## Agent orientation + +Before starting any non-trivial work, read these in order: + +1. **Current phase + state:** `.paul/STATE.md` — which phase is in flight (PLAN / EXECUTE / IDLE), what just shipped. +2. **Phase acceptance criteria:** `.paul/phases//PLAN.md` — the contract for the in-flight phase. Don't invent scope. +3. **Architectural decisions:** `DECISIONS.md` — every `D-P*-N` entry. The reason a thing IS the way it is. +4. **Deferred work:** `TODOS.md` — what we intentionally haven't done yet, with rationale. Don't "fix" anything here without confirming. +5. **Module map + patterns:** `docs/architecture.md`. +6. **Naming + reference implementations:** `docs/development.md` (Naming + Reference implementations sections). + +Tooling discipline (ruff + pyright + pytest) is pinned in `pyproject.toml`. Don't loosen rules — adjust the code. + ## Project Context - **Target:** Home Assistant custom integration distributed via HACS diff --git a/TODOS.md b/TODOS.md index a467c71..2f82e11 100644 --- a/TODOS.md +++ b/TODOS.md @@ -150,3 +150,79 @@ See `~/.gstack/projects/Artic0din-ha-pricehawk/ceo-plans/2026-05-14-cdr-tariff-r **Effort:** human ~unknown (depends on HA API support); 1 day if hook exists, 1-2 months calendar if requires HA core PR. **Priority:** P3. **Depends on:** Research outcome. If hook exists: nothing blocking after v1.5.0. If not: HA core contribution. + +--- + +## Codex full-repo review findings (2026-05-23) — open + +These items came out of the `codex exec` full-repo review on 2026-05-23 (log at `/tmp/codex-fullrepo-review.log`). They are NOT yet fixed. Listed here so a future agent doesn't silently re-discover them as new bugs. Each links a file:line + the codex rationale. + +### TODO-CODEX-P0-1: Legacy iframe puts `ha_token` in URL query + +**Where:** `custom_components/pricehawk/dashboard_config.py:115` +**Why:** Tokens leak via browser history, referrers, screenshots, panel config. Redacting the log line doesn't fix exposure. +**Fix:** Remove the iframe token path. Use only `panel_custom`/HA-session auth or a postMessage/session bootstrap that never serialises credentials into URLs. +**Priority:** P0 (security). + +### TODO-CODEX-P0-2: Daily rollover doesn't reset DWT provider + +**Where:** `custom_components/pricehawk/coordinator.py:1021` + `providers/dynamic_wholesale_tariff.py:73` +**Why:** "Today" sensors and external statistics corrupt across midnight on DWT entries — yesterday's accumulators bleed into today's count. +**Fix:** Call `reset_daily()` on all providers during coordinator rollover, OR add provider-level date reset logic. Add a DWT midnight regression test. +**Priority:** P0 (correctness). + +### TODO-CODEX-P1-1: Options-flow named-comparator reads legacy `hass.data` + +**Where:** `custom_components/pricehawk/config_flow.py:2734` +**Why:** v3 stores coordinator on `entry.runtime_data`; the legacy `self.hass.data[DOMAIN][entry_id]` lookup returns None and named comparator setup aborts despite valid data. +**Fix:** Read `self.config_entry.runtime_data.coordinator`. Add a real OptionsFlow test with populated runtime data. +**Priority:** P1. + +### TODO-CODEX-P1-2: Services captured per-entry instead of resolved at call time + +**Where:** `custom_components/pricehawk/__init__.py:101` +**Why:** Closures capture one `entry`; multi-entry installs route every service call to the last-registered entry. +**Fix:** Register singleton services once; require/accept `entry_id` in the service call; resolve the target entry from active config entries at call time. +**Priority:** P1. + +### TODO-CODEX-P1-3: External-stats `sum` adds net daily cost (can go negative) + +**Where:** `custom_components/pricehawk/statistics.py:65` + `coordinator.py:1053` +**Why:** HA's `has_sum=True` statistics contract is monotonic. Export-heavy days make the cumulative sum decrease, violating the contract. The CHANGELOG claim "HA tolerates this for cost-style stats" contradicts codex's reading — verify against HA docs before fixing. +**Fix:** Split import-cost and export-credit streams, OR keep net credit out of the monotonic `sum` and expose net via a separate stat. Negative-day regression test required. +**Priority:** P1. + +### TODO-CODEX-P1-4: Cheap-rank only reads `timeOfUseRates` + +**Where:** `custom_components/pricehawk/cdr/ranking.py:153` +**Why:** Flat / `singleRate` CDR plans are silently excluded from ranking. Alternatives list biases against simpler plans. +**Fix:** Parse `singleRate.rates[].unitPrice` in `_extract_peak_rate_cents()`. Test against a genuine single-rate CDR plan. +**Priority:** P1. + +### TODO-CODEX-P1-5: DWT `from_dict(today)` ignores the `today` arg + +**Where:** `custom_components/pricehawk/providers/dynamic_wholesale_tariff.py:222` +**Why:** Validates version but restores daily counters unconditionally — yesterday's persisted DWT state becomes today's state after restart. +**Fix:** Persist a state date alongside the counters; restore daily accumulators only when stored date equals the supplied HA-local date. +**Priority:** P1. + +### TODO-CODEX-P1-6: Setup background tasks not retained / cancelled + +**Where:** `custom_components/pricehawk/__init__.py:57` +**Why:** PR #107's `async_create_background_task` calls don't store the handles. Reload/unload races leave tasks mutating unloaded coordinator state. Pytest already emits unawaited-coroutine warnings on these paths. +**Fix:** Store the task handles and register `entry.async_on_unload(task.cancel)` or a coordinator shutdown hook. +**Priority:** P1. + +### TODO-CODEX-P2-1: MagicMock conftest + source-string Silver tests masking drift + +**Where:** `tests/conftest.py:16` + `tests/test_silver_checklist.py:142` +**Why:** Broad `MagicMock` HA stubs let stale-`hass.data` regressions pass the Silver "no legacy hass.data" test (TODO-CODEX-P1-1 above). +**Fix:** Replace key checklist greps with behaviour tests or AST checks, especially for config/options flow and runtime-data access paths. +**Priority:** P2. + +### TODO-CODEX-P2-2: `__pycache__` artefacts contain dummy-secret-shaped strings + +**Where:** `custom_components/pricehawk/**/__pycache__/*` + `tests/**/__pycache__/*` +**Why:** Local audit/secret scans can flag compiled dummy strings used in fixtures, even though git doesn't track them. +**Fix:** Clean `__pycache__` before release/audit scans. Keep CI secret scans scoped to tracked files OR add an explicit clean step. +**Priority:** P2. diff --git a/docs/development.md b/docs/development.md index 6bdc905..c660537 100644 --- a/docs/development.md +++ b/docs/development.md @@ -111,3 +111,46 @@ PriceHawk does **not** use: - Auto-merge on PRs CI runs ruff + pyright + pytest + coverage + gitleaks; that's the full picture. + +## Naming conventions + +Source of truth for every name the integration emits. Drift here breaks the Energy Dashboard cost picker, HA recorder validation, and config-flow translations. + +- **entity_id**: `sensor.pricehawk_`. No camelCase, no abbreviations (`amber_cost_today`, not `AmberCostT`). The `pricehawk_` prefix is mandatory — `sensor.py` matches AEGIS rule "Dashboard entity IDs MUST use the `pricehawk_` prefix matching `sensor.py`". +- **provider_id**: snake_case slug declared as a literal `PROVIDER_*` constant in `const.py` (`PROVIDER_AMBER = "amber"`, `PROVIDER_DWT_OE = "dwt_openelectricity"`). Provider classes set `self.id = PROVIDER_*` — never compute it from `__class__.__name__`. +- **statistic_id**: `pricehawk:cost__`. Must match `[a-z0-9_]+` per HA recorder contract. Lowercase the entry-id slice — HA's ULIDs are uppercase and recorder rejects raw slices (live UAT 2026-05-23). +- **config-flow step IDs**: snake_case verbs (`dwt_credentials`, `reauth_amber`, `reconfigure_dwt_oe`). Mirror to `strings.json` `config.step.` entries; the byte-identical translations check in CI catches drift. +- **service IDs**: declared in `services.yaml` + `strings.json` `services.`. snake_case verbs (`analyze_csv`, `backfill_history`, `rank_alternatives`). +- **CONF_\* constants**: `CONF__` (`CONF_DWT_OE_API_KEY`, `CONF_AMBER_NETWORK_DAILY_CHARGE`). One per config key, no shared keys across providers. + +## Reference implementations + +When extending the integration, mirror these as the canonical implementations: + +| Pattern | Reference file | Notes | +|---------|----------------|-------| +| External-SDK price source | `custom_components/pricehawk/providers/openelectricity.py` | API-key handling, 30s timeout, `ConfigEntryAuthFailed` mapping, attribution string, scrubbed `__repr__` | +| Public-endpoint price source | `custom_components/pricehawk/providers/nemweb.py` | No-API-key path; shared `WholesalePrice` contract from openelectricity.py | +| CDR-derived provider | `custom_components/pricehawk/providers/cdr_plan.py` | `from_dict` version check, `id = f"{brand}_{plan_id}"` | +| Composition / wrapper | `custom_components/pricehawk/providers/dynamic_wholesale_tariff.py` | Wraps a `WholesalePriceSource` (OE or NEMWeb) behind one Provider | +| Energy-Dashboard-pickable sensor | `sensor.py::PriceHawkTodayCostSensor` | `device_class=MONETARY`, `state_class=TOTAL`, `last_reset` at midnight, provider-INDEPENDENT `unique_id` | +| Reauth dispatcher | `config_flow.py::async_step_reauth` | Reads `coordinator._reauth_provider_id` then falls back to `entry.data[CONF_CURRENT_PROVIDER]` for startup-auth-fail case | +| External-statistics push | `statistics.py::async_push_daily_cost_to_statistics` | `statistic_id` contract, monotonic-`sum` discipline | + +When adding a new retailer: +1. Add `PROVIDER_` to `const.py`. +2. Implement against the Provider Protocol — pick the closest reference from the table above. +3. Wire into `coordinator._build__provider` (or `_current_plan_provider` if it's a primary). +4. Add config-flow step + `strings.json` entries. +5. Add a test mirror in `tests/` — copy the structure of the closest existing `test_.py`. + +## Versioning + +PriceHawk follows SemVer via `manifest.json["version"]`. HACS reads this field; users see it in the integration page. + +- **Patch (`1.6.0` → `1.6.1`)**: bug fixes, no user-visible behaviour change. +- **Minor (`1.6.0` → `1.7.0`)**: new sensors, new comparator, new options. Backwards-compatible. +- **Major (`1.x` → `2.0`)**: breaking changes — entity_id rename, removed config keys, anything that requires user action on upgrade. +- **Beta tag** (`1.6.0-beta.1` → `1.6.0-beta.2`): pre-release for tester reports. Beta tags are NEVER promoted by drop-prefix; create a fresh `1.6.0` tag instead so HACS upgrades cleanly. + +Version bumps happen in the PR that completes a phase — not per intermediate PR in a stack. The CI version-drift workflow (`version-drift-guard.yml`) enforces this. diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 0000000..ea4dd3e --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,74 @@ +# PriceHawk project tooling configuration. +# +# Not a build manifest — PriceHawk is distributed via HACS, not PyPI, +# so there's no [project] / [build-system] block. This file exists to +# pin ruff + pyright behaviour so agents (and humans) don't drift away +# from project intent. See `docs/development.md` for naming + reference- +# implementation conventions that complement these tooling rules. + +[tool.ruff] +target-version = "py313" +line-length = 100 +extend-exclude = [ + "custom_components/pricehawk/__pycache__", + "tests/__pycache__", +] + +[tool.ruff.lint] +# Pinned to the ruleset the existing codebase already passes cleanly. +# Selecting tighter rules without first paying down the resulting +# violations would just create noise. The commented "future-strict" +# block below is the target — adopt one rule at a time, paying down +# each in its own PR before turning it on for everyone. +select = [ + "E", # pycodestyle errors + "F", # pyflakes (undefined names, unused imports) +] +ignore = [ + "E501", # line length — already enforced by `line-length = 100` +] + +# Future-strict target — enable one at a time, fix the consequent +# violations, then move the rule from this comment up into `select`: +# "W" pycodestyle warnings — clean today, ship next +# "I" isort (import ordering) — ~56 violations today, mostly auto-fixable +# "B" bugbear (likely bugs, mutable defaults) — ~80 violations today +# "UP" pyupgrade (modern Python idioms) — ~60 violations today +# "S" bandit security — ~30 violations today (mostly test-fixture passwords) +# "BLE" broad except — ~40 violations today (most are intentional boundary catches) +# "RUF" ruff-specific (mutable class defaults, etc.) — ~30 violations today +# Last counted 2026-05-23 via `ruff check --select `. + +[tool.ruff.lint.per-file-ignores] +"tests/**/*.py" = [ + "S105", # hardcoded "test password" strings are intentional in fixtures + "S106", + "S107", +] + +[tool.pyright] +# Strict mode would flag every HA-import as "could not be resolved" in +# this sandbox (HA libs only installed inside the HA runtime). Use +# `basic` and suppress the import diagnostics that are noise in our +# tooling environment — actual type errors still surface. +typeCheckingMode = "basic" +pythonVersion = "3.13" +include = ["custom_components", "tests"] +exclude = [ + "**/__pycache__", + ".venv", +] +reportMissingImports = "warning" +reportMissingTypeStubs = "warning" +# Strict-typing intent (per global CLAUDE.md "no Any in Python"): the +# project bans untyped function signatures and implicit-Any returns. +# Pyright basic mode does not enforce these, but ruff's ANN ruleset +# can be enabled per-file if drift starts appearing in review. + +[tool.pytest.ini_options] +# Keep the existing 1070-passing layout. Hypothesis is required for +# tests/test_tariff_engine_hypothesis.py; codex full-repo review +# 2026-05-23 flagged that bare pytest fails collection without it. +required_plugins = ["hypothesis"] +testpaths = ["tests"] +addopts = "-q --tb=short" diff --git a/tests/test_blueprints.py b/tests/test_blueprints.py index d75fd38..8397246 100644 --- a/tests/test_blueprints.py +++ b/tests/test_blueprints.py @@ -46,25 +46,25 @@ class _SafeLoader(yaml.SafeLoader): "blueprint": { "name": ( [ - l.split("name:", 1)[1].strip() - for l in raw.splitlines() - if l.strip().startswith("name:") + line.split("name:", 1)[1].strip() + for line in raw.splitlines() + if line.strip().startswith("name:") ] or ["?"] )[0], "domain": ( [ - l.split("domain:", 1)[1].strip() - for l in raw.splitlines() - if l.strip().startswith("domain:") + line.split("domain:", 1)[1].strip() + for line in raw.splitlines() + if line.strip().startswith("domain:") ] or [None] )[0], "source_url": ( [ - l.split("source_url:", 1)[1].strip() - for l in raw.splitlines() - if l.strip().startswith("source_url:") + line.split("source_url:", 1)[1].strip() + for line in raw.splitlines() + if line.strip().startswith("source_url:") ] or [None] )[0],