feat(integration): Phase 3.1 commit 5 — rank_alternatives service + lifecycle #72
Changes from all commits
bfc89a1
f998d16
60153a3
da84f62
d022187
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,169 @@ | ||
| """Phase 3.1 — Coordinator-facing daily ranking job orchestration. | ||
|
|
||
| Lives outside ``coordinator.py`` so the pure logic is unit-testable | ||
| without the HA app context that ``PriceHawkCoordinator`` requires | ||
| (``DataUpdateCoordinator[T]`` parameterised bases don't survive the | ||
| mock-based conftest). | ||
|
|
||
| Each function takes the inputs it needs explicitly — config-entry | ||
| options, the registry, an HTTP session. The coordinator-side wrapper | ||
| methods (in ``coordinator.py``) own the side effects (scheduling | ||
| callbacks, persisting results, swallowing exceptions across the daily | ||
| boundary). | ||
| """ | ||
| from __future__ import annotations | ||
|
|
||
| import logging | ||
| from typing import Any, TYPE_CHECKING | ||
|
|
||
| from .ranking import DEFAULT_TOP_K, rank_alternatives | ||
| from .registry import RetailerEndpoint, find_by_brand, get_registry | ||
|
|
||
| if TYPE_CHECKING: | ||
| import aiohttp | ||
|
|
||
| _LOGGER = logging.getLogger(__name__) | ||
|
|
||
| # Big-4 nationally-active retailers scanned on every daily run. | ||
| # EME refdata2 doesn't carry per-retailer geography, so we always | ||
| # attempt these four; ``rank_alternatives`` then filters their plans | ||
| # by the user's postcode/distributor anyway. | ||
| DEFAULT_COMPETITOR_BRAND_FRAGMENTS: tuple[str, ...] = ( | ||
| "agl", | ||
| "origin", | ||
| "energyaustralia", | ||
| "red energy", | ||
| ) | ||
|
|
||
|
|
||
| def get_user_geography( | ||
| options: dict[str, Any], | ||
| ) -> tuple[str | None, str | None, str | None]: | ||
| """Pull ``(state, postcode, distributor)`` from a config_entry's options. | ||
|
|
||
| - ``postcode``: ``cdr_postcode`` option (set by the wizard). | ||
| - ``distributor``: first entry in ``cdr_plan.data.geography.distributors``. | ||
| The user already accepted this plan so its distributor IS theirs. | ||
| - ``state``: returned as ``None`` — derived later in the registry | ||
| filter when needed. Postcode + distributor is more precise. | ||
| """ | ||
| postcode = options.get("cdr_postcode") or None | ||
| # CR-fix: every level guarded with isinstance — malformed payloads | ||
| # can ship ``cdr_plan`` as a string, ``data`` as a list, ``geography`` | ||
| # as None, etc. Without guards, ``.get()`` / ``.strip()`` raise | ||
| # AttributeError and abort the whole ranking run. | ||
| plan_data = _safe_plan_data(options) | ||
| geo = plan_data.get("geography") or {} | ||
| if not isinstance(geo, dict): | ||
| return None, postcode, None | ||
| distributors = geo.get("distributors") | ||
| distributor = ( | ||
| distributors[0] | ||
| if isinstance(distributors, list) | ||
| and distributors | ||
| and isinstance(distributors[0], str) | ||
| else None | ||
| ) | ||
| return None, postcode, distributor | ||
|
|
||
|
|
||
| def _safe_plan_data(options: dict[str, Any]) -> dict[str, Any]: | ||
| """Pull ``cdr_plan.data`` safely. Returns ``{}`` on any malformed shape. | ||
|
|
||
| Tolerated malformations: ``cdr_plan`` missing / non-dict, ``data`` | ||
| missing / non-dict. Used by both ``get_user_geography`` and | ||
| ``get_competitor_retailers``. | ||
| """ | ||
| cdr_plan = options.get("cdr_plan") | ||
| if not isinstance(cdr_plan, dict): | ||
| return {} | ||
| plan_data = cdr_plan.get("data") | ||
| return plan_data if isinstance(plan_data, dict) else {} | ||
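The guard chain in `get_user_geography` and `_safe_plan_data` can be exercised in isolation. The sketch below copies the logic from the diff into two standalone functions (renamed `user_geography` and `safe_plan_data` so they are not mistaken for the real module) and feeds them malformed option payloads. The postcode and distributor values are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def safe_plan_data(options: dict[str, Any]) -> dict[str, Any]:
    """Return ``cdr_plan.data`` if both levels are dicts, else ``{}``."""
    cdr_plan = options.get("cdr_plan")
    if not isinstance(cdr_plan, dict):
        return {}
    plan_data = cdr_plan.get("data")
    return plan_data if isinstance(plan_data, dict) else {}


def user_geography(
    options: dict[str, Any],
) -> tuple[str | None, str | None, str | None]:
    """Return ``(state, postcode, distributor)``; state is always None here."""
    postcode = options.get("cdr_postcode") or None
    geo = safe_plan_data(options).get("geography") or {}
    if not isinstance(geo, dict):
        return None, postcode, None
    distributors = geo.get("distributors")
    if (
        isinstance(distributors, list)
        and distributors
        and isinstance(distributors[0], str)
    ):
        return None, postcode, distributors[0]
    return None, postcode, None


# Illustrative payloads only; real config-entry options may differ.
well_formed = {
    "cdr_postcode": "3000",
    "cdr_plan": {"data": {"geography": {"distributors": ["Example Networks"]}}},
}
malformed = [
    {"cdr_plan": "oops"},                                       # plan is a string
    {"cdr_plan": {"data": []}},                                 # data is a list
    {"cdr_plan": {"data": {"geography": None}}},                # geography is None
    {"cdr_plan": {"data": {"geography": ["x"]}}},               # geography is a list
    {"cdr_plan": {"data": {"geography": {"distributors": [None]}}}},
]

good = user_geography(well_formed)
bad = [user_geography(m) for m in malformed]
```

Every malformed payload degrades to an all-`None` tuple instead of raising, which is the behaviour the CR-fix comment describes.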
|
|
||
|
|
||
| async def get_competitor_retailers( | ||
| session: aiohttp.ClientSession, | ||
| options: dict[str, Any], | ||
| *, | ||
| competitor_fragments: tuple[str, ...] = DEFAULT_COMPETITOR_BRAND_FRAGMENTS, | ||
| ) -> list[RetailerEndpoint]: | ||
| """Build the retailer list scanned during the daily ranking job. | ||
|
|
||
| Composition (in priority order, dedup by ``brand_id``): | ||
| 1. User's CURRENT retailer (from ``cdr_plan.data.brand``). | ||
|
Comment on lines +84 to +93

suggestion: Handle the case where the current brand exists but does not resolve to a registry entry more explicitly.

Suggested implementation:

    current_brand = plan_data.get("brand")
    current_retailer: RetailerEndpoint | None = None
    if current_brand:
        # First, try the exact lookup used previously
        current_retailer = registry.find_by_brand(current_brand)
        if not current_retailer:
            # Fallback: trim whitespace and do a case-insensitive comparison against
            # known registry brands to recover from minor formatting drift.
            normalized_brand = current_brand.strip().lower()
            for retailer in registry:
                try:
                    retailer_brand = retailer.brand
                except AttributeError:
                    # Be defensive if registry entries are not uniform
                    continue
                if retailer_brand and retailer_brand.strip().lower() == normalized_brand:
                    current_retailer = retailer
                    break
        if not current_retailer:
            # Explicitly log when the stored plan refers to a brand that we cannot
            # resolve in the registry so configuration drift is discoverable.
            _LOGGER.info(
                "CDR ranking: current brand %r from plan did not match any registry entry",
                current_brand,
            )

I assumed the existence of:

To integrate this cleanly, you should:
|
||
| 2. The hardcoded big-4 competitors. | ||
|
|
||
| Falls back to baked-in registry via ``get_registry``'s own fallback | ||
| when live fetch fails. Returns ``[]`` if registry is empty (edge | ||
| case; baked-in always has 100+ entries). | ||
| """ | ||
| endpoints, source = await get_registry(session) | ||
| _LOGGER.debug( | ||
| "ranking: registry source=%s, %d retailers", source, len(endpoints) | ||
| ) | ||
|
|
||
| out: list[RetailerEndpoint] = [] | ||
| seen_brand_ids: set[str] = set() | ||
|
|
||
| plan_data = _safe_plan_data(options) | ||
| raw_brand = plan_data.get("brand") | ||
| # ``brand`` is sometimes shipped as None or non-string by retailers; | ||
| # only accept str to keep ``.strip()`` and ``find_by_brand`` safe. | ||
| current_brand = raw_brand.strip() if isinstance(raw_brand, str) else "" | ||
| if current_brand: | ||
| current = find_by_brand(endpoints, current_brand) | ||
| if current is not None: | ||
| out.append(current) | ||
| seen_brand_ids.add(current.brand_id) | ||
|
|
||
| for fragment in competitor_fragments: | ||
| match = find_by_brand(endpoints, fragment) | ||
| if match is None or match.brand_id in seen_brand_ids: | ||
| continue | ||
| out.append(match) | ||
| seen_brand_ids.add(match.brand_id) | ||
|
|
||
| return out | ||
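The priority-ordered, dedup-by-`brand_id` composition can be sketched on its own. In the sketch below, `Endpoint` is a hypothetical stand-in for `RetailerEndpoint`, and `find_by_fragment` assumes a case-insensitive substring match on the brand name; the diff does not show how the real `find_by_brand` matches, so treat that rule as an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for ``RetailerEndpoint``."""

    brand_id: str
    brand_name: str


def find_by_fragment(endpoints: list[Endpoint], fragment: str) -> Endpoint | None:
    """Assumed rule: first endpoint whose name contains the fragment."""
    needle = fragment.strip().lower()
    if not needle:
        return None
    for endpoint in endpoints:
        if needle in endpoint.brand_name.lower():
            return endpoint
    return None


def build_scan_list(
    endpoints: list[Endpoint],
    current_brand: str | None,
    fragments: tuple[str, ...],
) -> list[Endpoint]:
    """Current retailer first, then competitors, skipping duplicate brand_ids."""
    out: list[Endpoint] = []
    seen: set[str] = set()
    for name in (current_brand, *fragments):
        match = find_by_fragment(endpoints, name) if name else None
        if match is None or match.brand_id in seen:
            continue
        out.append(match)
        seen.add(match.brand_id)
    return out


# Made-up registry for illustration.
registry = [
    Endpoint("agl", "AGL"),
    Endpoint("origin", "Origin Energy"),
    Endpoint("red", "Red Energy"),
    Endpoint("small", "Small Co"),
]
fragments = ("agl", "origin", "energyaustralia", "red energy")
scan = build_scan_list(registry, "Origin Energy", fragments)
scan_ids = [endpoint.brand_id for endpoint in scan]
```

Here the user's current retailer leads the list, the duplicate `origin` fragment is skipped, and the fragment with no registry match is silently dropped.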
|
|
||
|
|
||
| async def run_ranking_job( | ||
| session: aiohttp.ClientSession, | ||
| options: dict[str, Any], | ||
| *, | ||
| top_k: int = DEFAULT_TOP_K, | ||
| plan_cache: dict[str, dict[str, Any]] | None = None, | ||
| competitor_fragments: tuple[str, ...] = DEFAULT_COMPETITOR_BRAND_FRAGMENTS, | ||
| ) -> list[dict[str, Any]]: | ||
| """Run the cheap-rank pipeline. Returns the top-K plans. | ||
|
|
||
| Cheap-rank only for now. Deep-rank (consumption replay) joins in | ||
| Phase 3.2 when the universal HA-history backfill ships and we | ||
| have real per-slot consumption to rank against. | ||
|
|
||
| Caller (coordinator) is responsible for: | ||
| - Scheduling (``async_track_time_change``). | ||
| - Persisting the returned list onto coordinator state. | ||
| - Catching exceptions across the daily boundary (this function | ||
| only catches its own — ``rank_alternatives``'s exception | ||
| isolation per retailer). | ||
|
|
||
| Returns ``[]`` if no retailers resolved (e.g. registry empty). | ||
| """ | ||
| retailers = await get_competitor_retailers( | ||
| session, options, competitor_fragments=competitor_fragments | ||
| ) | ||
| if not retailers: | ||
| _LOGGER.info("ranking: no competitor retailers resolved; skipping") | ||
| return [] | ||
|
|
||
| _state, postcode, distributor = get_user_geography(options) | ||
|
|
||
| return await rank_alternatives( | ||
| session, | ||
| retailers, | ||
| state=_state, | ||
| postcode=postcode, | ||
| distributor=distributor, | ||
| top_k=top_k, | ||
| cache=plan_cache, | ||
| ) | ||
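The docstring leaves exception handling across the daily boundary to the coordinator. A minimal sketch of such a wrapper follows, written against plain `asyncio` rather than Home Assistant. The name `run_daily_safely` and the choice to keep the previous results on failure are assumptions for illustration, not what `coordinator.py` necessarily does.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Any, Awaitable, Callable

_LOGGER = logging.getLogger(__name__)

# A zero-argument coroutine function that returns the ranked plans.
RankingJob = Callable[[], Awaitable[list[dict[str, Any]]]]


async def run_daily_safely(
    job: RankingJob,
    previous: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Run ``job``; on failure, log and keep the previous results (assumed policy)."""
    try:
        return await job()
    except Exception:  # broad on purpose: one bad day must not break the schedule
        _LOGGER.exception("ranking: daily job failed; keeping previous results")
        return previous


async def _ok() -> list[dict[str, Any]]:
    return [{"plan_id": "demo-plan"}]


async def _boom() -> list[dict[str, Any]]:
    raise RuntimeError("registry unreachable")


previous = [{"plan_id": "yesterday"}]
fresh = asyncio.run(run_daily_safely(_ok, previous))
kept = asyncio.run(run_daily_safely(_boom, previous))
```

Because `asyncio.CancelledError` derives from `BaseException`, the broad `except Exception` does not swallow task cancellation.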
suggestion: Avoid duplicating the default `top_k` value; reuse the shared constant to prevent drift.

`20` is hardcoded in the default for `call.data.get`, in the warning text, and implicitly as the clamp bound. Prefer a single source of truth: either import `DEFAULT_TOP_K` from the ranking layer and use it in all three places, or define a module-level constant here and reference it everywhere. That avoids divergence if the default changes elsewhere.

Suggested implementation:

If there is already a shared default (e.g. `DEFAULT_TOP_K`) in your ranking layer, you may prefer to replace `RANK_ALTERNATIVES_DEFAULT_TOP_K` with the imported constant name. This preserves a single source of truth across modules.
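One way to apply this suggestion is sketched below. The service handler itself is not shown in this excerpt, so `resolve_top_k` is a hypothetical helper, the value `20` is taken from the reviewer's remark rather than from the ranking module, and the lower bound of `1` is an assumption.

```python
from __future__ import annotations

import logging
from typing import Any, Mapping

_LOGGER = logging.getLogger(__name__)

# Assumed value; in the real code this would be imported from the ranking layer.
DEFAULT_TOP_K = 20


def resolve_top_k(call_data: Mapping[str, Any]) -> int:
    """Read ``top_k`` from service-call data, using one constant throughout."""
    raw = call_data.get("top_k", DEFAULT_TOP_K)
    try:
        value = int(raw)
    except (TypeError, ValueError):
        _LOGGER.warning(
            "top_k=%r is not an integer; using default %d", raw, DEFAULT_TOP_K
        )
        return DEFAULT_TOP_K
    if not 1 <= value <= DEFAULT_TOP_K:
        clamped = min(max(value, 1), DEFAULT_TOP_K)
        _LOGGER.warning(
            "top_k=%d outside 1..%d; clamping to %d", value, DEFAULT_TOP_K, clamped
        )
        return clamped
    return value
```

The default, the warning text, and the clamp bound all read from `DEFAULT_TOP_K`, so changing the constant changes all three together.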