Skip to content

fix(nemweb): regex tolerates full server path prefix + bump 1.6.0-beta.3#147

Merged
Artic0din merged 1 commit into
devfrom
fix/nemweb-regex-case-and-path
May 24, 2026
Merged

fix(nemweb): regex tolerates full server path prefix + bump 1.6.0-beta.3#147
Artic0din merged 1 commit into
devfrom
fix/nemweb-regex-case-and-path

Conversation

@Artic0din
Copy link
Copy Markdown
Owner

Critical hotpatch found via beta.2 UAT.

The earlier _LEGACY fix (#107) silenced the wrong cause. Real NEMWeb HTML:

<A HREF="/Reports/CURRENT/DispatchIS_Reports/PUBLIC_DISPATCHIS_202605221100_..._.zip">

Prior regex was case-insensitive but required PUBLIC_DISPATCHIS_ to sit immediately after the opening quote → matched zero files. AEMO-Direct DWT and Flow Power's AEMO poll have been silently broken since the PR #87 / #107 work.

Fix: insert [^"]*? between href=" and the filename group. Verified live: 576 matches vs 0.

Test plan

  • 2 new regression tests in TestPickLatestFile
  • pytest — 1130 passing
  • ruff check — clean
  • Post-merge tag v1.6.0-beta.3 → HACS beta refresh → re-UAT

Critical hotpatch. The earlier _LEGACY fix (#107) silenced one cause of
NEMWeb listing parse failures but not the actual one. Live UAT against
the production NEMWeb directory on 2026-05-24 showed the real shape:

  <A HREF="/Reports/CURRENT/DispatchIS_Reports/PUBLIC_DISPATCHIS_202605221100_..._.zip">

The prior regex was case-insensitive (via re.IGNORECASE) but required
PUBLIC_DISPATCHIS_ to sit immediately after the opening quote — so it
matched zero files even though both the case and the _LEGACY-optional
fixes were correct. AEMO-Direct DWT and Flow Power's AEMO poll were
both silently broken in production since the PR #87 / #107 work.

Fix: insert a non-greedy ``[^"]*?`` between ``href="`` and the filename
capture group, keeping the captured filename clean so the existing
lexical-sort logic in _pick_latest_dispatch_file works unchanged.

Verified against the live directory: regex now finds 576 matches in the
2026-05-24 snapshot (was 0).

Two regression tests pin the new path-prefix tolerance — one using the
exact shape pulled from the live directory, one defensive with mixed
case + absolute URLs.

Manifest bumped to 1.6.0-beta.3 for HACS-beta re-deploy.
Silver-checklist test version assertion updated to match.

Full test suite: 1130 passing.
@Artic0din Artic0din merged commit a5d3f5b into dev May 24, 2026
1 of 2 checks passed
@Artic0din Artic0din deleted the fix/nemweb-regex-case-and-path branch May 24, 2026 00:55
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a production issue where the NEMWeb directory listing failed to match files due to unexpected server path prefixes in the HREF attributes. The regex in aemo_api.py was updated to handle these prefixes, and the version was bumped to 1.6.0-beta.3 with accompanying regression tests. A review comment identifies a latent bug where the case-insensitive regex matching combined with case-sensitive lexical sorting could lead to incorrect file selection if filename casing varies on the server.

# prefix) keeps working unchanged.
_FILE_RE = re.compile(
r'href="(PUBLIC_DISPATCHIS_\d{12}_\d+(?:_LEGACY)?\.zip)"',
r'href="[^"]*?(PUBLIC_DISPATCHIS_\d{12}_\d+(?:_LEGACY)?\.zip)"',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regex uses re.IGNORECASE, which allows matching filenames with inconsistent casing (e.g., public_dispatchis_...). However, the sorting logic in _pick_latest_dispatch_file (line 134) performs a default lexical sort, which is case-sensitive. This introduces a latent bug: if the directory listing ever contains a mix of cases, an older file might be selected as the 'latest' simply because its filename is lowercase (since 'p' > 'P' in ASCII). While NEMWeb filenames are currently uppercase, the code should ideally use a case-insensitive sort (e.g., sorted(matches, key=str.lower)) to remain robust against server-side casing changes or proxy normalization.

Artic0din added a commit that referenced this pull request May 24, 2026
…#153)

* fix(stack): four gemini-review findings I missed across beta.4-beta.8

Ryan caught that I'd been merging today's PRs with --auto without reading
gemini-code-assist reviews. These are the legitimate findings from those
reviews, fixed in one batch.

#148 (gemini): per-tick build_explanation duplicates the rollover branch.
  Beta.4 added a per-tick rebuild but left the daily-rollover rebuild in
  place. At midnight the rollover branch built the "yesterday's summary"
  snapshot, then the per-tick rebuild a few seconds later clobbered it
  with the post-reset all-zeros snapshot. Per-tick rebuild fires on the
  midnight tick too, so the rollover one was redundant AND actively
  harmful. Removed.

#147 (gemini): _pick_latest_dispatch_file lex-sort is case-sensitive.
  _FILE_RE uses re.IGNORECASE — could capture mixed-case filenames.
  sorted(matches)[-1] then compared Unicode codepoints; uppercase sorts
  before lowercase. A hypothetical mixed-case listing would put an older
  uppercase file last after sort. NEMWeb today is all uppercase but the
  contract should match the regex's tolerance. key=str.upper normalises.

#150 (gemini): rebuild_engine never re-reads _grid_power_entity.
  The options→data fallback added in beta.6 fires once at __init__ and
  never refreshes. Users editing the grid sensor via options-flow saw
  the integration silently keep pointing at the prior entity until HA
  restart. rebuild_engine is the options-update path — now re-resolves.

#152 (gemini): reset_today service silently succeeds with no entries.
  Silver action-exceptions rule: service handlers should raise when they
  cannot perform the action. A user with all entries in failed-load
  state would see "Successfully executed pricehawk.reset_today" and
  observe no dashboard change. Now raises HomeAssistantError telling
  them to reload the integration.

PR #146 (export-credit bullet) and PR #149 (loop optimization) are real
findings too but lower priority — left as future work.

Manifest bumped to 1.6.0-beta.9. Silver-checklist version assertion
updated to match.

Full test suite: 1140 passing.

* fix(coordinator): drop duplicate _grid_power_entity overwrite in rebuild_engine

Per gemini review of PR #153: the fallback I added at the TOP of
rebuild_engine was being silently negated by a duplicate options-only
assignment at the END of the same function. Removed the second
assignment; kept a comment marker so this isn't accidentally reintroduced.

Bug effect: rebuild_engine fired on every options-flow save would
overwrite my fix with the empty-string default if entry.options didn't
have CONF_GRID_POWER_SENSOR. Same regression as the original bug
beta.6 was meant to fix.

NOTE: gemini also flagged that rebuild_engine creates new provider
instances on every options update, losing today's daily accumulators
(import_kwh_today, import_cost_today_c, etc.). That's a real concern
but a bigger refactor — left for a follow-up PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant