Skip to content

Add connection-state API#40

Merged
oyvindwe merged 4 commits into
echoromeo:masterfrom
oyvindwe:connection-state-api
May 3, 2026
Merged

Add connection-state API#40
oyvindwe merged 4 commits into
echoromeo:masterfrom
oyvindwe:connection-state-api

Conversation

@oyvindwe
Copy link
Copy Markdown
Collaborator

@oyvindwe oyvindwe commented Apr 19, 2026

Summary

Adds a connection-state API so consumers (notably Home Assistant's nobo_hub, for the Silver entity-unavailable / log-when-unavailable rules) can observe when the hub is reachable.

  • hub.connected: bool property
  • register_connection_callback / deregister_connection_callback (mirrors the existing register_callback pattern — sync, called from the event loop)
  • Transitions wired in connect() (success), reconnect_hub() (success), socket_receive() (before reconnect), and close() (shutdown / transport loss). Idempotent: callbacks fire only on True ↔ False transitions, not on every reconnect attempt. Callback exceptions are logged and swallowed.
  • README section with an example.

Test plan

  • python -m unittest test_pynobo.py — 12/12 tests pass (7 pre-existing + 5 new)
  • Smoke test on a real Nobø Ecohub: install into a HA dev venv, unplug the hub, confirm entities become unavailable and a single "hub disconnected" line is logged; re-plug and confirm they come back

@oyvindwe oyvindwe marked this pull request as draft April 19, 2026 20:21
@oyvindwe oyvindwe mentioned this pull request Apr 19, 2026
@oyvindwe oyvindwe requested review from capelevy and stigvi April 19, 2026 20:55
@oyvindwe oyvindwe force-pushed the connection-state-api branch 2 times, most recently from c59d219 to 667ae2c Compare April 19, 2026 22:33
Consumers (notably Home Assistant's nobo_hub integration, for the
Silver entity-unavailable / log-when-unavailable rules) need to know
whether the hub is currently reachable. This adds a small
callback-based API mirroring the existing register_callback pattern.

- hub.connected: bool property
- register_connection_callback / deregister_connection_callback
- Internal _set_connected fires callbacks on True->False and
  False->True transitions only (idempotent), swallows and logs
  exceptions from consumer callbacks
- Transitions wired in connect() success path, reconnect_hub() success
  path, socket_receive() before reconnect, and close() on shutdown /
  transport loss.
- Five tests: initial state, single transition, idempotency,
  deregister, callback-exception-is-swallowed.
- README section documenting the new API.
@oyvindwe oyvindwe force-pushed the connection-state-api branch from 667ae2c to 0b1c2dd Compare April 22, 2026 20:31
@oyvindwe oyvindwe marked this pull request as ready for review April 22, 2026 20:37
capelevy
capelevy previously approved these changes Apr 23, 2026
Copy link
Copy Markdown
Collaborator

@capelevy capelevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me :)

The connection-state API needs a working reconnect loop for its
False→True transitions to actually fire. Smoke-testing against a real
hub surfaced three independent defects that broke reconnect, all fixed
here:

- socket_receive: PynoboConnectionError from get_response() was falling
  through to the catch-all Exception arm, which called stop() instead
  of reconnect_hub(). Add a dedicated except arm that routes through
  reconnect.
- keep_alive: fire-and-forget handshakes meant silent network drops
  (WiFi off, hub unplugged) were invisible for minutes. Track
  _last_recv_at on every successful read; if no frame in 2× interval,
  force the reconnect path via close() → EOF.
- reconnect_hub: only caught OSError, but async_connect_hub wraps that
  into PynoboConnectionError — so the first failed attempt escaped and
  killed the receive loop. Rewrite as a unified retry loop with
  exponential backoff (10→60s cap); let PynoboHandshakeError propagate
  explicitly (unrecoverable).

Additional cleanup found while tracing these:

- socket_receive: dedicated PynoboHandshakeError arm for clean "hub
  rejected handshake, giving up" log instead of "Unhandled exception"
  traceback.
- stop(): cancel-and-await was joining the wrong task
  (_keep_alive_task after cancelling _socket_receive_task).

Tests: 7 new cases covering each reconnect path (ECONNRESET, generic
OSError, silent drop, transient failure retry, terminal handshake
rejection propagation, terminal handshake rejection in socket_receive,
end-to-end callback ordering).

README: new "Reconnect behavior" subsection documenting triggers,
retry schedule, terminal failures, and log signals.
Refinements informed by integration testing against Home Assistant's
nobo_hub component:

* Order callbacks on (re)connect: _set_connected(True) now fires inside
  async_connect_hub, before the data-callback loop. Consumers that gate
  on `connected` are guaranteed to see the transition before the data
  arrives, eliminating a race window that previously required HA-side
  workarounds. Locked in by
  test_connect_fires_connection_callback_before_data_callback.

* Make async_connect_hub the single source of truth for the True
  transition. The redundant _set_connected(True) calls in connect() and
  reconnect_hub() are removed — the idempotency guard already made them
  no-ops, so this is a contract clarification only.

* Restore OSError back-compat:
  PynoboConnectionError(PynoboError, OSError). The typed-package
  refactor wrapped the underlying OSError from asyncio.open_connection
  without preserving its base, silently breaking consumers that catch
  OSError. Mirrors the existing
  PynoboValidationError(PynoboError, ValueError, TypeError) pattern.

* Replace the cryptic "Reconnecting due to 0 bytes read on a total of
  undefined expected bytes" with "connection to hub closed by peer;
  reconnecting" for the IncompleteReadError case.
Spec says the hub broadcasts every 2 seconds, but real-world
measurements show gaps up to ~4 seconds. The 3.0 default occasionally
missed broadcasts entirely; 5.0 catches at least one every time.
@oyvindwe
Copy link
Copy Markdown
Collaborator Author

oyvindwe commented May 2, 2026

Adjusted after testing. Also tuned the default autodiscover_wait based on observations. Per spec the hub transmits every 2 seconds, but it could take up to 4 seconds to receive a message. This could be caused by stress on the hub, or that the first UDP message got corrupted.

Ready for re-review @capelevy

@oyvindwe oyvindwe requested a review from capelevy May 2, 2026 16:09
Copy link
Copy Markdown
Collaborator

@capelevy capelevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Only found one non-issue. Up to you if you want to improve that.

Comment thread pynobo/__init__.py
# NOTE: this liveness check relies on the hub echoing HANDSHAKE at the app layer.
# If HANDSHAKE is ever replaced with the spec's KEEPALIVE, investigate how the
# message is acknowledged by the hub.
if time.monotonic() - self._last_recv_at > 2 * interval:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This general solution is prone to race condition, but is saved since asyncio is single threaded. Any reason not to go for 3× interval? Better example for others or if someone introduces multi threading in the future?

Thoughts for myself to remember risk in future:

If reconnect goes through (_last_recv_at = 0), then this loop runs before self._keep_alive gets set (_last_recv_at = 0+x), it waits for another 14 seconds, then sends a HANDSHAKE, waits for another 14 seconds before checking _last_recv_at = 0+x+2*14.
Depending on how bad your luck is (and a bit how fast your computer is). You may now close the connection without a actual timeout.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason to bring back multithreading instead of async/await. As you state, it is much more prone to race conditions.

I have given the timeout interval quite some thought. On my local network, I always get a reply from HANDSHAKE within milliseconds. Even with a poor remote connection (e.g. mobile link to a cabin in the mountains) , you should expect a reply within about 10 seconds, worst case around 15 seconds.

Per spec, pynobo should send KEEPALIVE to the hub every 14 seconds. However, this message does not include a reply, which I guess is why HANDSHAKE was chosen instead (I believe this is from original implementation). The hub will disconnect if it does not see a message for 30 seconds - doesn't matter what message as far as I understand it.

On my local network, I could have had a timeout of 1 second, but it's unnecessary complicated to have dynamic timeout depending on connectivity. 2 x interval (=28s) is a reasonable compromise. 3 x interval (=42s) seems longer than necessary to me.

@oyvindwe oyvindwe merged commit 86cd6a8 into echoromeo:master May 3, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants