Add connection-state API by oyvindwe · Pull Request #40 · echoromeo/pynobo

oyvindwe · 2026-04-19T20:20:36Z

Summary

Adds a connection-state API so consumers (notably Home Assistant's nobo_hub, for the Silver entity-unavailable / log-when-unavailable rules) can observe when the hub is reachable.

hub.connected: bool property
register_connection_callback / deregister_connection_callback (mirrors the existing register_callback pattern — sync, called from the event loop)
Transitions wired in connect() (success), reconnect_hub() (success), socket_receive() (before reconnect), and close() (shutdown / transport loss). Idempotent: callbacks fire only on True ↔ False transitions, not on every reconnect attempt. Callback exceptions are logged and swallowed.
README section with an example.

Test plan

python -m unittest test_pynobo.py — 12/12 tests pass (7 pre-existing + 5 new)
Smoke test on a real Nobø Ecohub: install into a HA dev venv, unplug the hub, confirm entities become unavailable and a single "hub disconnected" line is logged; re-plug and confirm they come back

Consumers (notably Home Assistant's nobo_hub integration, for the Silver entity-unavailable / log-when-unavailable rules) need to know whether the hub is currently reachable. This adds a small callback-based API mirroring the existing register_callback pattern. - hub.connected: bool property - register_connection_callback / deregister_connection_callback - Internal _set_connected fires callbacks on True->False and False->True transitions only (idempotent), swallows and logs exceptions from consumer callbacks - Transitions wired in connect() success path, reconnect_hub() success path, socket_receive() before reconnect, and close() on shutdown / transport loss. - Five tests: initial state, single transition, idempotency, deregister, callback-exception-is-swallowed. - README section documenting the new API.

capelevy

Looks fine to me :)

The connection-state API needs a working reconnect loop for its False→True transitions to actually fire. Smoke-testing against a real hub surfaced three independent defects that broke reconnect, all fixed here: - socket_receive: PynoboConnectionError from get_response() was falling through to the catch-all Exception arm, which called stop() instead of reconnect_hub(). Add a dedicated except arm that routes through reconnect. - keep_alive: fire-and-forget handshakes meant silent network drops (WiFi off, hub unplugged) were invisible for minutes. Track _last_recv_at on every successful read; if no frame in 2× interval, force the reconnect path via close() → EOF. - reconnect_hub: only caught OSError, but async_connect_hub wraps that into PynoboConnectionError — so the first failed attempt escaped and killed the receive loop. Rewrite as a unified retry loop with exponential backoff (10→60s cap); let PynoboHandshakeError propagate explicitly (unrecoverable). Additional cleanup found while tracing these: - socket_receive: dedicated PynoboHandshakeError arm for clean "hub rejected handshake, giving up" log instead of "Unhandled exception" traceback. - stop(): cancel-and-await was joining the wrong task (_keep_alive_task after cancelling _socket_receive_task). Tests: 7 new cases covering each reconnect path (ECONNRESET, generic OSError, silent drop, transient failure retry, terminal handshake rejection propagation, terminal handshake rejection in socket_receive, end-to-end callback ordering). README: new "Reconnect behavior" subsection documenting triggers, retry schedule, terminal failures, and log signals.

Refinements informed by integration testing against Home Assistant's nobo_hub component: * Order callbacks on (re)connect: _set_connected(True) now fires inside async_connect_hub, before the data-callback loop. Consumers that gate on `connected` are guaranteed to see the transition before the data arrives, eliminating a race window that previously required HA-side workarounds. Locked in by test_connect_fires_connection_callback_before_data_callback. * Make async_connect_hub the single source of truth for the True transition. The redundant _set_connected(True) calls in connect() and reconnect_hub() are removed — the idempotency guard already made them no-ops, so this is a contract clarification only. * Restore OSError back-compat: PynoboConnectionError(PynoboError, OSError). The typed-package refactor wrapped the underlying OSError from asyncio.open_connection without preserving its base, silently breaking consumers that catch OSError. Mirrors the existing PynoboValidationError(PynoboError, ValueError, TypeError) pattern. * Replace the cryptic "Reconnecting due to 0 bytes read on a total of undefined expected bytes" with "connection to hub closed by peer; reconnecting" for the IncompleteReadError case.

Spec says the hub broadcasts every 2 seconds, but real-world measurements show gaps up to ~4 seconds. The 3.0 default occasionally missed broadcasts entirely; 5.0 catches at least one every time.

oyvindwe · 2026-05-02T16:01:00Z

Adjusted after testing. Also tuned the default autodiscover_wait based on observations. Per spec the hub transmits every 2 seconds, but it could take up to 4 seconds to receive a message. This could be caused by stress on the hub, or that the first UDP message got corrupted.

Ready for re-review @capelevy

capelevy

Looks good. Only found one non-issue. Up to you if you want to improve that.

capelevy · 2026-05-03T07:58:38Z

+            # NOTE: this liveness check relies on the hub echoing HANDSHAKE at the app layer.
+            # If HANDSHAKE is ever replaced with the spec's KEEPALIVE, investigate how the
+            # message is acknowledged by the hub.
+            if time.monotonic() - self._last_recv_at > 2 * interval:


This general solution is prone to race condition, but is saved since asyncio is single threaded. Any reason not to go for 3× interval? Better example for others or if someone introduces multi threading in the future?

Thoughts for myself to remember risk in future:

If reconnect goes through (_last_recv_at = 0), then this loop runs before self._keep_alive gets set (_last_recv_at = 0+x), it waits for another 14 seconds, then sends a HANDSHAKE, waits for another 14 seconds before checking _last_recv_at = 0+x+2*14.
Depending on how bad your luck is (and a bit how fast your computer is). You may now close the connection without a actual timeout.

I don't see a reason to bring back multithreading instead of async/await. As you state, it is much more prone to race conditions.

I have given the timeout interval quite some thought. On my local network, I always get a reply from HANDSHAKE within milliseconds. Even with a poor remote connection (e.g. mobile link to a cabin in the mountains) , you should expect a reply within about 10 seconds, worst case around 15 seconds.

Per spec, pynobo should send KEEPALIVE to the hub every 14 seconds. However, this message does not include a reply, which I guess is why HANDSHAKE was chosen instead (I believe this is from original implementation). The hub will disconnect if it does not see a message for 30 seconds - doesn't matter what message as far as I understand it.

On my local network, I could have had a timeout of 1 second, but it's unnecessary complicated to have dynamic timeout depending on connectivity. 2 x interval (=28s) is a reasonable compromise. 3 x interval (=42s) seems longer than necessary to me.

oyvindwe marked this pull request as draft April 19, 2026 20:21

oyvindwe mentioned this pull request Apr 19, 2026

Version 1.9.0 #41

Merged

oyvindwe requested review from capelevy and stigvi April 19, 2026 20:55

oyvindwe force-pushed the connection-state-api branch 2 times, most recently from c59d219 to 667ae2c Compare April 19, 2026 22:33

oyvindwe force-pushed the connection-state-api branch from 667ae2c to 0b1c2dd Compare April 22, 2026 20:31

oyvindwe marked this pull request as ready for review April 22, 2026 20:37

capelevy previously approved these changes Apr 23, 2026

View reviewed changes

oyvindwe dismissed capelevy’s stale review via e9cc43d April 23, 2026 22:06

oyvindwe added 2 commits April 29, 2026 22:53

Bump default autodiscover_wait from 3.0 to 5.0

46367ef

Spec says the hub broadcasts every 2 seconds, but real-world measurements show gaps up to ~4 seconds. The 3.0 default occasionally missed broadcasts entirely; 5.0 catches at least one every time.

oyvindwe requested a review from capelevy May 2, 2026 16:09

capelevy approved these changes May 3, 2026

View reviewed changes

oyvindwe merged commit 86cd6a8 into echoromeo:master May 3, 2026
5 checks passed

oyvindwe mentioned this pull request May 14, 2026

Update pynobo to 1.9.0 home-assistant/core#170559

Merged

21 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add connection-state API#40

Add connection-state API#40
oyvindwe merged 4 commits into
echoromeo:masterfrom
oyvindwe:connection-state-api

oyvindwe commented Apr 19, 2026 •

edited

Loading

Uh oh!

capelevy left a comment

Uh oh!

oyvindwe commented May 2, 2026

Uh oh!

capelevy left a comment

Uh oh!

capelevy May 3, 2026

Uh oh!

oyvindwe May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

oyvindwe commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

capelevy left a comment

Choose a reason for hiding this comment

Uh oh!

oyvindwe commented May 2, 2026

Uh oh!

capelevy left a comment

Choose a reason for hiding this comment

Uh oh!

capelevy May 3, 2026

Choose a reason for hiding this comment

Uh oh!

oyvindwe May 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

oyvindwe commented Apr 19, 2026 •

edited

Loading