Add connection-state API#40
Conversation
c59d219 to
667ae2c
Compare
Consumers (notably Home Assistant's nobo_hub integration, for the Silver entity-unavailable / log-when-unavailable rules) need to know whether the hub is currently reachable. This adds a small callback-based API mirroring the existing register_callback pattern. - hub.connected: bool property - register_connection_callback / deregister_connection_callback - Internal _set_connected fires callbacks on True->False and False->True transitions only (idempotent), swallows and logs exceptions from consumer callbacks - Transitions wired in connect() success path, reconnect_hub() success path, socket_receive() before reconnect, and close() on shutdown / transport loss. - Five tests: initial state, single transition, idempotency, deregister, callback-exception-is-swallowed. - README section documenting the new API.
667ae2c to
0b1c2dd
Compare
The connection-state API needs a working reconnect loop for its False→True transitions to actually fire. Smoke-testing against a real hub surfaced three independent defects that broke reconnect, all fixed here: - socket_receive: PynoboConnectionError from get_response() was falling through to the catch-all Exception arm, which called stop() instead of reconnect_hub(). Add a dedicated except arm that routes through reconnect. - keep_alive: fire-and-forget handshakes meant silent network drops (WiFi off, hub unplugged) were invisible for minutes. Track _last_recv_at on every successful read; if no frame in 2× interval, force the reconnect path via close() → EOF. - reconnect_hub: only caught OSError, but async_connect_hub wraps that into PynoboConnectionError — so the first failed attempt escaped and killed the receive loop. Rewrite as a unified retry loop with exponential backoff (10→60s cap); let PynoboHandshakeError propagate explicitly (unrecoverable). Additional cleanup found while tracing these: - socket_receive: dedicated PynoboHandshakeError arm for clean "hub rejected handshake, giving up" log instead of "Unhandled exception" traceback. - stop(): cancel-and-await was joining the wrong task (_keep_alive_task after cancelling _socket_receive_task). Tests: 7 new cases covering each reconnect path (ECONNRESET, generic OSError, silent drop, transient failure retry, terminal handshake rejection propagation, terminal handshake rejection in socket_receive, end-to-end callback ordering). README: new "Reconnect behavior" subsection documenting triggers, retry schedule, terminal failures, and log signals.
Refinements informed by integration testing against Home Assistant's nobo_hub component: * Order callbacks on (re)connect: _set_connected(True) now fires inside async_connect_hub, before the data-callback loop. Consumers that gate on `connected` are guaranteed to see the transition before the data arrives, eliminating a race window that previously required HA-side workarounds. Locked in by test_connect_fires_connection_callback_before_data_callback. * Make async_connect_hub the single source of truth for the True transition. The redundant _set_connected(True) calls in connect() and reconnect_hub() are removed — the idempotency guard already made them no-ops, so this is a contract clarification only. * Restore OSError back-compat: PynoboConnectionError(PynoboError, OSError). The typed-package refactor wrapped the underlying OSError from asyncio.open_connection without preserving its base, silently breaking consumers that catch OSError. Mirrors the existing PynoboValidationError(PynoboError, ValueError, TypeError) pattern. * Replace the cryptic "Reconnecting due to 0 bytes read on a total of undefined expected bytes" with "connection to hub closed by peer; reconnecting" for the IncompleteReadError case.
Spec says the hub broadcasts every 2 seconds, but real-world measurements show gaps up to ~4 seconds. The 3.0 default occasionally missed broadcasts entirely; 5.0 catches at least one every time.
|
Adjusted after testing. Also tuned the default Ready for re-review @capelevy |
capelevy
left a comment
There was a problem hiding this comment.
Looks good. Only found one non-issue. Up to you if you want to improve that.
| # NOTE: this liveness check relies on the hub echoing HANDSHAKE at the app layer. | ||
| # If HANDSHAKE is ever replaced with the spec's KEEPALIVE, investigate how the | ||
| # message is acknowledged by the hub. | ||
| if time.monotonic() - self._last_recv_at > 2 * interval: |
There was a problem hiding this comment.
This general solution is prone to race condition, but is saved since asyncio is single threaded. Any reason not to go for 3× interval? Better example for others or if someone introduces multi threading in the future?
Thoughts for myself to remember risk in future:
If reconnect goes through (
_last_recv_at = 0), then this loop runs beforeself._keep_alivegets set (_last_recv_at = 0+x), it waits for another 14 seconds, then sends a HANDSHAKE, waits for another 14 seconds before checking_last_recv_at = 0+x+2*14.
Depending on how bad your luck is (and a bit how fast your computer is). You may now close the connection without a actual timeout.
There was a problem hiding this comment.
I don't see a reason to bring back multithreading instead of async/await. As you state, it is much more prone to race conditions.
I have given the timeout interval quite some thought. On my local network, I always get a reply from HANDSHAKE within milliseconds. Even with a poor remote connection (e.g. mobile link to a cabin in the mountains) , you should expect a reply within about 10 seconds, worst case around 15 seconds.
Per spec, pynobo should send KEEPALIVE to the hub every 14 seconds. However, this message does not include a reply, which I guess is why HANDSHAKE was chosen instead (I believe this is from original implementation). The hub will disconnect if it does not see a message for 30 seconds - doesn't matter what message as far as I understand it.
On my local network, I could have had a timeout of 1 second, but it's unnecessary complicated to have dynamic timeout depending on connectivity. 2 x interval (=28s) is a reasonable compromise. 3 x interval (=42s) seems longer than necessary to me.
Summary
Adds a connection-state API so consumers (notably Home Assistant's
nobo_hub, for the Silverentity-unavailable/log-when-unavailablerules) can observe when the hub is reachable.hub.connected: boolpropertyregister_connection_callback/deregister_connection_callback(mirrors the existingregister_callbackpattern — sync, called from the event loop)connect()(success),reconnect_hub()(success),socket_receive()(before reconnect), andclose()(shutdown / transport loss). Idempotent: callbacks fire only onTrue ↔ Falsetransitions, not on every reconnect attempt. Callback exceptions are logged and swallowed.Test plan
python -m unittest test_pynobo.py— 12/12 tests pass (7 pre-existing + 5 new)unavailableand a single "hub disconnected" line is logged; re-plug and confirm they come back