Diagnostics report for Thread networks #88541
Merged
13 commits
197c761 Initial diagnostics (Jc2k)
69ce577 Include MLP of local OTBR (Jc2k)
d736f69 Add dep on pyroute2 (Jc2k)
122596c Move pyroute2 onto executor (Jc2k)
e9d9e18 More comments (Jc2k)
ab5f3c6 Read thread data direct from zeroconf cache (Jc2k)
092bc3e Get neighbour cache for known BR's (Jc2k)
0c02bbc isort (Jc2k)
a052f30 mypy (Jc2k)
854e06c Add diagnostic test (Jc2k)
9a3c121 rel import (Jc2k)
cdf27f1 Fix pylint (Jc2k)
7b15281 Restore coverage in discovery.py (Jc2k)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,208 @@ | ||
| """Diagnostics support for Thread networks. | ||
|
|
||
| When triaging Matter and HomeKit issues you often need to check for problems with the Thread network. | ||
|
|
||
| This report helps spot and rule out: | ||
|
|
||
| * Is the user's border router visible at all? | ||
| * Is the border router actually announcing any routes? The user could have a network boundary like | ||
| VLANs or WiFi isolation that is blocking the RA packets. | ||
| * Alternatively, if the user isn't on HAOS they could have accept_ra_rt_info_max_plen set incorrectly. | ||
| * Are there any bogus routes that could be interfering? If routes don't expire they can build up. | ||
| When you have 10 routes and only 2 border routers something has gone wrong. | ||
|
|
||
| This does not do any connectivity checks. A user could have all their border routers visible and | ||
| still be unable to ping some of their Thread accessories; that is still a Thread problem. | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from typing import Any, TypedDict | ||
|
|
||
| from pyroute2 import NDB # pylint: disable=no-name-in-module | ||
| from python_otbr_api.tlv_parser import MeshcopTLVType | ||
|
|
||
| from homeassistant.components import zeroconf | ||
| from homeassistant.config_entries import ConfigEntry | ||
| from homeassistant.core import HomeAssistant | ||
|
|
||
| from .dataset_store import async_get_store | ||
| from .discovery import async_read_zeroconf_cache | ||
|
|
||
|
|
||
| class Neighbour(TypedDict): | ||
| """A neighbour cache entry (ip neigh).""" | ||
|
|
||
| lladdr: str | ||
| state: int | ||
| probes: int | ||
|
|
||
|
|
||
| class Route(TypedDict): | ||
| """A route table entry (ip -6 route).""" | ||
|
|
||
| metrics: int | ||
| priority: int | ||
| is_nexthop: bool | ||
|
|
||
|
|
||
| class Router(TypedDict): | ||
| """A border router.""" | ||
|
|
||
| server: str | None | ||
| addresses: list[str] | ||
| neighbours: dict[str, Neighbour] | ||
| thread_version: str | None | ||
| model: str | None | ||
| vendor: str | None | ||
| routes: dict[str, Route] | ||
|
|
||
|
|
||
| class Network(TypedDict): | ||
| """A thread network.""" | ||
|
|
||
| name: str | None | ||
| routers: dict[str, Router] | ||
| prefixes: set[str] | ||
| unexpected_routers: set[str] | ||
|
|
||
|
|
||
| def _get_possible_thread_routes() -> ( | ||
| tuple[dict[str, dict[str, Route]], dict[str, set[str]]] | ||
| ): | ||
| # Build a list of possible thread routes | ||
| # Right now, this is ipv6 /64's that have a gateway | ||
| # We cross reference with zeroconf data to confirm which via's are known border routers | ||
| routes: dict[str, dict[str, Route]] = {} | ||
| reverse_routes: dict[str, set[str]] = {} | ||
|
|
||
| with NDB() as ndb: | ||
| for record in ndb.routes: | ||
| # Limit to IPV6 routes | ||
| if record.family != 10: | ||
| continue | ||
| # Limit to /64 prefixes | ||
| if record.dst_len != 64: | ||
| continue | ||
| # Limit to routes with a via | ||
| if not record.gateway and not record.nh_gateway: | ||
| continue | ||
| gateway = record.gateway or record.nh_gateway | ||
| route = routes.setdefault(gateway, {}) | ||
| route[record.dst] = { | ||
| "metrics": record.metrics, | ||
| "priority": record.priority, | ||
| # NM creates "nexthop" routes - a single route with many via's | ||
| # Kernel creates many routes with a single via | ||
| "is_nexthop": record.nh_gateway is not None, | ||
| } | ||
| reverse_routes.setdefault(record.dst, set()).add(gateway) | ||
| return routes, reverse_routes | ||
|
|
||
|
|
||
| def _get_neighbours() -> dict[str, Neighbour]: | ||
| neighbours: dict[str, Neighbour] = {} | ||
|
|
||
| with NDB() as ndb: | ||
| for record in ndb.neighbours: | ||
| neighbours[record.dst] = { | ||
| "lladdr": record.lladdr, | ||
| "state": record.state, | ||
| "probes": record.probes, | ||
| } | ||
|
|
||
| return neighbours | ||
|
|
||
|
|
||
| async def async_get_config_entry_diagnostics( | ||
| hass: HomeAssistant, entry: ConfigEntry | ||
| ) -> dict[str, Any]: | ||
| """Return diagnostics for all known thread networks.""" | ||
|
|
||
| networks: dict[str, Network] = {} | ||
|
|
||
| # Start with all networks that HA knows about | ||
| store = await async_get_store(hass) | ||
| for record in store.datasets.values(): | ||
| if not record.extended_pan_id: | ||
| continue | ||
| network = networks.setdefault( | ||
| record.extended_pan_id, | ||
| { | ||
| "name": record.network_name, | ||
| "routers": {}, | ||
| "prefixes": set(), | ||
| "unexpected_routers": set(), | ||
| }, | ||
| ) | ||
| if mlp := record.dataset.get(MeshcopTLVType.MESHLOCALPREFIX): | ||
| network["prefixes"].add(f"{mlp[0:4]}:{mlp[4:8]}:{mlp[8:12]}:{mlp[12:16]}") | ||
|
|
||
| # Find all routes currently active that might be thread related, so we can match them to | ||
| # border routers as we process the zeroconf data. | ||
| routes, reverse_routes = await hass.async_add_executor_job( | ||
| _get_possible_thread_routes | ||
| ) | ||
|
|
||
| # Find all neighbours | ||
| neighbours = await hass.async_add_executor_job(_get_neighbours) | ||
|
|
||
| aiozc = await zeroconf.async_get_async_instance(hass) | ||
| for data in async_read_zeroconf_cache(aiozc): | ||
| if not data.extended_pan_id: | ||
| continue | ||
|
|
||
| network = networks.setdefault( | ||
| data.extended_pan_id, | ||
| { | ||
| "name": data.network_name, | ||
| "routers": {}, | ||
| "prefixes": set(), | ||
| "unexpected_routers": set(), | ||
| }, | ||
| ) | ||
|
|
||
| if not data.server: | ||
| continue | ||
|
|
||
| router = network["routers"][data.server] = { | ||
| "server": data.server, | ||
| "addresses": data.addresses or [], | ||
| "neighbours": {}, | ||
| "thread_version": data.thread_version, | ||
| "model": data.model_name, | ||
| "vendor": data.vendor_name, | ||
| "routes": {}, | ||
| } | ||
|
|
||
| # For every address this border router has, see if we have seen | ||
| # it in the route table as a via - these are the routes it's | ||
| # announcing via RA | ||
| if data.addresses: | ||
| for address in data.addresses: | ||
| if address in routes: | ||
| router["routes"].update(routes[address]) | ||
|
|
||
| if address in neighbours: | ||
| router["neighbours"][address] = neighbours[address] | ||
|
|
||
| network["prefixes"].update(router["routes"].keys()) | ||
|
|
||
| # Find unexpected via's. | ||
| # Collect all router addresses and then for each prefix, find via's that aren't | ||
| # a known router for that prefix. | ||
| for network in networks.values(): | ||
| routers = set() | ||
|
|
||
| for router in network["routers"].values(): | ||
| routers.update(router["addresses"]) | ||
|
|
||
| for prefix in network["prefixes"]: | ||
| if prefix not in reverse_routes: | ||
| continue | ||
| if ghosts := reverse_routes[prefix] - routers: | ||
| network["unexpected_routers"].update(ghosts) | ||
|
|
||
| return { | ||
| "networks": networks, | ||
| } |
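
The "unexpected via" step at the end of the diagnostics can be illustrated in isolation. The following is a minimal sketch, not the code from this PR: `find_unexpected_routers` and the sample addresses are hypothetical, and it assumes the same shapes as the diff (a `reverse_routes` mapping of prefix to gateway addresses, and a set of addresses advertised by known border routers). It accumulates unexpected gateways across all prefixes of a network.

```python
from __future__ import annotations


def find_unexpected_routers(
    prefixes: set[str],
    reverse_routes: dict[str, set[str]],
    known_router_addresses: set[str],
) -> set[str]:
    """Return gateways that route a known prefix but are not known border routers.

    Hypothetical helper for illustration only; it is not part of the PR.
    """
    unexpected: set[str] = set()
    for prefix in prefixes:
        # Gateways ("via" addresses) the route table lists for this prefix.
        gateways = reverse_routes.get(prefix, set())
        # Anything that was not advertised by a known border router is suspicious.
        unexpected |= gateways - known_router_addresses
    return unexpected


# Made-up sample data: two prefixes, each with one known and one unknown gateway.
reverse_routes = {
    "fd00:1:2:3::": {"fe80::1", "fe80::dead"},
    "fd00:4:5:6::": {"fe80::2", "fe80::beef"},
}
known = {"fe80::1", "fe80::2"}

ghosts = find_unexpected_routers(set(reverse_routes), reverse_routes, known)
print(sorted(ghosts))
```

Using a set difference per prefix keeps the check cheap, and merging the results means a stale gateway on any one prefix still shows up in the report.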
Should this base64 encode the value prefixed with `encode-error:`, so we know it's not None?
I only moved what was already there so I could re-use it, so I don't know if there's any reason for the current behaviour. I'm not opposed to changing it - @emontnemery?
ok let's do it in another PR then.
I guess I would like that detail in the diagnostic report, but the same decoder is used by the thread panel and it would be weird there. Probably better to show nothing than show the user garbage? Maybe leave as is for now and we can revisit if we see it in practice? Can always include the raw `TXT` record (in the diagnostics report only) if we start seeing trash border routers?
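
For reference, the `encode-error:` idea raised in this thread could look roughly like the sketch below. The decoder actually used by the integration is not shown on this page, so `decode_txt_value` is a hypothetical stand-in, and the assumption that values are UTF-8 text is mine. It only illustrates how a failed decode could be told apart from a missing value.

```python
from __future__ import annotations

import base64


def decode_txt_value(raw: bytes | None) -> str | None:
    """Decode a TXT record value, flagging bytes that are not valid UTF-8.

    Hypothetical helper for illustration only; it is not the PR's decoder.
    """
    if raw is None:
        # The key was absent: keep None so "missing" stays distinguishable.
        return None
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Keep the original bytes recoverable instead of silently dropping them.
        return "encode-error:" + base64.b64encode(raw).decode("ascii")


print(decode_txt_value(b"OpenThread"))
print(decode_txt_value(b"\xff\xfe"))
print(decode_txt_value(None))
```

As noted in the discussion, a marker like this suits a diagnostics report better than a user-facing panel, so a real change would probably need to apply it only on the diagnostics path.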