Bump gassist-text to 0.0.10 by tronikos · Pull Request #85782 · home-assistant/core

tronikos · 2023-01-12T21:25:20Z

Proposed change

tronikos/gassist_text@0.0.8...0.0.10

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Deprecation (breaking change to happen in the future)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue:
Link to documentation pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
The code has been formatted using Black (black --fast homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

frenck

This bump shows the use of web scraping, which is not allowed to be used with Core integrations.

Ref: tronikos/gassist_text@0.0.7...0.0.9#diff-40c0dcd23e3f1a696bad3abd37bd877a99b726f654671bf8816141f4c79c3bd8R128

For more information, see: https://github.com/home-assistant/architecture/blob/master/adr/0004-webscraping.md

The existing code does this as well and thus must be adjusted.

../Frenck

tronikos · 2023-01-13T06:19:23Z

This integration is calling Google Assistant Service via gRPC. The gRPC returns simple text in supplemental_display_text field and HTML in the AssistResponse.ScreenOut.data field. I could stop parsing the HTML field but a very small fraction of queries (under 5%) have the text field populated when all have the HTML field populated. The main functionality of the integration is to send commands and broadcast messages to Google Assistant. In fact in the current version the text response is only logged in debug logs so it would be fine to stop parsing HTML alltogether. But the upcoming release adds conversation agent which would have a very broken experience without parsing the HTML response.

The webscrapping definition per the linked architecture is: "Webscraping is when we use code to mimic a user and log in to a website and get data in Home Assistant. This is usually needed because certain data sources/integrations do not offer an API."

This doesn't quite fall in that definition. The integration is using an API that returns a proto and one of its fields holds HTML. Could there be an exception? The HTML parsing is very simple. It just finds the assistant-card-content div and returns its text. If no exception can be made I will have to revert the conversation agent.

frenck · 2023-01-13T08:34:21Z

The ADR is in place to ban parsing of HTML output (which is used for display) as data. IMHO, this is a violation of it. Let me explain why:

From the codebase you've linked:

/ Output-only Supplemental display text from the Assistant.

It is HTML meant for display, in an HTML-capable/browser-like interface, not for parsing data. Just because the HTML data is provided by an API, doesn't justify the means. Otherwise, I would build very simple API wrappers for fetching parts of websites and restoring all previously removed integrations as well.

Could there be an exception? The HTML parsing is very simple.

I disagree with making an exception. We have removed many more integrations for even simpler parsing. It wouldn't be fair toward others that we have declined and actively removed under this ADR.

../Frenck

tronikos · 2023-01-13T13:31:44Z

Done. I removed HTML parsing from the library.

frenck · 2023-01-16T20:14:06Z

Considering the documentation change you've linked to this PR about bubble being empty and considering this part of the code:

Isn't an option to just strip HTML instead of parsing it?

../Frenck

tronikos · 2023-01-16T22:03:07Z

The HTML for some queries includes a "Try saying..." section at the bottom, e.g.

If we just strip the whole HTML, that section will be included. Not necessarily bad. In fact, it's better than an empty bubble. The alternative is what I had, get the relevant div and strip that instead.

Do you want to approve and merge this as is (I need the version bump for #85989 ) and in a follow up PR, depending on what you prefer, either strip the whole HTML or just the relevant div? What's your preferred method for HTML stripping? html.parser.HTMLParser, BeautifulSoup, re, or something else?

frenck · 2023-01-16T22:14:36Z

I'm not sure.

The empty responses/bubbles are just as bad and trying to get rid of that "try saying..." bar will bring use back to interpreting HTML again.

IMHO, we should revert the conversation stuff in this case and not ship that feature.

Will mark the PR for a second opinion and consult some fellow core maintainers.

../Frenck

tronikos · 2023-01-16T22:49:05Z

Sounds good. FYI, the two options are:

strip whole HTML

soup = BeautifulSoup(html_response, "html.parser")
text_response = soup.get_text(separator="\n", strip=True)

or

find and return assistant-card-content div

soup = BeautifulSoup(html_response, "html.parser")
card_content = soup.find("div", id="assistant-card-content")
text_response = card_content.get_text(separator="\n", strip=True)

For the above screenshot, first returns:

What's the smelliest part of a pirate ship?
The poop deck 💩
Try saying...
One more
Random fun

second returns

What's the smelliest part of a pirate ship?
The poop deck 💩

frenck · 2023-01-16T22:51:23Z

Both involve parsing HTML, which isn't allowed.

tronikos · 2023-01-16T22:54:54Z

I thought earlier you suggested HTML stripping which is option 1. There are alternatives to BeautifulSoup but you can't really do stripping without some form of parsing.

home-assistant Bot added cla-signed dependency Pull requests marked as a dependency upgrade dependency-bump Pull requests that update a dependency file integration: google_assistant_sdk small-pr PRs with less than 30 lines. by-code-owner Quality Scale: No score labels Jan 12, 2023

frenck requested changes Jan 12, 2023

View reviewed changes

tronikos force-pushed the google_assistant_sdk_dep2 branch from 7902fe7 to b2b405f Compare January 13, 2023 13:27

tronikos changed the title ~~Bump gassist-text to 0.0.9~~ Bump gassist-text to 0.0.10 Jan 13, 2023

tronikos requested a review from frenck January 13, 2023 13:35

tronikos mentioned this pull request Jan 14, 2023

Google Assistant SDK: update conversation screenshot and add a note that bubbles might be empty home-assistant/home-assistant.io#25746

Merged

8 tasks

tronikos added 2 commits January 14, 2023 06:49

Bump gassist-text to 0.0.9

db9cbd6

Bump gassist-text to 0.0.10

53877e2

tronikos force-pushed the google_assistant_sdk_dep2 branch from b2b405f to 53877e2 Compare January 14, 2023 06:50

tronikos mentioned this pull request Jan 16, 2023

Google Assistant SDK: support audio response playback #85989

Merged

19 tasks

frenck removed their request for review January 16, 2023 20:14

frenck added the second-opinion-wanted Add this label when a reviewer needs a second opinion from another member. label Jan 16, 2023

tronikos and others added 2 commits January 24, 2023 08:48

Merge branch 'dev' into google_assistant_sdk_dep2

c9d4700

Clarify when it's empty response

0efdd46

balloob removed the second-opinion-wanted Add this label when a reviewer needs a second opinion from another member. label Jan 25, 2023

Merge branch 'dev' into google_assistant_sdk_dep2

65e50d7

balloob merged commit a85c4a1 into home-assistant:dev Jan 25, 2023

github-actions Bot locked and limited conversation to collaborators Jan 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bump gassist-text to 0.0.10#85782

Bump gassist-text to 0.0.10#85782
balloob merged 5 commits into
home-assistant:devfrom
tronikos:google_assistant_sdk_dep2

tronikos commented Jan 12, 2023 •

edited

Loading

Uh oh!

frenck left a comment •

edited

Loading

Uh oh!

tronikos commented Jan 13, 2023 •

edited

Loading

Uh oh!

frenck commented Jan 13, 2023 •

edited

Loading

Uh oh!

tronikos commented Jan 13, 2023

Uh oh!

frenck commented Jan 16, 2023

Uh oh!

tronikos commented Jan 16, 2023

Uh oh!

frenck commented Jan 16, 2023

Uh oh!

tronikos commented Jan 16, 2023

Uh oh!

frenck commented Jan 16, 2023

Uh oh!

tronikos commented Jan 16, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

tronikos commented Jan 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed change

Type of change

Additional information

Checklist

Uh oh!

frenck left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tronikos commented Jan 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

frenck commented Jan 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tronikos commented Jan 13, 2023

Uh oh!

frenck commented Jan 16, 2023

Uh oh!

tronikos commented Jan 16, 2023

Uh oh!

frenck commented Jan 16, 2023

Uh oh!

tronikos commented Jan 16, 2023

Uh oh!

frenck commented Jan 16, 2023

Uh oh!

tronikos commented Jan 16, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tronikos commented Jan 12, 2023 •

edited

Loading

frenck left a comment •

edited

Loading

tronikos commented Jan 13, 2023 •

edited

Loading

frenck commented Jan 13, 2023 •

edited

Loading