Skip to content

Bump gassist-text to 0.0.10#85782

Merged
balloob merged 5 commits into
home-assistant:devfrom
tronikos:google_assistant_sdk_dep2
Jan 25, 2023
Merged

Bump gassist-text to 0.0.10#85782
balloob merged 5 commits into
home-assistant:devfrom
tronikos:google_assistant_sdk_dep2

Conversation

@tronikos
Copy link
Copy Markdown
Member

@tronikos tronikos commented Jan 12, 2023

Proposed change

tronikos/gassist_text@0.0.8...0.0.10

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

@home-assistant home-assistant Bot added cla-signed dependency Pull requests marked as a dependency upgrade dependency-bump Pull requests that update a dependency file integration: google_assistant_sdk small-pr PRs with less than 30 lines. by-code-owner Quality Scale: No score labels Jan 12, 2023
Copy link
Copy Markdown
Member

@frenck frenck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bump shows the use of web scraping, which is not allowed to be used with Core integrations.

CleanShot 2023-01-12 at 23 19 05@2x

Ref: tronikos/gassist_text@0.0.7...0.0.9#diff-40c0dcd23e3f1a696bad3abd37bd877a99b726f654671bf8816141f4c79c3bd8R128

For more information, see: https://github.com/home-assistant/architecture/blob/master/adr/0004-webscraping.md

The existing code does this as well and thus must be adjusted.

../Frenck

@tronikos
Copy link
Copy Markdown
Member Author

tronikos commented Jan 13, 2023

This integration is calling Google Assistant Service via gRPC. The gRPC returns simple text in supplemental_display_text field and HTML in the AssistResponse.ScreenOut.data field. I could stop parsing the HTML field but a very small fraction of queries (under 5%) have the text field populated when all have the HTML field populated. The main functionality of the integration is to send commands and broadcast messages to Google Assistant. In fact in the current version the text response is only logged in debug logs so it would be fine to stop parsing HTML alltogether. But the upcoming release adds conversation agent which would have a very broken experience without parsing the HTML response.

The webscrapping definition per the linked architecture is: "Webscraping is when we use code to mimic a user and log in to a website and get data in Home Assistant. This is usually needed because certain data sources/integrations do not offer an API."

This doesn't quite fall in that definition. The integration is using an API that returns a proto and one of its fields holds HTML. Could there be an exception? The HTML parsing is very simple. It just finds the assistant-card-content div and returns its text. If no exception can be made I will have to revert the conversation agent.

@frenck
Copy link
Copy Markdown
Member

frenck commented Jan 13, 2023

The ADR is in place to ban parsing of HTML output (which is used for display) as data. IMHO, this is a violation of it. Let me explain why:

image

From the codebase you've linked:

/ Output-only Supplemental display text from the Assistant.

It is HTML meant for display, in an HTML-capable/browser-like interface, not for parsing data. Just because the HTML data is provided by an API, doesn't justify the means. Otherwise, I would build very simple API wrappers for fetching parts of websites and restoring all previously removed integrations as well.

Could there be an exception? The HTML parsing is very simple.

I disagree with making an exception. We have removed many more integrations for even simpler parsing. It wouldn't be fair toward others that we have declined and actively removed under this ADR.

../Frenck

@tronikos tronikos force-pushed the google_assistant_sdk_dep2 branch from 7902fe7 to b2b405f Compare January 13, 2023 13:27
@tronikos tronikos changed the title Bump gassist-text to 0.0.9 Bump gassist-text to 0.0.10 Jan 13, 2023
@tronikos
Copy link
Copy Markdown
Member Author

Done. I removed HTML parsing from the library.

@frenck
Copy link
Copy Markdown
Member

frenck commented Jan 16, 2023

Considering the documentation change you've linked to this PR about bubble being empty and considering this part of the code:

image

Isn't an option to just strip HTML instead of parsing it?

../Frenck

@frenck frenck removed their request for review January 16, 2023 20:14
@tronikos
Copy link
Copy Markdown
Member Author

The HTML for some queries includes a "Try saying..." section at the bottom, e.g.
joke
If we just strip the whole HTML, that section will be included. Not necessarily bad. In fact, it's better than an empty bubble. The alternative is what I had, get the relevant div and strip that instead.

Do you want to approve and merge this as is (I need the version bump for #85989 ) and in a follow up PR, depending on what you prefer, either strip the whole HTML or just the relevant div? What's your preferred method for HTML stripping? html.parser.HTMLParser, BeautifulSoup, re, or something else?

@frenck frenck added the second-opinion-wanted Add this label when a reviewer needs a second opinion from another member. label Jan 16, 2023
@frenck
Copy link
Copy Markdown
Member

frenck commented Jan 16, 2023

I'm not sure.

The empty responses/bubbles are just as bad and trying to get rid of that "try saying..." bar will bring use back to interpreting HTML again.

IMHO, we should revert the conversation stuff in this case and not ship that feature.

Will mark the PR for a second opinion and consult some fellow core maintainers.

../Frenck

@tronikos
Copy link
Copy Markdown
Member Author

Sounds good. FYI, the two options are:

  1. strip whole HTML
soup = BeautifulSoup(html_response, "html.parser")
text_response = soup.get_text(separator="\n", strip=True)

or

  1. find and return assistant-card-content div
soup = BeautifulSoup(html_response, "html.parser")
card_content = soup.find("div", id="assistant-card-content")
text_response = card_content.get_text(separator="\n", strip=True)

For the above screenshot, first returns:

What's the smelliest part of a pirate ship?
The poop deck 💩
Try saying...
One more
Random fun

second returns

What's the smelliest part of a pirate ship?
The poop deck 💩

@frenck
Copy link
Copy Markdown
Member

frenck commented Jan 16, 2023

Both involve parsing HTML, which isn't allowed.

@tronikos
Copy link
Copy Markdown
Member Author

I thought earlier you suggested HTML stripping which is option 1. There are alternatives to BeautifulSoup but you can't really do stripping without some form of parsing.

@balloob balloob removed the second-opinion-wanted Add this label when a reviewer needs a second opinion from another member. label Jan 25, 2023
@balloob balloob merged commit a85c4a1 into home-assistant:dev Jan 25, 2023
@github-actions github-actions Bot locked and limited conversation to collaborators Jan 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

by-code-owner cla-signed dependency Pull requests marked as a dependency upgrade dependency-bump Pull requests that update a dependency file integration: google_assistant_sdk Quality Scale: No score small-pr PRs with less than 30 lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants