Initial orjson support take 2 #72847

Closed
wants to merge 15 commits into from

bdraco
Member

@bdraco bdraco commented Jun 1, 2022

Replaced by #73849

Still need to work out the problem building wheels, which may be solved once #73830 and #73628 merge, but I'm keeping this PR fresh since it's what I am running in production, in case anyone else wants to do testing.

--

Redux of #72754 / #32153. Now possible since the following is solved:
ijl/orjson#220 (comment)

This implements orjson where we use our default encoder. It does not implement orjson where ExtendedJSONEncoder is used, as those areas tend to be called far less frequently. If it's desired, this could be done in a follow-up, but it seemed like a case of diminishing returns (except maybe for large diagnostics files or traces, but those are not expected to be downloaded frequently).
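
A minimal sketch of the idea, not the code from this PR: route serialization through one helper that hands unsupported types to a `default` hook. The names `json_default` and `json_bytes`, the example payload, and the fallback to the standard library when orjson is not installed are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Any

try:
    import orjson  # third-party package, assumed installed (pip install orjson)
except ImportError:
    orjson = None  # fall back to the standard library below


def json_default(obj: Any) -> Any:
    """Convert types the serializer cannot handle natively (hypothetical hook)."""
    if isinstance(obj, (set, frozenset)):
        return list(obj)
    if isinstance(obj, Path):
        return obj.as_posix()
    if hasattr(obj, "as_dict"):
        return obj.as_dict()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def json_bytes(data: Any) -> bytes:
    """Serialize to compact UTF-8 JSON bytes, preferring orjson when available."""
    if orjson is not None:
        return orjson.dumps(data, default=json_default)
    return json.dumps(
        data, default=json_default, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


# Hypothetical payload mixing native and non-native types.
payload = {"entity_id": "sensor.temp", "tags": {"indoor"}, "log": Path("/config/ha.log")}
print(json_bytes(payload))
```

Keeping the hook in one place means the same conversion rules apply whichever serializer ends up doing the work.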

Areas where this makes a perceptible difference:

  • Anything that subscribes to entities (Initial subscribe_entities payload)
  • Initial download of registries on first connection / restore
  • History queries
  • Saving states to the database
  • Large logbook queries
  • http views that use .json
  • templates that use value_json
  • Anything that subscribes to events (appdaemon)

Caveats:
orjson supports serializing dataclasses natively (and much faster), which eliminates the need to implement as_dict in many places when the data is already in a dataclass. This works well as long as all the data in the dataclass can also be serialized. I audited all places where we have an as_dict for a dataclass and found that only backups needed to be adjusted (support for Path needed to be added for backups). I was a little bit worried about SensorExtraStoredData with Decimal, but it all seems to work out since it converts it before it gets to the JSON encoding. cc @dgomes

If it turns out to be a problem, we can disable this with option |= orjson.OPT_PASSTHROUGH_DATACLASS and it will fall back to as_dict.
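
To make the difference concrete, here is a small sketch under stated assumptions: `BackupInfo` and `json_default` are hypothetical stand-ins rather than classes from the codebase, and the demo only runs when orjson is installed. Without the option, orjson builds the JSON from the dataclass fields; with `OPT_PASSTHROUGH_DATACLASS`, the instance is handed to the `default` hook, which can call `as_dict`.

```python
from dataclasses import dataclass
from typing import Any, Dict

try:
    import orjson  # third-party package, assumed installed (pip install orjson)
except ImportError:
    orjson = None  # the demo at the bottom is skipped without it


@dataclass
class BackupInfo:  # hypothetical dataclass, not the one in the codebase
    slug: str
    size: int

    def as_dict(self) -> Dict[str, Any]:
        # Hand-written representation that differs from the raw fields.
        return {"slug": self.slug, "size_mb": self.size}


def json_default(obj: Any) -> Any:
    """Hook called only for objects orjson does not serialize itself."""
    if hasattr(obj, "as_dict"):
        return obj.as_dict()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def dumps_native(obj: Any) -> bytes:
    # Dataclass instances are serialized directly from their fields.
    return orjson.dumps(obj, default=json_default)


def dumps_passthrough(obj: Any) -> bytes:
    # Dataclass instances are passed to json_default instead.
    return orjson.dumps(
        obj, default=json_default, option=orjson.OPT_PASSTHROUGH_DATACLASS
    )


if orjson is not None:
    backup = BackupInfo(slug="abc123", size=42)
    print(dumps_native(backup))       # keys come from the dataclass fields
    print(dumps_passthrough(backup))  # keys come from as_dict()
```

The native path is the fast one; the passthrough path trades that speed for full control over the output shape.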

It's quite impressive for history queries:
[Screenshot: Screen_Shot_2022-05-30_at_23_46_30]

@probot-home-assistant

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (history) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)

@probot-home-assistant

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (recorder) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)

@probot-home-assistant

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (logbook) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)

@probot-home-assistant

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (energy) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)

@probot-home-assistant

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (websocket_api) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)

@marcelveldt
Member

marcelveldt commented Jun 2, 2022

I have my fair share of experience with orjson from a project that moves a lot of data around (think millions of records).
It's indeed really fast, and so far I've never run into compatibility issues. That project is hosted on Azure, so there were no issues with getting a wheel. Later I tried applying it to another OSS/HA project and ran into issues building the wheels, especially for aarch64. If you can solve the issues with building the wheels for all HA architectures, this will be a really great speed improvement.

Nice job on this!

@marcelveldt
Member

FYI: If you're interested in another approach (if creating the wheels remains an issue), you may want to look into another really great project: mashumaro

Despite the slightly weird name, it is really great and stable, and basically it speeds up the creation of JSON from dataclasses.

@bdraco
Member Author

bdraco commented Jun 10, 2022

No changes, rebase for conflicts

Still need to work out problem building wheels

@bdraco
Member Author

bdraco commented Jun 10, 2022

Bumped to 3.7.2

@bdraco
Member Author

bdraco commented Jun 10, 2022

> FYI: If you're interested in another approach (if creating the wheels remains an issue), you may want to look into another really great project: mashumaro
>
> Despite the slightly weird name, it is really great and stable, and basically it speeds up the creation of JSON from dataclasses.

Thanks for the link. This one is less about dataclasses and more about overall JSON performance. Every UI that pulls significant data from the websocket is noticeably snappier with orjson. Right now we are on hold since a new wheel build system is coming, which will hopefully solve this. If it doesn't, I'm committed to figuring out a solution to get this working, as the improvements to the UX are just too good to leave on the table.

@bdraco
Member Author

bdraco commented Jun 15, 2022

We can apply this to the HTTP view response JSON as well, since that is also a bottleneck according to the profiles provided by @Mariusthvdb.

@bdraco
Member Author

bdraco commented Jun 15, 2022

Switching the views made a huge difference in the response time of /api/config/config_entries/entry?type=integration

@bdraco
Member Author

bdraco commented Jun 17, 2022

I added some additional tweaks to smooth out some more of the hotspots in the profile @Mariusthvdb sent

@bdraco
Member Author

bdraco commented Jun 17, 2022

Both @Mariusthvdb's and @gieljnssns's profiles show that aiounifi could see a significant benefit from this as well, once we have support for orjson in core.

@Mariusthvdb
Contributor

Yes, and I wonder if here_travel_time shows up in those profiles, because it is a long-term troublemaker and has recently gotten worse. The author is on it, but I thought it might be good to know in relation to this issue: #73632 (comment)

@bdraco
Member Author

bdraco commented Jun 18, 2022

> Yes, and I wonder if here_travel_time shows up in those profiles, because it is a long-term troublemaker and has recently gotten worse. The author is on it, but I thought it might be good to know in relation to this issue: #73632 (comment)

It will only show up in a py-spy profile, since the updates run in the executor.
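
As a rough illustration of why that matters (a sketch, assuming the other profiles only observed the event loop thread, whereas py-spy samples every thread in the process): work submitted through `run_in_executor` runs on a worker thread, not on the thread that runs the event loop. `blocking_update` is a hypothetical stand-in for an integration's blocking update.

```python
import asyncio
import threading
import time
from typing import Tuple


def blocking_update() -> str:
    """Stand-in for a blocking integration update (hypothetical)."""
    time.sleep(0.01)
    # Report which thread actually did the work.
    return threading.current_thread().name


async def main() -> Tuple[str, str]:
    loop = asyncio.get_running_loop()
    loop_thread = threading.current_thread().name
    # None selects the loop's default ThreadPoolExecutor.
    worker_thread = await loop.run_in_executor(None, blocking_update)
    return loop_thread, worker_thread


loop_thread, worker_thread = asyncio.run(main())
print(f"event loop thread: {loop_thread}, update ran in: {worker_thread}")
```

The two names differ, so a tool that only looks at the event loop thread never sees the time spent inside the update.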

@bdraco bdraco closed this Jun 22, 2022
@bdraco bdraco mentioned this pull request Jun 22, 2022
22 tasks
@github-actions github-actions bot locked and limited conversation to collaborators Jun 23, 2022