Separate attrs into another table (reduces database size) by bdraco · Pull Request #68224 · home-assistant/core

bdraco · 2022-03-16T03:08:03Z

Breaking change

Attributes are now stored in a state_attributes table, which stores each distinct set of attributes only once (many-to-one relationship: many states can point at one attributes row).
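
A minimal sketch of the many-to-one layout described above, using Python's stdlib sqlite3 rather than the recorder's real models. The column names (`attributes_id`, `shared_attrs`) and the sample entity are assumptions for illustration only.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE state_attributes (
        attributes_id INTEGER PRIMARY KEY,
        shared_attrs TEXT
    );
    CREATE TABLE states (
        state_id INTEGER PRIMARY KEY,
        entity_id TEXT,
        state TEXT,
        attributes_id INTEGER REFERENCES state_attributes (attributes_id)
    );
    """
)

# Store one attribute set once...
shared_attrs = json.dumps(
    {"friendly_name": "CPU", "unit_of_measurement": "%"}, sort_keys=True
)
cur = conn.execute(
    "INSERT INTO state_attributes (shared_attrs) VALUES (?)", (shared_attrs,)
)
attributes_id = cur.lastrowid

# ...and point many state rows at it.
for value in ("12", "15", "11"):
    conn.execute(
        "INSERT INTO states (entity_id, state, attributes_id) VALUES (?, ?, ?)",
        ("sensor.cpu", value, attributes_id),
    )

# Reading a state back means joining the shared attributes in.
rows = conn.execute(
    "SELECT s.state, a.shared_attrs FROM states s "
    "JOIN state_attributes a ON s.attributes_id = a.attributes_id "
    "ORDER BY s.state_id"
).fetchall()
```

Three state rows end up sharing a single attributes row, which is where the space saving would come from.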

Attributes represent roughly 21% of the database size (28% if you exclude statistics).
On a few of my production instances, 82-88% of attribute sets were duplicates of another set of attributes.
I expect this will reduce the size of the database by 13-16% on average.
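
A rough sanity check on these figures, not the author's stated method: if attributes take 21% of the database and 82-88% of that is duplicated, deduplication can remove at most the product of the two. The sketch assumes the duplicate percentage applies to bytes as well as rows, and it ignores the cost of the new table, its index and the extra ID column, which is presumably why the estimate above is lower.

```python
def max_savings(attr_share: float, duplicate_fraction: float) -> float:
    """Upper bound on the fraction of the database that dedup could remove."""
    return attr_share * duplicate_fraction


low = max_savings(0.21, 0.82)
high = max_savings(0.21, 0.88)
print(f"{low:.1%} to {high:.1%}")  # prints: 17.2% to 18.5%
```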

My CPU history graphs, which have a lot of duplicate attributes, now load faster in unscientific testing (134ms instead of 385ms). I expect this to speed up other queries as well: most of the time we now mix in the attributes after filtering, which also saves a bit of I/O in some cases.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue:
Link to documentation pull request: Document state_attributes table data.home-assistant#194

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
The code has been formatted using Black (black --fast homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
Untested files have been added to .coveragerc.

The integration reached or maintains the following Integration Quality Scale:

No score or internal
🥈 Silver
🥇 Gold
🏆 Platinum

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

probot-home-assistant · 2022-03-16T03:08:09Z

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (recorder) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)

bdraco · 2022-03-16T07:04:51Z

Need to handle duplicate attributes that show up before the commit.

Also, don't write the attributes if they're a dupe.
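
One way to picture both points is a small in-memory helper that hands out one ID per distinct attribute set and remembers sets that are queued but not yet committed. `AttributeWriter` and its methods are hypothetical names for illustration, not the recorder's code.

```python
import json
from typing import Any


class AttributeWriter:
    """Hypothetical helper: one attributes_id per distinct attribute set."""

    def __init__(self) -> None:
        self._committed: dict[str, int] = {}  # already in the database
        self._pending: dict[str, int] = {}  # queued, not yet committed
        self._next_id = 1
        self.rows_to_write: list[tuple[int, str]] = []

    def attributes_id_for(self, attributes: dict[str, Any]) -> int:
        shared_attrs = json.dumps(attributes, sort_keys=True)
        # A dupe can be a committed row or one still waiting for the commit.
        for known in (self._committed, self._pending):
            if shared_attrs in known:
                return known[shared_attrs]
        attributes_id = self._next_id
        self._next_id += 1
        self._pending[shared_attrs] = attributes_id
        self.rows_to_write.append((attributes_id, shared_attrs))
        return attributes_id

    def commit(self) -> int:
        """Pretend to flush; return how many attribute rows were written."""
        written = len(self.rows_to_write)
        self._committed.update(self._pending)
        self._pending.clear()
        self.rows_to_write.clear()
        return written


writer = AttributeWriter()
first = writer.attributes_id_for({"unit": "%"})
second = writer.attributes_id_for({"unit": "%"})  # dupe before the commit
written_first = writer.commit()
third = writer.attributes_id_for({"unit": "%"})  # dupe after the commit
written_second = writer.commit()
```

Only the first call queues a row; the later calls reuse its ID and nothing new is written.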

frenck · 2022-03-16T08:15:35Z

I think this is a great idea! Technically, we could optimize this further if we store each attribute separately (instead of dumping the JSON that contains all of them every single time).

From that perspective, I also think we could allow marking attributes as "not recorded" or not relevant for recording or something. For example, it is really unnecessary to record the effect_list attribute of lights... (and if you use WLED, like I do, with hundreds of effects; that is quite a lot of wasted space).
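
A sketch of what such a "not recorded" marker could look like. `UNRECORDED_ATTRIBUTES` and `recordable_attributes` are hypothetical names, not an existing Home Assistant API; the only grounded detail is the `effect_list` example.

```python
from typing import Any

# Hypothetical per-domain list of attributes to leave out of the recorder.
UNRECORDED_ATTRIBUTES: dict[str, frozenset[str]] = {
    "light": frozenset({"effect_list"}),
}


def recordable_attributes(
    entity_id: str, attributes: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of the attributes without the ones marked unrecorded."""
    domain = entity_id.split(".", 1)[0]
    excluded = UNRECORDED_ATTRIBUTES.get(domain, frozenset())
    return {key: value for key, value in attributes.items() if key not in excluded}


light_attrs = recordable_attributes(
    "light.wled", {"brightness": 128, "effect_list": ["Solid", "Blink"]}
)
sensor_attrs = recordable_attributes("sensor.cpu", {"unit_of_measurement": "%"})
```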

bdraco · 2022-03-16T08:52:36Z

> I think this is a great idea! Technically, we could optimize this further if we store each attribute separately (instead of dumping the JSON that contains all of them every single time).

That was my first approach, but I ran into a problem: each attribute value had to be serialized with json.dumps because the data is opaque to us (an attribute value can be anything serializable, since attributes are typed dict[str, Any]).

Additionally, that meant another table to store state_id, attribute_id, ....

That created more overhead when loading them in testing, since json.loads had to be called on each one. I tried implementing a cache, but even then it still wasn't great.

Finally, there were a lot more selects on the database to find attributes, although that did mean no need for hashing. A cache might make this OK.

I couldn't come up with an implementation where the performance trade-off seemed worth it.

Maybe there is still a way to make it work though. Probably will give it another go before going this direction.
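
To make the trade-off concrete, here is a small sketch of the two storage shapes discussed above. The function names are made up for illustration; the point is only that the per-attribute layout needs one `json.dumps` and one `json.loads` per value, while the single blob needs one of each per state.

```python
import json
from typing import Any


def as_single_blob(attributes: dict[str, Any]) -> str:
    """One JSON document per attribute set: one dumps, one loads."""
    return json.dumps(attributes, sort_keys=True)


def as_per_attribute_rows(attributes: dict[str, Any]) -> list[tuple[str, str]]:
    """One row per attribute: every opaque value is serialized separately."""
    return [(key, json.dumps(value)) for key, value in sorted(attributes.items())]


def load_per_attribute_rows(rows: list[tuple[str, str]]) -> dict[str, Any]:
    """Rebuilding the dict costs one json.loads call per row."""
    return {key: json.loads(value) for key, value in rows}


attrs = {"brightness": 128, "effect_list": ["Solid", "Blink"], "rgb": [255, 0, 0]}
blob = as_single_blob(attrs)
rows = as_per_attribute_rows(attrs)
```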

bdraco · 2022-03-16T20:08:28Z

I tried a few approaches with the attributes split up individually, and the cost of the additional indexes needed to make the purge run at a reasonable speed exceeded the savings from the deduplication. We have a lot of small attributes, so the overhead of the database storage quickly offsets the savings.

bdraco · 2022-03-16T22:47:51Z

TODO:

purge shared attributes that are no longer linked to any state
Fix the case where there are mixed old and new states in the db and the old states are not seen because the join is missing
update recorder tests since they are doing raw selects from States without attrs
tests for purge
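
The mixed old/new case in the TODO list can be reproduced with a small sqlite3 sketch. The schema is an assumption (a legacy `attributes` column on old rows, `attributes_id` on new rows); it only illustrates why an inner join hides old states and how an outer join with a fallback could bring them back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE state_attributes (
        attributes_id INTEGER PRIMARY KEY,
        shared_attrs TEXT
    );
    CREATE TABLE states (
        state_id INTEGER PRIMARY KEY,
        state TEXT,
        attributes TEXT,        -- legacy per-row JSON, set only on old rows
        attributes_id INTEGER   -- set only on new rows
    );
    INSERT INTO state_attributes VALUES (1, '{"unit": "%"}');
    INSERT INTO states VALUES (1, '10', '{"unit": "%"}', NULL);
    INSERT INTO states VALUES (2, '12', NULL, 1);
    """
)

# An inner join silently drops the old row, which has no attributes_id.
inner_rows = conn.execute(
    "SELECT s.state_id, a.shared_attrs FROM states s "
    "JOIN state_attributes a ON s.attributes_id = a.attributes_id"
).fetchall()

# An outer join keeps it and falls back to the legacy column.
outer_rows = conn.execute(
    "SELECT s.state_id, COALESCE(a.shared_attrs, s.attributes) FROM states s "
    "LEFT OUTER JOIN state_attributes a ON s.attributes_id = a.attributes_id "
    "ORDER BY s.state_id"
).fetchall()
```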

homeassistant/components/recorder/purge.py

bdraco · 2022-03-17T01:20:24Z

Also need to disconnect the rows before deleting
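
One possible reading of this, sketched with sqlite3 under an assumed schema: remove (or otherwise detach) the state rows that reference an attribute set first, and only then delete attribute rows that nothing points at any more. The `purge_before` function and the column names are illustrative, not the code in purge.py.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE state_attributes (
        attributes_id INTEGER PRIMARY KEY,
        shared_attrs TEXT
    );
    CREATE TABLE states (
        state_id INTEGER PRIMARY KEY,
        last_updated REAL,
        attributes_id INTEGER
    );
    INSERT INTO state_attributes VALUES (1, '{"unit": "%"}'), (2, '{"unit": "W"}');
    INSERT INTO states VALUES (1, 100.0, 1), (2, 200.0, 2), (3, 300.0, 2), (4, 50.0, NULL);
    """
)


def purge_before(conn: sqlite3.Connection, cutoff: float) -> int:
    """Delete old states, then attribute rows no state references any more."""
    conn.execute("DELETE FROM states WHERE last_updated < ?", (cutoff,))
    cur = conn.execute(
        "DELETE FROM state_attributes WHERE attributes_id NOT IN "
        "(SELECT attributes_id FROM states WHERE attributes_id IS NOT NULL)"
    )
    return cur.rowcount


removed = purge_before(conn, 150.0)
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT attributes_id FROM state_attributes ORDER BY attributes_id"
    )
]
```

The `IS NOT NULL` filter in the subquery matters: a NULL inside a `NOT IN` list would make the comparison unknown for every row and nothing would be deleted.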

homeassistant/components/recorder/purge.py

homeassistant/components/logbook/__init__.py

balloob · 2022-03-18T06:20:47Z

One slightly off-topic note:

One of the ideas that Erik and I had is that we want all sensors with a state class in the frontend to use just the statistics table and not be stored in the states/events tables. In that case our statistics/event tables would be a lot smaller and might not require such aggressive optimizations.

bdraco · 2022-03-18T06:25:58Z

> One slightly off-topic note:
>
> One of the ideas that Erik and I had is that we want all sensors with a state class in the frontend to use just the statistics table and not be stored in the states/events tables. In that case our statistics/event tables would be a lot smaller and might not require such aggressive optimizations.

That sounds like it would save a lot of space, since those tend to be the chatty ones. We could come up with an API that abstracts away access to the data and replaces the SQL queries that the statistics, stats history, plant, etc. integrations run.

emontnemery

This looks great, left a few questions though.

homeassistant/components/logbook/__init__.py

homeassistant/components/plant/__init__.py

homeassistant/components/recorder/__init__.py

homeassistant/components/logbook/__init__.py

homeassistant/components/recorder/purge.py

homeassistant/components/plant/__init__.py

homeassistant/components/recorder/models.py

homeassistant/components/statistics/sensor.py

bdraco · 2022-03-18T10:21:26Z

Retest looks good

probot-home-assistant bot added the core and integration: recorder labels Mar 16, 2022

homeassistant added the cla-signed label Mar 16, 2022

bdraco changed the title ~~POC: Separate attributes into another table (reduces database size)~~ POC: Separate attrs into another table (reduces database size) Mar 16, 2022

Separate attrs into another table

bd97dcc

bdraco force-pushed the state_attr_table_poc branch from d430820 to bd97dcc March 16, 2022 22:51

bdraco added 7 commits March 16, 2022 13:34

tweak

aafcea4

tweak

3d5395c

fix queries

9ed711f

adjust

9ae17de

tweak

38e92ab

tweak

386455f

fix purge

0e1c41b

bdraco commented Mar 17, 2022

homeassistant/components/recorder/purge.py Outdated

bdraco commented Mar 17, 2022

homeassistant/components/recorder/purge.py Outdated

bdraco commented Mar 17, 2022

homeassistant/components/recorder/purge.py Outdated

Update homeassistant/components/recorder/purge.py

2220099

bdraco added 5 commits March 16, 2022 15:25

fix purge

36b5235

Merge branch 'state_attr_table_poc' of github.com:bdraco/home-assista…

e9f141d

…nt into state_attr_table_poc t stash apply

fixes

f766fa7

wip

3b96923

group_by is faster

684a7a8

bdraco commented Mar 17, 2022

homeassistant/components/recorder/purge.py Outdated

balloob reviewed Mar 18, 2022

homeassistant/components/logbook/__init__.py Outdated

bdraco commented Mar 18, 2022

homeassistant/components/logbook/__init__.py Outdated

Update homeassistant/components/logbook/__init__.py

a2ad0ac

balloob approved these changes Mar 18, 2022

emontnemery approved these changes Mar 18, 2022

bdraco commented Mar 18, 2022

homeassistant/components/plant/__init__.py Outdated

Update homeassistant/components/plant/__init__.py

dd5f0a9

bdraco commented Mar 18, 2022

homeassistant/components/plant/__init__.py Outdated

bdraco commented Mar 18, 2022

homeassistant/components/recorder/models.py Outdated

bdraco added 5 commits March 17, 2022 22:36

Update homeassistant/components/recorder/models.py

aae19cb

delete ENABLE_LOAD_HISTORY

fa2a7ef

Add comment about _state_attributes_ids LRU size

d7c5e58

add coverage for missing line in logbook

c4f31a8

add coverage to make sure states without attributes_id are purged cor…

762c015

…rectly

bdraco commented Mar 18, 2022

homeassistant/components/statistics/sensor.py Outdated

bdraco added 2 commits March 17, 2022 23:06

Update homeassistant/components/statistics/sensor.py

79e3cc6

Merge branch 'dev' into state_attr_table_poc

54b3160

This was referenced Mar 18, 2022

Fix statistics doing I/O in the event loop #68315

Merged

Cache parsing attr in LazyState #68232

Merged

bdraco merged commit 9215702 into home-assistant:dev Mar 18, 2022

bdraco deleted the state_attr_table_poc branch March 18, 2022 10:23

This was referenced Mar 19, 2022

Cache newly written state attribute ids #68355

Merged

Avoid hashing attributes when they are already in the cache #68395

Merged

bdraco added the breaking-change label Mar 20, 2022

frenck mentioned this pull request Mar 20, 2022

Fix migration to schema v25 with Postgresql #68426

Merged

22 tasks

bdraco mentioned this pull request Mar 21, 2022

Pass the no_attributes flag when they are not needed kalkih/mini-graph-card#765

Merged

github-actions bot locked and limited conversation to collaborators Mar 21, 2022

5 participants
