Separate attrs into another table (reduces database size) #68224

bdraco merged 62 commits into home-assistant:dev from
Conversation
Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (
Need to handle duplicate attributes that happen before commit. Also, don't write the attributes if it's a dupe.
I think this is a great idea! Technically, we could optimize this further if we store each attribute separately (instead of dumping the JSON that contains all of them every single time). From that perspective, I also think we could allow marking attributes as "not recorded" or not relevant for recording. For example, it is really unnecessary to record the
That was the first approach with this, but I ran into a few problems:

- Each attribute value had to be […]
- Additionally, that meant another table to store […]
- That created more overhead to load them in testing, since […]
- Finally, there were a lot more selects on the database to find attributes, but that did mean no need for hashing. A cache might make this OK.

I couldn't come up with a way to implement it that didn't trade away performance in a way that seemed worth it. Maybe there is still a way to make it work, though. I'll probably give it another go before going in this direction.
I tried a few approaches with the single-attribute split, and the cost of the additional indexes needed to make the purge run at a reasonable speed exceeded the savings from the deduplication. We have a lot of small attributes, so the database storage overhead quickly offsets the savings.
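For reference, the approach that won out can be sketched roughly as follows: instead of one row per attribute (which multiplies rows and index size), the whole JSON-serialized attribute set is deduplicated, keyed by a hash for fast lookups. This is a minimal sketch using sqlite3 with hypothetical table and column names, not the actual recorder code:

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_attributes (
    attributes_id INTEGER PRIMARY KEY,
    hash INTEGER,            -- indexed hash of shared_attrs for fast dedup lookups
    shared_attrs TEXT        -- JSON blob of the full attribute set
);
CREATE INDEX ix_state_attributes_hash ON state_attributes (hash);
CREATE TABLE states (
    state_id INTEGER PRIMARY KEY,
    entity_id TEXT,
    state TEXT,
    attributes_id INTEGER REFERENCES state_attributes (attributes_id)
);
""")

def attrs_hash(shared: str) -> int:
    # Hypothetical: a stable 32-bit hash of the serialized attributes.
    return int.from_bytes(hashlib.md5(shared.encode()).digest()[:4], "big")

def get_or_create_attributes(attrs: dict) -> int:
    shared = json.dumps(attrs, sort_keys=True)
    h = attrs_hash(shared)
    # The hash narrows down candidates; compare the JSON to rule out collisions.
    for attributes_id, existing in conn.execute(
        "SELECT attributes_id, shared_attrs FROM state_attributes WHERE hash = ?", (h,)
    ):
        if existing == shared:
            return attributes_id
    cur = conn.execute(
        "INSERT INTO state_attributes (hash, shared_attrs) VALUES (?, ?)", (h, shared)
    )
    return cur.lastrowid

def record_state(entity_id: str, state: str, attrs: dict) -> None:
    conn.execute(
        "INSERT INTO states (entity_id, state, attributes_id) VALUES (?, ?, ?)",
        (entity_id, state, get_or_create_attributes(attrs)),
    )

# Two states with identical attributes share a single state_attributes row.
record_state("sensor.cpu", "41", {"unit_of_measurement": "%"})
record_state("sensor.cpu", "42", {"unit_of_measurement": "%"})
```

One lookup on a single indexed hash column replaces the many per-attribute selects and indexes that made the EAV-style layout lose on purge speed and storage overhead.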
TODO:
Force-pushed from d430820 to bd97dcc
Also need to disconnect the rows before deleting.
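With a shared-attributes table, purging can be sketched as a two-step operation: delete the expired state rows first so their attribute rows become unreferenced, then delete attribute rows that no remaining state points at. A minimal sqlite3 sketch with hypothetical schema, not the actual recorder purge logic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_attributes (attributes_id INTEGER PRIMARY KEY, shared_attrs TEXT);
CREATE TABLE states (
    state_id INTEGER PRIMARY KEY,
    last_updated REAL,
    attributes_id INTEGER REFERENCES state_attributes (attributes_id)
);
""")
conn.execute("INSERT INTO state_attributes VALUES (1, '{\"old\": true}')")
conn.execute("INSERT INTO state_attributes VALUES (2, '{\"new\": true}')")
conn.execute("INSERT INTO states VALUES (1, 100.0, 1)")  # expired state
conn.execute("INSERT INTO states VALUES (2, 200.0, 2)")  # recent state

def purge_before(cutoff: float) -> None:
    # Step 1: delete expired states, disconnecting their attribute rows.
    conn.execute("DELETE FROM states WHERE last_updated < ?", (cutoff,))
    # Step 2: delete attribute rows that no remaining state references.
    conn.execute(
        "DELETE FROM state_attributes WHERE attributes_id NOT IN "
        "(SELECT DISTINCT attributes_id FROM states WHERE attributes_id IS NOT NULL)"
    )

purge_before(150.0)
```

Because attribute rows are shared, a row can only be removed once every state that references it is gone, which is why ordering (disconnect first, delete second) matters.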
One slightly off-topic note: one of the ideas that Erik and I had is that we want all sensors with a state class in the frontend to use just the statistics table and not be stored in the states/events tables. In that case our states/events tables would be a lot smaller and might not require such aggressive optimizations.
That sounds like it would save a lot of space, since those tend to be the chatty ones. We could come up with an API to abstract away access to the data, replacing the SQL queries used by the statistics, stats history, plant, etc. integrations.
emontnemery left a comment

This looks great, left a few questions though.
Retest looks good.
Breaking change
Attributes are now stored in a `state_attributes` table, which stores each unique set of attributes once (many-to-one relationship).

Attributes represent roughly 21% of the database size (28% if you exclude statistics).
On a few of my production instances attributes ranged from 82-88% duplicates of another set of attributes.
I expect this will reduce the size of the database by 13-16% on average.
My CPU history graphs, which have a lot of duplicate attributes, now load faster in unscientific testing (134ms instead of 385ms). I expect this to speed up other queries as well, since we now mix in the attributes after filtering most of the time, which saves a bit of I/O in some cases.
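The savings estimate follows from the measured numbers above: attributes are roughly 21% of the database, and 82-88% of attribute sets are duplicates of another set, so removing the duplicate copies gives an upper bound of about 17-18% of total size; the new table's per-row and index overhead brings the realized figure down to the quoted 13-16%. A rough back-of-the-envelope check (plain arithmetic, not from the PR itself):

```python
attrs_share = 0.21  # attributes' share of total database size (28% excl. statistics)

for dup_ratio in (0.82, 0.88):
    # Duplicate attribute sets are stored once, so roughly dup_ratio of the
    # attribute bytes disappear. Treat this as an upper bound on the savings,
    # since the new table adds hash/index/row overhead.
    upper_bound = attrs_share * dup_ratio
    print(f"dup={dup_ratio:.0%}: up to {upper_bound:.1%} of total size saved")
# → dup=82%: up to 17.2% of total size saved
# → dup=88%: up to 18.5% of total size saved
```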
Type of change
Additional information
Checklist
The code has been formatted using Black (`black --fast homeassistant tests`)

If user exposed functionality or configuration variables are added/changed:
If the code communicates with devices, web services, or third-party tools:
Updated and included derived files by running: `python3 -m script.hassfest`.
`requirements_all.txt` updated by running `python3 -m script.gen_requirements_all`.
Untested files have been added to `.coveragerc`.

The integration reached or maintains the following Integration Quality Scale:
To help with the load of incoming pull requests: