Migrate statistics to use timestamp columns #87321
This is effectively the same change as home-assistant#84870 but for statistics. With ~8 months of statistics data and a 10 day purge for states, this is expected to reduce the database size by ~30% and improve the speed of selecting data from statistics.
Unrelated, but if we drop ix_states_event_id we can save another 2.1%:
IX_STATES_EVENT_ID................................ 3794 2.1%
This is what it looks like if we drop ix_states_event_id (which won't be used after we migrate all the old rows).
IX_STATES_ENTITY_ID_LAST_UPDATED_TS takes up more than half of the actual states data.
SQLite looks good; still need to do MySQL and PostgreSQL testing.
Uncovered lines are the PostgreSQL and MySQL test runners, which don't currently report to Codecov.
emontnemery left a comment
Looks great, just a few comments.
Will need rebase on top of #87583 before merging.
Breaking change
Statistics and energy graphs will be unavailable during the database migration.
Proposed change
This is effectively the same change as #84870 but for statistics. Statistics migrations are expected to take a relatively short time compared to states migrations. (In testing they took less than a minute on fast hardware with a 4 GiB database.)
This should address some of the feedback about the energy and statistics graphs being slower than the history graphs on the frontend. The ones that had noticeable delays now appear to load instantly. Additional API optimization is possible to avoid some more data conversions in future PRs. This does not fix energy stats for a full year taking a very long time to load. The hope is to be able to resolve that in #87747.
With ~8 months of statistics data and 10 day purge for states this is expected to reduce the database size by ~30% after the next monthly repack (2nd Sunday of the month) and improve the speed of selecting data from statistics.
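As a rough illustration of the kind of migration described above, the following sketch uses an in-memory SQLite database. The table layout, the `start_ts` column name, and the index name are assumptions made for the example, not the recorder's actual schema or migration code. The sketch also drops sub-second precision, which a real migration would need to preserve.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")

# Hypothetical pre-migration layout: the start time is stored as a datetime string.
conn.execute(
    "CREATE TABLE statistics ("
    "id INTEGER PRIMARY KEY, metadata_id INTEGER, start TEXT, mean REAL)"
)
conn.executemany(
    "INSERT INTO statistics (metadata_id, start, mean) VALUES (?, ?, ?)",
    [
        (1, "2023-02-01 12:00:00", 20.5),
        (1, "2023-02-01 13:00:00", 21.0),
    ],
)

# Step 1: add a float column that will hold epoch seconds (UTC).
conn.execute("ALTER TABLE statistics ADD COLUMN start_ts REAL")

# Step 2: populate it from the old column. strftime('%s', ...) treats a
# timestamp string without an offset as UTC and returns whole seconds.
conn.execute(
    "UPDATE statistics "
    "SET start_ts = CAST(strftime('%s', start) AS REAL) "
    "WHERE start_ts IS NULL"
)

# Step 3: index the new column so range queries can use it.
conn.execute(
    "CREATE INDEX ix_statistics_metadata_id_start_ts "
    "ON statistics (metadata_id, start_ts)"
)
conn.commit()

# Range queries now compare plain floats and return plain floats.
since = datetime(2023, 2, 1, 12, 30, tzinfo=timezone.utc).timestamp()
migrated = conn.execute(
    "SELECT start_ts, mean FROM statistics "
    "WHERE metadata_id = ? AND start_ts >= ? ORDER BY start_ts",
    (1, since),
).fetchall()
```

Once every row has a value in the float column, the old datetime column and its index are no longer needed for lookups, which is where the size reduction would come from.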
Testing before
after:
30.22% decrease
Nearly eliminates Python datetime conversion overhead for a single day of stats (not weekly/monthly/yearly; see #87747). The fetch overhead is also much lower because no datetime conversion is needed at that layer, which tends to be the most expensive part of row fetches.
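To make the conversion overhead concrete, here is a small sketch contrasting the two row shapes. The row values and helper names are made up for illustration and are not taken from the recorder. With datetime strings every fetched row has to be parsed and made timezone-aware before it can be turned into an epoch value, while float timestamps pass straight through.

```python
from datetime import datetime, timezone

# Hypothetical fetched rows as (start, mean) tuples.
rows_datetime = [
    ("2023-02-01 12:00:00.000000", 1.5),
    ("2023-02-01 13:00:00.000000", 2.5),
]
rows_ts = [
    (1675252800.0, 1.5),
    (1675256400.0, 2.5),
]


def from_datetime_rows(rows):
    """Old layout: each row needs a string -> datetime -> epoch conversion."""
    result = []
    for start, mean in rows:
        # Stored values are assumed to be naive UTC, so attach the UTC tzinfo.
        parsed = datetime.fromisoformat(start).replace(tzinfo=timezone.utc)
        result.append({"start": parsed.timestamp(), "mean": mean})
    return result


def from_ts_rows(rows):
    """New layout: the float is already the value we want."""
    return [{"start": start_ts, "mean": mean} for start_ts, mean in rows]
```

Both helpers produce the same output for these rows; the difference is only in how much per-row work is done to get there.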
Remaining:
Type of change
Additional information
Checklist
Note: will conflict with #86436 but easy enough to fix whichever one merges first.
Additional conflicts (fixes that have been broken out of this PR that were discovered as part of the process of developing this):
#87581
#87583