
Migrate statistics to use timestamp columns #87321

Merged
bdraco merged 40 commits into home-assistant:dev from bdraco:stats_migration
Feb 9, 2023

Member

bdraco commented Feb 3, 2023

Breaking change

Statistics and energy graphs will be unavailable during the database migration.

Proposed change

This is effectively the same change as #84870 but for statistics. Statistics migrations are expected to take a relatively short time compared to states migrations. (In testing, they took less than a minute on fast hardware with a 4 GiB database.)
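
As a rough illustration of what moving to timestamp columns involves, the sketch below backfills a float epoch column from a datetime column in an in-memory SQLite database. The table name `stats_demo` and the columns `start` / `start_ts` are assumptions chosen for the example (the `_ts` suffix mirrors the index names quoted later in this thread); this is not the migration code from this PR.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical table: `start` is the old datetime column, `start_ts` the new one.
conn.execute(
    "CREATE TABLE stats_demo (id INTEGER PRIMARY KEY, start TEXT, start_ts REAL)"
)
conn.executemany(
    "INSERT INTO stats_demo (start) VALUES (?)",
    [("2023-02-03 17:00:00",), ("2023-02-03 18:00:00",)],
)

# Backfill the epoch column; SQLite's strftime('%s', ...) reads the text as UTC.
conn.execute(
    "UPDATE stats_demo SET start_ts = CAST(strftime('%s', start) AS REAL) "
    "WHERE start_ts IS NULL"
)
conn.commit()

migrated = [
    row[0] for row in conn.execute("SELECT start_ts FROM stats_demo ORDER BY id")
]
expected_first = datetime(2023, 2, 3, 17, 0, tzinfo=timezone.utc).timestamp()
print(migrated[0] == expected_first)
```

Once the float column is populated, queries can filter and sort on it directly, and the old datetime column no longer has to be read or parsed.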

This should address some of the feedback about the energy and statistics graphs being slower than the history graphs on the frontend. The ones that had noticeable delays now appear to load instantly. Additional API optimization is possible to avoid some more data conversions in future PRs. This does not fix energy stats for a full year taking a very long time to load. The hope is to be able to resolve that in #87747.

With ~8 months of statistics data and a 10-day purge for states, this is expected to reduce the database size by ~30% after the next monthly repack (2nd Sunday of the month) and to speed up selecting data from statistics.

Testing

before:

-rwxrwxrwx  1 bdraco  staff  1082642432 Feb  3 17:05 home-assistant_v2.db

after:

-rwxr-xr-x   1 bdraco  staff  755396608 Feb  3 17:36 home-assistant_v2.db

30.22% decrease
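
The reduction can be checked from the two file sizes listed above. This is only the arithmetic, using the byte counts from the `ls` output:

```python
size_before = 1_082_642_432  # bytes, home-assistant_v2.db before the migration
size_after = 755_396_608     # bytes, home-assistant_v2.db after the migration

saved = size_before - size_after
decrease_pct = saved / size_before * 100

print(f"saved {saved / 2**20:.0f} MiB ({decrease_pct:.1f}% decrease)")
```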

Nearly eliminates Python datetime conversion overhead for a single day of stats (not weekly/monthly/yearly; see #87747). Fetch overhead is also much lower because no datetime conversion is needed at that layer, and that conversion tends to be the most expensive part of fetching rows.
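
To make the conversion overhead concrete, here is a minimal sketch contrasting the two fetch paths: parsing a datetime per row versus passing a stored float through unchanged. The row shapes and function names are hypothetical and are not the recorder's API.

```python
from datetime import datetime, timezone

# Hypothetical rows as they might come back from the database.
rows_datetime = [("2023-02-03 17:00:00", 1.5), ("2023-02-03 18:00:00", 2.5)]
rows_ts = [(1675443600.0, 1.5), (1675447200.0, 2.5)]


def via_datetime(rows):
    """Old path: parse every value into a datetime, then convert to epoch seconds."""
    return [
        (datetime.fromisoformat(start).replace(tzinfo=timezone.utc).timestamp(), value)
        for start, value in rows
    ]


def via_timestamp(rows):
    """New path: the stored float is already in the form the caller needs."""
    return [(start_ts, value) for start_ts, value in rows]


print(via_datetime(rows_datetime) == via_timestamp(rows_ts))
```

Both paths yield the same points; the second simply skips the per-row parse, which is where the savings described above would come from.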

(Screenshot: Screenshot_2023-02-08_at_5_48_39_PM)

Remaining:

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

Note: this will conflict with #86436, but that is easy enough to fix regardless of which one merges first.

Additional conflicts (fixes discovered while developing this PR and broken out into their own PRs):

#87581
#87583

Contributor

home-assistant Bot commented Feb 3, 2023

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (recorder) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of recorder can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Change the title of the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign recorder Removes the current integration label and assignees on the issue, add the integration domain after the command.

Member Author

bdraco commented Feb 3, 2023

Unrelated, but dropping the IX_STATES_EVENT_ID index (3794 pages, 2.1% of the database) would save another 2.1%.

Member Author

bdraco commented Feb 3, 2023

*** Page counts for all tables with their indices *****************************

STATES............................................ 93200       50.5% 
STATISTICS........................................ 54219       29.4% 
STATISTICS_SHORT_TERM............................. 26203       14.2% 
STATE_ATTRIBUTES.................................. 6904         3.7% 
EVENTS............................................ 3639         2.0% 
EVENT_DATA........................................ 181          0.098% 
STATISTICS_RUNS................................... 55           0.030% 
STATISTICS_META................................... 16           0.009% 
SQLITE_SCHEMA..................................... 3            0.002% 
RECORDER_RUNS..................................... 2            0.001% 
SCHEMA_CHANGES.................................... 1            0.0% 

Member Author

bdraco commented Feb 3, 2023

This is what it looks like if we drop ix_states_event_id (which won't be used after we migrate all the old rows):

STATES............................................ 89406       49.5% 
STATISTICS........................................ 54219       30.0% 
STATISTICS_SHORT_TERM............................. 26203       14.5% 
STATE_ATTRIBUTES.................................. 6904         3.8% 
EVENTS............................................ 3639         2.0% 
EVENT_DATA........................................ 181          0.10% 
STATISTICS_RUNS................................... 55           0.030% 
STATISTICS_META................................... 16           0.009% 
SQLITE_SCHEMA..................................... 3            0.002% 
RECORDER_RUNS..................................... 2            0.001% 
SCHEMA_CHANGES.................................... 1            0.0% 
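
The effect of dropping the index can be checked against the two reports above. The sketch below only redoes the arithmetic on the page counts quoted in this thread; it does not query a database.

```python
# Page counts copied from the first report (each table together with its indices).
before = {
    "STATES": 93200,
    "STATISTICS": 54219,
    "STATISTICS_SHORT_TERM": 26203,
    "STATE_ATTRIBUTES": 6904,
    "EVENTS": 3639,
    "EVENT_DATA": 181,
    "STATISTICS_RUNS": 55,
    "STATISTICS_META": 16,
    "SQLITE_SCHEMA": 3,
    "RECORDER_RUNS": 2,
    "SCHEMA_CHANGES": 1,
}
# Only STATES changes in the second report, after ix_states_event_id is dropped.
after = dict(before, STATES=89406)

pages_saved = before["STATES"] - after["STATES"]
share_of_db = pages_saved / sum(before.values()) * 100
states_share_after = after["STATES"] / sum(after.values()) * 100

print(f"{pages_saved} pages saved, {share_of_db:.1f}% of the database")
print(f"STATES share afterwards: {states_share_after:.1f}%")
```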

Member Author

bdraco commented Feb 3, 2023

*** Page counts for all tables and indices separately *************************

STATES............................................ 39402       21.8% 
STATISTICS........................................ 29150       16.1% 
IX_STATES_ENTITY_ID_LAST_UPDATED_TS............... 19517       10.8% 
STATISTICS_SHORT_TERM............................. 13748        7.6% 
IX_STATES_CONTEXT_ID.............................. 13663        7.6% 
IX_STATISTICS_STATISTIC_ID_START_TS............... 10004        5.5% 
IX_STATISTICS_START_TS............................ 8249         4.6% 
IX_STATES_LAST_UPDATED_TS......................... 6832         3.8% 
IX_STATISTICS_METADATA_ID......................... 6816         3.8% 
STATE_ATTRIBUTES.................................. 6329         3.5% 
IX_STATES_OLD_STATE_ID............................ 5220         2.9% 
IX_STATISTICS_SHORT_TERM_STATISTIC_ID_START_TS.... 4916         2.7% 
IX_STATES_ATTRIBUTES_ID........................... 4772         2.6% 
IX_STATISTICS_SHORT_TERM_START_TS................. 4092         2.3% 
IX_STATISTICS_SHORT_TERM_METADATA_ID.............. 3447         1.9% 

IX_STATES_ENTITY_ID_LAST_UPDATED_TS alone takes up about half as many pages as the actual states data (19517 vs. 39402).

Member Author

bdraco commented Feb 4, 2023

SQLite looks good.

Still need to do MySQL and PostgreSQL testing.

Member Author

bdraco commented Feb 8, 2023

The uncovered lines are only exercised by the PostgreSQL and MySQL test runners, which don't currently report to Codecov.

@bdraco bdraco mentioned this pull request Feb 8, 2023
19 tasks
@emontnemery emontnemery closed this Feb 9, 2023
@emontnemery emontnemery reopened this Feb 9, 2023
Contributor

@emontnemery emontnemery left a comment

Looks great, just a few comments.

Comment thread homeassistant/components/recorder/db_schema.py Outdated
Comment thread homeassistant/components/recorder/migration.py Outdated
Comment thread homeassistant/components/recorder/migration.py
Comment thread homeassistant/components/recorder/util.py
Comment thread homeassistant/components/recorder/db_schema.py Outdated
Comment thread homeassistant/components/recorder/db_schema.py Outdated
Comment thread homeassistant/components/recorder/migration.py Outdated
Comment thread homeassistant/components/recorder/migration.py Outdated
@emontnemery emontnemery closed this Feb 9, 2023
@emontnemery emontnemery reopened this Feb 9, 2023
@emontnemery emontnemery closed this Feb 9, 2023
@emontnemery emontnemery reopened this Feb 9, 2023
Member Author

bdraco commented Feb 9, 2023

Will need rebase on top of #87583 before merging.

@bdraco bdraco marked this pull request as draft February 9, 2023 17:38
@bdraco bdraco marked this pull request as ready for review February 9, 2023 18:24
@bdraco bdraco merged commit abf0c87 into home-assistant:dev Feb 9, 2023
@bdraco bdraco deleted the stats_migration branch February 9, 2023 18:24
AlePerla pushed a commit to AlePerla/homeassistant_core that referenced this pull request Feb 10, 2023
@github-actions github-actions Bot locked and limited conversation to collaborators Feb 10, 2023
2 participants