Live db migrations and recovery#49036
Conversation
37a76d2 to
3e63451
Compare
|
Hey there @home-assistant/core, mind taking a look at this pull request as its been labeled with an integration ( |
d1347ab to
a1ab895
Compare
9edcb35 to
a94153c
Compare
83b62d8 to
f4ba123
Compare
bb05150 to
59132bd
Compare
00f7f2a to
86c2c78
Compare
|
100.00% of diff hit 🎉 |
86c2c78 to
163cd74
Compare
|
Pushed again to cover one more existing line. At this point testing has been solid |
|
Tested this on my large install with Restart |
|
More exhaustive test |
|
I think we should warn that performance may be degraded during the upgrade so they don't wonder why. |
646d603 to
753fa95
Compare
- Database migrations no longer block startup - The migration is started shortly after the started event since they are cpu intensive and we do not want to compete with startup. - Events are queued and processed when the migration is completed - There is a safety to start discarding events if more than 10000 happen before migration is completed - A notification is shown when the migration is in progress "The database is being upgraded. Integrations such as logbook and history that read the database may return inconsistent results until the migration completes." - The database recovery logic can now recover at point after setup including purge, migration, and event insert. In short we can _always_ (hopefully) recover and start a new db without a restart. - The quick check is no longer performed on unclean shutdown since we can always recover live. The `db_integrity_check` option has been deprecated.
753fa95 to
d9d7e91
Compare
|
Being one of the people that waited for about 25 hours during the last update this is a really awesome change! Well done! |
| """Migrate schema to the latest version.""" | ||
| persistent_notification.create( | ||
| self.hass, | ||
| "System performance will temporarily degrade during the database upgrade. Integrations that read the database, such as logbook and history, may return inconsistent results until the upgrade completes.", |
There was a problem hiding this comment.
Should we mention that they should not restart their Home Assistant installation?
We probably want to show this in the frontend somewhere too.
There was a problem hiding this comment.
Yes. That's a good idea 👍
There was a problem hiding this comment.
Maybe fire an event 'recorder_migration_started' 'recorder_migration_ended'. We could then show a warning on the server tools by the restart button that they probably shouldn't do it
There was a problem hiding this comment.
I wonder if shutdown will block actually since the thread won't exit until it finished. I'll test
There was a problem hiding this comment.
System performance will temporarily degrade during the database upgrade. Do not power down or restart the system until the upgrade completes. Integrations that read the database, such as logbook and history, may return inconsistent results until the upgrade completes.
Changed wording to ^
There was a problem hiding this comment.
The webserver will shutdown but the database migration continues and restart is blocked until it finishes because the thread won't join.
There was a problem hiding this comment.
2021-04-12 06:08:27 WARNING (MainThread) [homeassistant.core] Timed out waiting for shutdown stage 1 to complete, the shutdown will continue
There was a problem hiding this comment.
Shutdown blocks until the migration is complete because the thread can't join. I'm going to do another PR to check if migration is in progress and throw when calling SERVICE_HOMEASSISTANT_STOP or SERVICE_HOMEASSISTANT_RESTART so they don't end up in this state unexpectedly.
Thread 213 (idle): "MainThread"
_wait_for_tstate_lock (threading.py:1027)
join (threading.py:1011)
shutdown (concurrent/futures/thread.py:236)
close (homeassistant/runner.py:92)
run (asyncio/runners.py:51)
run (homeassistant/runner.py:126)
main (homeassistant/__main__.py:313)
<module> (homeassistant/__main__.py:321)
_run_code (runpy.py:87)
_run_module_as_main (runpy.py:194)
Thread 237 (idle): "SyncWorker_1"
_wait_for_tstate_lock (threading.py:1027)
join (threading.py:1011)
shutdown (homeassistant/components/recorder/__init__.py:364)
run (concurrent/futures/thread.py:57)
_worker (concurrent/futures/thread.py:80)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)
Thread 245 (idle): "Recorder"
migrate_schema (homeassistant/components/recorder/migration.py:60)
_migrate_schema_and_setup_run (homeassistant/components/recorder/__init__.py:523)
run (homeassistant/components/recorder/__init__.py:435)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)
Thread 251 (idle): "Thread-8"
loop (paho/mqtt/client.py:1167)
loop_forever (paho/mqtt/client.py:1779)
_thread_main (paho/mqtt/client.py:3452)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)
Thread 259 (idle): "Thread-9"
loop (paho/mqtt/client.py:1167)
loop_forever (paho/mqtt/client.py:1779)
_thread_main (paho/mqtt/client.py:3452)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)
There was a problem hiding this comment.
Verified everything went down cleanly and restarted successfully once the migration did finish.
There was a problem hiding this comment.
Service block started here #49098
Need to think about how to tell the frontend as it needs to know on reload.
balloob
left a comment
There was a problem hiding this comment.
This looks great ! 🎉
Had a couple of minor comments.
Co-authored-by: Paulus Schoutsen <paulus@home-assistant.io>
|
Thanks. Working on the shutdown block pr now |
Breaking change
Run database migrations in the background after started event
db_integrity_checkoption has been deprecated.__init__.pycoverage increased to 98% 🎊Type of change
Example entry for
configuration.yaml:# Example configuration.yamlAdditional information
Checklist
black --fast homeassistant tests)If user exposed functionality or configuration variables are added/changed:
If the code communicates with devices, web services, or third-party tools:
Updated and included derived files by running:
python3 -m script.hassfest.requirements_all.txt.Updated by running
python3 -m script.gen_requirements_all..coveragerc.The integration reached or maintains the following Integration Quality Scale:
To help with the load of incoming pull requests: