Skip to content

Start frontend in time for database upgrades#48704

Closed
bdraco wants to merge 36 commits intohome-assistant:devfrom
bdraco:start_frontend_in_time_for_database_upgrade
Closed

Start frontend in time for database upgrades#48704
bdraco wants to merge 36 commits intohome-assistant:devfrom
bdraco:start_frontend_in_time_for_database_upgrade

Conversation

@bdraco
Copy link
Copy Markdown
Member

@bdraco bdraco commented Apr 5, 2021

This PR is superseded by: #49036

After thinking about this for it a bit, it makes sense to do this in one PR since it can now be completely self contained in the recorder integration


Proposed change

If a database check or upgrade is in progress the frontend
would not start soon enough which would leave users wondering
why their instance was not working and restart it again which
lead to their database being corrupted.

I did a bit refactoring to increase test coverage as well as I
wanted to ensure there wasn't a path were the recorder thread
would fail to set the result and block forever.

Screen Shot 2021-04-07 at 9 29 15 AM

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Example entry for configuration.yaml:

# Example configuration.yaml

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • Untested files have been added to .coveragerc.

The integration reached or maintains the following Integration Quality Scale:

  • No score or internal
  • 🥈 Silver
  • 🥇 Gold
  • 🏆 Platinum

To help with the load of incoming pull requests:

@probot-home-assistant probot-home-assistant Bot added core small-pr PRs with less than 30 lines. labels Apr 5, 2021
@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Apr 5, 2021

Saw users do this as part of the current beta. I think its too late to include this since there isn't enough test runway to feel comfortable about it, but at least it will hopefully mitigate this from happening in the future

@bdraco bdraco marked this pull request as ready for review April 6, 2021 03:09
@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Apr 6, 2021

I've been testing with this all day ~100 restarts and no issues. We don't have tests for loading in specific order so I didn't change any tests and everything still passes.

Comment thread homeassistant/bootstrap.py
@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Apr 7, 2021

We should probably promote all deps for stage 0 as well like we do for stage 1

@bdraco bdraco marked this pull request as draft April 7, 2021 07:15
@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Apr 7, 2021

I think I might throw this away and start over.

Maybe a async_listen_intergration_loaded_or_start

with setup.XXXX for the defer

@bdraco bdraco force-pushed the start_frontend_in_time_for_database_upgrade branch 3 times, most recently from d1fbdbc to abedcf6 Compare April 7, 2021 09:16
@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Apr 7, 2021

TODO tomorrow:

  • Additional testing for corrupted databases
    Unit test for async_when_setup_or_start (its already covered by http since that code was moved and made dry, but good to be explicit)
  • Test that recorder blocks on async_block_till_done when not started already

bdraco added 12 commits April 7, 2021 07:14
If a database check or upgrade is in progress the frontend
would not start soon enough which would leave users wondering
why their instance was not working and restart it again which
lead to their database being corrupted.
@bdraco bdraco force-pushed the start_frontend_in_time_for_database_upgrade branch from 65f26bd to 73674c2 Compare April 7, 2021 17:29
@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Apr 7, 2021

Need to validate this behaves reasonably when the MySQL database can't connect. Wrong password should be a good test

@bdraco bdraco marked this pull request as ready for review April 8, 2021 00:15
@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Apr 8, 2021

All looks good in testing.

I've been slowly increasing the recorder coverage each time and this should get it to ~93%

@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Apr 10, 2021

After thinking about this for a while, I think it's better to just start up and watch the size of the event queue to ensure it doesn't exhaust ram. Then we can do database migrations live, without users waiting hours with their instance to be available again.

I think we only need to wait for the database connected event, then let recorder keep going in the background.

We can start a task to watch the size of the queue, and if it gets too big, then we start the starting discarding events. Considering they wouldn't be recorded anyway because of the migration blocking start up that Seems like a good compromise

@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Apr 10, 2021

I should build it on top of this though to avoid the review being too big

@bdraco bdraco mentioned this pull request Apr 11, 2021
21 tasks
Comment on lines +550 to +555
# If there is a database upgrade in progress the recorder
# queue can exaust the available memory if we allow stage 2
# to start. We wait until the upgrade is completed before
# starting.
await async_wait_for_recorder_full_startup(hass)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will go away in the next PR #49036

RECORDER_BASE_SETUP_TIMEOUT = 60


async def async_wait_for_recorder_full_startup(hass: HomeAssistant) -> None:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will go away in the next pr #49036

@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Apr 11, 2021

This PR is superseded by: #49036

@bdraco bdraco closed this Apr 11, 2021
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed core small-pr PRs with less than 30 lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants