Skip to content

Fix race in removing entities from the registry#110978

Merged
bdraco merged 11 commits into
devfrom
entity_remove_race
Feb 21, 2024
Merged

Fix race in removing entities from the registry#110978
bdraco merged 11 commits into
devfrom
entity_remove_race

Conversation

@bdraco
Copy link
Copy Markdown
Member

@bdraco bdraco commented Feb 19, 2024

Proposed change

There was a race here because _async_registry_updated would previously run in a task and self.registry_entry would not get unset soon enough and the Entity base class would write unavailable state instead of removing the entity.

self.registry_entry.write_unavailable_state(self.hass)

I've seen this problem happen in production but I never could figure out why until now since its much more likely/reproducible as there are less tasks in play when working on #110967. Any time the _async_registry_updated ran after the async_remove task, the entity wouldn't get removed and the state would get written as unavailable instead because self.registry_entry hasn't been unset yet. To avoid this and not break the api (self.async_removed_from_registry might still need to access self.registry_entry) a new flag is added called self._removed_from_registry

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

There was a race here because `_async_registry_updated` would previously run in a task and `self.registry_entry` would not get unset soon enough so it would write unavailable state instead of removing the entity.

I've seen this problem happen in production but I never could figure out why until now since its much more likely/reproducible as there are less tasks in play.  Any time the `_async_registry_updated` ran after the `async_remove` task, the entity wouldn't get removed and the state would get written as unavailable instead because `self.registry_entry` hasn't been unset yet. To avoid this and not break the api (`self.async_removed_from_registry` might still need to access `self.registry_entry`) a new flag is added called `self._removed_from_registry`
If these tests fail they will not show anything other than
AssertionError without the rewrite

they are currently failing in #110978
and I have not been able to find the cause
@bdraco bdraco mentioned this pull request Feb 19, 2024
20 tasks
@bdraco bdraco marked this pull request as ready for review February 20, 2024 03:40
@bdraco bdraco requested a review from a team as a code owner February 20, 2024 03:40
@bdraco bdraco marked this pull request as draft February 20, 2024 05:02
@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Feb 20, 2024

Might have a better solution here. Need to sleep on it

@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Feb 20, 2024

Better solution didn't pan out. The test here isn't great. I think we need another one that does the entity registry remove and than immediately simulates a config entry reload by resetting the platform.

@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Feb 20, 2024

I think we can mark async_track_entity_registry_updated_event and the device registry to run immediately which should eliminate the call soon and make this completely race safe

Comment thread tests/helpers/test_entity.py
Comment thread tests/helpers/test_entity.py
@bdraco bdraco marked this pull request as ready for review February 20, 2024 15:56
@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented Feb 21, 2024

I think we can cleanup esphome entity removal quite a lot after this

Comment thread homeassistant/helpers/event.py
@bdraco bdraco merged commit 9c145b5 into dev Feb 21, 2024
@bdraco bdraco deleted the entity_remove_race branch February 21, 2024 02:48
bdraco added a commit that referenced this pull request Feb 21, 2024
Now that #110978 is fixed, we can remove most of entity removal
code in ESPHome as deleting it from the registry will take
care of removing the entity
@github-actions github-actions Bot locked and limited conversation to collaborators Feb 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants