Skip to content

Add apply_filter attribute to recorder.purge service#45826

Merged
bdraco merged 3 commits into
home-assistant:devfrom
cdce8p:recorder-purge
Mar 12, 2021
Merged

Add apply_filter attribute to recorder.purge service#45826
bdraco merged 3 commits into
home-assistant:devfrom
cdce8p:recorder-purge

Conversation

@cdce8p
Copy link
Copy Markdown
Member

@cdce8p cdce8p commented Feb 1, 2021

Proposed change

This PR adds the apply_filter attribute to the recorder.purge service. The attribute indicates that the purge service should remove all filtered states and events even if they occurred between now and now - purge_days. This feature is intended to be used by a user only, so there is no change to the daily purge.

Rational / Use case

I recently noticed that my production database had quadrupled in size after I missed excluding a few entity_ids. Since I use a Raspberry PI with an SD card, this caused my whole system to crash. After updating my config, I ran a manual purge, to try to reduce the db size. I noticed then, that the filter is only applied when a state or event is being added. I could have used keep_days: 0 or removed the database completely but I just wanted to remove the falsely added entries. I had to remove them manually.

With this change, a user can just run recorder.purge with repack: true and apply_filter: true to remove the entries without loosing their entire history or knowing SQL.

Implementation details

  • apply_filter: true has to be set in a manual service call to recorder.purge. As described earlier, the daily purge doesn't change.
  • To check which states should be removed, I used distinct(States.entity_id).all() and check for each entity if it should be filtered. For events I used distinct(Events.event_type).all().
  • Only states and events between now and now - purge_days are considered. Everything before then will be removed by the existing logic.
  • If a state is to be removed, I also remove the event with the matching event_id. (state_changed events)
  • If an event is to be removed, I also remove the matching state. (for state_changed events)
  • Similar to the existing purge, the deletion happens in small increments, currently set to one hour. If the purge isn't finished, it will return False, which calls the purge method again.
  • I have added extensive integration tests to validate the change.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Example entry for configuration.yaml:

# Example configuration.yaml
recorder:
  exclude:
    entities:
      - sensor.purge
    event_types:
      - timer_out_of_sync


# Service call
recorder.purge:
  apply_filter: true

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • Untested files have been added to .coveragerc.

The integration reached or maintains the following Integration Quality Scale:

  • No score or internal
  • 🥈 Silver
  • 🥇 Gold
  • 🏆 Platinum

To help with the load of incoming pull requests:

Comment thread homeassistant/components/recorder/purge.py Outdated
@bdraco
Copy link
Copy Markdown
Member

bdraco commented Feb 16, 2021

Out of time to fully run though this. It going to make it more difficult for someone to fix #44958

I might need to bite the bullet on this and get a MSSQL server setup so I can make the change to fix that before we merge this since it doesn't look like we have a MSSQL dev stepping up to do the work in that issue.

@bdraco bdraco self-requested a review February 16, 2021 07:22
@bdraco
Copy link
Copy Markdown
Member

bdraco commented Feb 17, 2021

I've made the changes to fix MSSQL. Hopefully it will be a small change to make the change sets compatible

https://github.com/home-assistant/core/pull/46678/files

@cdce8p
Copy link
Copy Markdown
Member Author

cdce8p commented Feb 17, 2021

I've made the changes to fix MSSQL. Hopefully it will be a small change to make the change sets compatible

https://github.com/home-assistant/core/pull/46678/files

I'll check later how to best update my PR. Not sure if that will be a small change just yet.

@cdce8p cdce8p force-pushed the recorder-purge branch 3 times, most recently from a6028c9 to 1079b19 Compare February 20, 2021 00:46
@cdce8p
Copy link
Copy Markdown
Member Author

cdce8p commented Feb 20, 2021

@bdraco I've updated the PR to comply with the changes from #44958 #46678 and extended the test cases. It's ready for review.

@bdraco
Copy link
Copy Markdown
Member

bdraco commented Feb 20, 2021

Did you get a chance to test this with MySQL?

@cdce8p
Copy link
Copy Markdown
Member Author

cdce8p commented Feb 20, 2021

Did you get a chance to test this with MySQL?

It took me a moment to set it up, but it works as intended. I also tested it with ON DELETE NO ACTION set for old_state_id.

Comment thread homeassistant/components/recorder/purge.py Outdated
@bdraco
Copy link
Copy Markdown
Member

bdraco commented Feb 20, 2021

Looks good 👍

I want to get another set of 👀 on #46678 and then they should both be good to merge

@cdce8p
Copy link
Copy Markdown
Member Author

cdce8p commented Feb 20, 2021

Looks good 👍

I want to get another set of 👀 on #46678 and then they should both be good to merge

This PR is compatible with with both, the current implementation and #46678. So it should be possible to merge it earlier. I would even suggest to do it that way. The other one is a bit easier to rebase. Should be just adding one indent for the change in purge.py.

@bdraco
Copy link
Copy Markdown
Member

bdraco commented Feb 20, 2021

Merging this one first makes sense as the other one will be easier to fix conflicts.

I've asked for additional reviews.

This was referenced Mar 9, 2021
cdce8p added 3 commits March 11, 2021 18:54
* Always delete states, even if event_id is None
* Remove subsets
* Set old_state_ids to Null before removing states
@cdce8p
Copy link
Copy Markdown
Member Author

cdce8p commented Mar 11, 2021

@bdraco I've rebased and updated this PR. The first two commits include everything from before, including the tests. In the last commit (3961b07), I've updated the logic which should now be much cleaner and easier to understand, in part thanks to your refactoring in #46678. The PR is ready for review now.

@cdce8p cdce8p requested a review from bdraco March 11, 2021 19:12
@bdraco
Copy link
Copy Markdown
Member

bdraco commented Mar 11, 2021

Looks good. I'll do some testing with it as soon as I have some free cycles

@PeteBa
Copy link
Copy Markdown
Contributor

PeteBa commented Mar 11, 2021

Sorry to come late - feel free to ignore - but would a purge_entities service call taking an array of entities be a viable alternative here ? Appreciate they would both get to the same outcome but purge_entities might be more discoverable and cover a broader set of use cases.

I had a similar experience to OP when I tried out an integration that I decided not to go with and then needed to get rid of the unwanted entities. In this use-case, the original proposal would mean editing the recorder yaml, restarting and running purge(apply_filters: true) service vs just running purge_entities(entities).

@cdce8p
Copy link
Copy Markdown
Member Author

cdce8p commented Mar 11, 2021

@PeteBa That might be an idea, although I would imagine that it might get abused in the end. Just as an example, say a user as a default retention period of 7 days but decides to use service calls to remove just specific entities earlier, maybe even with repacking each time. That's additional wear on their storage medium, an SD card in the worst case, and system slow downs.

Using the recorder.exclude is much more deliberate, you need to modify your config and restart after all. You also don't have the issue of forgetting to add the entity_id to the exclude list if you continue to use the integration.

IMO this feature isn't intended for regular use. It works as you would expect but I would only recommend it if you really need it.

@bdraco What do you think?

@PeteBa
Copy link
Copy Markdown
Contributor

PeteBa commented Mar 12, 2021

@cdce8p , thanks for the response ! and I agree with the "only in emergency" usage. Also, with the comment that the apply_filter approach provides that additional opportunity for consideration. I'm not sure that one is more open to abuse as the functionality is near identical and accessible once awareness gets around. Stripping it down, I would suggest purge_entities is a more atomic building block while apply_filters addresses a more specific use-case. Both reasonable.

Either way - and more importantly ! - the functionality in the PR is great and would have saved me a ton of grief a few weeks ago as I was trying to figure out SQL 8)

@bdraco
Copy link
Copy Markdown
Member

bdraco commented Mar 12, 2021

@bdraco What do you think?

It sounds like a great followup PR that @PeteBa could take on once this is merged since it will provide some great building blocks to enable that.

@cdce8p
Copy link
Copy Markdown
Member Author

cdce8p commented Mar 12, 2021

@bdraco What do you think?

It sounds like a great followup PR that @PeteBa could take on once this is merged since it will provide some great building blocks to enable that.

I agree, it should almost be trivial with

def _purge_filtered_states(session: Session, excluded_entity_ids: list[str]) -> None:
"""Remove filtered states and linked events."""

The only thing left would be to decide how to best wire it up. It might make sense to create a new service for it. Maybe it could take a list of entities or an entity filter as argument? That way a user could still use the glob patterns if they want.

@bdraco
Copy link
Copy Markdown
Member

bdraco commented Mar 12, 2021

Tested and working as expected

Copy link
Copy Markdown
Member

@bdraco bdraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, test pass 👍

Thanks!

@bdraco bdraco merged commit 92852b9 into home-assistant:dev Mar 12, 2021
@cdce8p cdce8p deleted the recorder-purge branch March 12, 2021 07:38
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants