
Caching feature and message fingerprints #11596

Merged
10 commits merged on Oct 4, 2022


@twerkmeister (Contributor) commented Sep 26, 2022

Proposed changes:

  • Caching the fingerprint of a message until the message is modified
  • Caching the fingerprint of a feature until the feature is modified
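A minimal sketch of the idea behind these two bullets, assuming the fingerprint is an MD5 hash over a JSON serialization of the object's data. `CachedMessage`, `set` and `fingerprint_computations` are hypothetical names for illustration, not Rasa's actual `Message` API: the expensive hash is computed once, reused on later calls, and thrown away by any method that mutates state.

```python
import hashlib
import json
from typing import Any, Dict, Optional


class CachedMessage:
    """Hypothetical stand-in for a message whose fingerprint is cached."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = dict(data)
        self._cached_fingerprint: Optional[str] = None
        self.fingerprint_computations = 0  # counts the expensive recomputations

    def set(self, key: str, value: Any) -> None:
        # Every mutating method must drop the cache so the next call recomputes.
        # Caveat: writing to self.data directly bypasses this invalidation.
        self.data[key] = value
        self._cached_fingerprint = None

    def fingerprint(self) -> str:
        if self._cached_fingerprint is None:
            self.fingerprint_computations += 1
            serialized = json.dumps(self.data, sort_keys=True).encode("utf-8")
            self._cached_fingerprint = hashlib.md5(serialized).hexdigest()
        return self._cached_fingerprint


msg = CachedMessage({"text": "hello"})
first = msg.fingerprint()
assert msg.fingerprint() == first  # second call is served from the cache
assert msg.fingerprint_computations == 1
msg.set("intent", "greet")
assert msg.fingerprint() != first  # recomputed after the mutation
assert msg.fingerprint_computations == 2
```

The trade-off is that correctness now depends on every mutating method remembering to reset the cache, which is exactly the risk discussed further down in this description.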

In my test on the oos dataset (roughly 15k NLU examples), the changes shaved about 25 seconds off the roughly 40 seconds spent on fingerprinting in total, i.e. a reduction of roughly 60% in pure fingerprinting time.

I also looked at more general options to speed up caching, but our caching methods are very generic and have to handle a very diverse set of inputs, especially arbitrarily nested dicts and lists. My attempts to improve caching there, for example using frozendicts to cache some of the dictionaries, led to longer fingerprinting times.
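For context, a generic fingerprint over arbitrarily nested dicts and lists might look like the sketch below. It is an assumption for illustration (canonical JSON with sorted keys, hashed with MD5, `repr` as a fallback for non-JSON values), and `deep_fingerprint` is a hypothetical helper, not Rasa's implementation. It shows why such a function is hard to speed up on its own: every call walks and serializes the whole structure, so caching the result per object saves more than optimizing the function.

```python
import hashlib
import json
from typing import Any


def deep_fingerprint(obj: Any) -> str:
    """Hypothetical generic fingerprint for arbitrarily nested dicts and lists.

    The entire structure is serialized on every call, so the cost grows with
    the size of the input; nothing is memoized here.
    """
    # sort_keys applies recursively, so dict insertion order does not matter;
    # list order still does. Values JSON cannot encode fall back to repr().
    serialized = json.dumps(obj, sort_keys=True, default=repr)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


a = deep_fingerprint({"b": [1, 2, {"c": 3}], "a": "x"})
b = deep_fingerprint({"a": "x", "b": [1, 2, {"c": 3}]})
assert a == b
assert deep_fingerprint([1, 2]) != deep_fingerprint([2, 1])
```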

I also considered adding further safeguards, such as a test that checks whether every method of these classes is wrapped with either `invalidates_cached_fingerprint` or `does_not_invalidate_cached_fingerprint`. This would prevent developers from accidentally adding a new method that changes the state of a Message or Feature without invalidating the cached fingerprint. However, since such a test is not a perfect protection either, it seemed somewhat heavy-handed, so I dropped it again.
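The decorator names `invalidates_cached_fingerprint` and `does_not_invalidate_cached_fingerprint` come from this PR, but the bodies below are only a guess at how such decorators could work. The `_fingerprint_policy` marker, `SketchFeature`, `SloppyFeature` and `unmarked_methods` are hypothetical names. The sketch shows both the invalidation pattern and the kind of safeguard test described in the previous paragraph.

```python
import functools
import hashlib
import inspect
import json
from typing import Any, Callable, List, Optional, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def invalidates_cached_fingerprint(method: F) -> F:
    """Assumed behaviour: run the method, then drop the cached fingerprint."""

    @functools.wraps(method)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        try:
            return method(self, *args, **kwargs)
        finally:
            # Invalidate even if the method raised after partially mutating state.
            self._cached_fingerprint = None

    wrapper._fingerprint_policy = "invalidates"  # type: ignore[attr-defined]
    return wrapper  # type: ignore[return-value]


def does_not_invalidate_cached_fingerprint(method: F) -> F:
    """Assumed behaviour: leave the method as-is and only mark it as reviewed."""
    method._fingerprint_policy = "keeps"  # type: ignore[attr-defined]
    return method


class SketchFeature:
    """Hypothetical, heavily simplified stand-in for a feature class."""

    def __init__(self, values: List[float]) -> None:
        self._values = list(values)
        self._cached_fingerprint: Optional[str] = None

    @invalidates_cached_fingerprint
    def append(self, value: float) -> None:
        self._values.append(value)

    @does_not_invalidate_cached_fingerprint
    def fingerprint(self) -> str:
        if self._cached_fingerprint is None:
            payload = json.dumps(self._values).encode("utf-8")
            self._cached_fingerprint = hashlib.md5(payload).hexdigest()
        return self._cached_fingerprint


def unmarked_methods(cls: type) -> List[str]:
    """The dropped safeguard: public methods that carry neither decorator."""
    return [
        name
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_") and not hasattr(member, "_fingerprint_policy")
    ]


class SloppyFeature(SketchFeature):
    def clear(self) -> None:  # mutates state but was never marked
        self._values.clear()


assert unmarked_methods(SketchFeature) == []
assert unmarked_methods(SloppyFeature) == ["clear"]
```

The check only proves that every method was labelled, not that the label is right: marking `clear` with `does_not_invalidate_cached_fingerprint` would silence it, and code that writes to `_values` directly bypasses the decorators entirely. That gap is why such a test is not a perfect protection.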

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@twerkmeister twerkmeister marked this pull request as ready for review September 27, 2022 12:04
@twerkmeister twerkmeister requested a review from a team as a code owner September 27, 2022 12:04
@tmbo (Member) left a comment


lgtm


@@ -0,0 +1 @@
Caching `Message` and `Features` fingerprints unless they are altered, speeding up fingerprinting considerably.
Member


Nice, can we quantify this for the release briefing? cc @chandrikas

@github-actions bot (Contributor) commented Oct 4, 2022

🚀 A preview of the docs has been deployed at the following URL: https://11596--rasahq-docs-rasa-v2.netlify.app/docs/rasa

@twerkmeister twerkmeister merged commit 6254f35 into main Oct 4, 2022
@twerkmeister twerkmeister deleted the ATO-264-fingerprinting-takes-too-long branch October 4, 2022 18:17
twerkmeister added a commit that referenced this pull request Oct 7, 2022
* Caching feature and message fingerprints
twerkmeister added a commit that referenced this pull request Oct 10, 2022
* Caching feature and message fingerprints (#11596)