Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profile training data loading #7944

Closed
wants to merge 6 commits into from
Closed

Conversation

wochinge
Copy link
Contributor

@wochinge wochinge commented Feb 11, 2021

Proposed changes:

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@wochinge wochinge mentioned this pull request Feb 12, 2021
3 tasks

@classmethod
def is_key_in_yaml2(cls, file_path: Union[Text, Path], *keys: Text) -> bool:
with open(file_path) as f:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

besides the nesting and the extra function this is actually a clean fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(we should obviously also test the colon, e.g. f"{key}:"

rasa/model.py Outdated
# FINGERPRINT_PROJECT: project_fingerprint(),
# FINGERPRINT_NLU_DATA_KEY: nlu_data.fingerprint(),
# FINGERPRINT_NLU_LABELS_KEY: nlu_data.label_fingerprint(),
# FINGERPRINT_STORIES_KEY: stories.fingerprint(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dumping all stories to yaml just to get the hash from the string is expensive

Comment on lines 331 to 333
# fix_yaml_loader()

replace_environment_variables()
# replace_environment_variables()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this brought down the reading times with woz from 1h to 3.6 minutes. In my opinion we only need to replace env variables for config files (endpoints.yml, config.yml, credentials.yml, but not for stories / nlu)

@@ -328,9 +328,9 @@ def read_yaml(content: Text, reader_type: Union[Text, List[Text]] = "safe") -> A
Raises:
ruamel.yaml.parser.ParserError: If there was an error when parsing the YAML.
"""
fix_yaml_loader()
# fix_yaml_loader()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no idea if we still need this. It's ancient.

@wochinge wochinge mentioned this pull request Feb 18, 2021
4 tasks
@wochinge wochinge force-pushed the training-data-loading-tests branch from 7b4ef39 to 66ff486 Compare March 15, 2021 15:17
@wochinge wochinge closed this Mar 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate rasahq/rasa#7272
1 participant