New core featurization #6296

Merged
merged 271 commits into from
Sep 7, 2020

Conversation

evgeniiaraz
Contributor

@evgeniiaraz evgeniiaraz commented Jul 29, 2020

Proposed changes:

  • Modified core states from a flat dictionary to a dictionary of dictionaries (e.g., {intent_greet: 1.0, prev_utter_greet: 1.0} --> USER: {intent: greet}, PREVIOUS_ACTION: {action_name: utter_greet})
  • Moved rasa.core.featurizers into a separate package
  • Core State Featurization:
    -- SingleStateFeaturizer uses the NLU Interpreter (if provided) to featurize ACTION_NAME, ACTION_TEXT, INTENT, TEXT;
    -- Featurization of each attribute is stored in a Features object;
    -- SLOTS, ENTITIES, INTENT and ACTION_NAME are sparse;
    -- LabelTokenizerSingleStateFeaturizer is deprecated;
  • Moved Features to rasa.utils.features (from rasa.nlu.featurizers.featurizer)
  • RasaModelData -- changed to store, add and pad data in dict[dict]
  • RasaModel -- changed to accommodate the dict[dict] RasaModelData
  • Created TransformerRasaModel -- a RasaModel with methods shared by DIET and TED; it's an abstract class containing helper methods for transformer sequence models.
  • TED -- changed to process sparse features and accept both text and name features
  • TED -- instead of recomputing label embeddings, gather them from all_labels_embed by index
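
To make the new state layout in the first bullet concrete, here is a minimal sketch of converting an old flat state into a dictionary of dictionaries. The helper `to_nested_state`, the string values chosen for `USER` and `PREVIOUS_ACTION`, and the prefix-based parsing are assumptions for illustration only; they are not taken from the Rasa code base.

```python
from typing import Dict

# Hypothetical key values; the PR description only names them USER and PREVIOUS_ACTION.
USER = "user"
PREVIOUS_ACTION = "prev_action"


def to_nested_state(flat_state: Dict[str, float]) -> Dict[str, Dict[str, str]]:
    """Convert an old-style flat state into the dict-of-dicts layout."""
    nested: Dict[str, Dict[str, str]] = {}
    for key, value in flat_state.items():
        if not value:
            continue  # skip features that are switched off
        if key.startswith("intent_"):
            nested.setdefault(USER, {})["intent"] = key[len("intent_"):]
        elif key.startswith("prev_"):
            nested.setdefault(PREVIOUS_ACTION, {})["action_name"] = key[len("prev_"):]
    return nested


old_state = {"intent_greet": 1.0, "prev_utter_greet": 1.0}
new_state = to_nested_state(old_state)
print(new_state)
```

Grouping attributes under their origin (user vs. previous action) lets each attribute be featurized separately, which is what the SingleStateFeaturizer bullets describe.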

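The last TED bullet (gathering label embeddings by index rather than recomputing them) can be sketched in plain Python as follows. `embed_label` is a toy stand-in for an embedding layer and the feature values are made up; only the name `all_labels_embed` comes from the description above. In a tensor library the lookup would typically be a gather op such as `tf.gather`.

```python
from typing import List


def embed_label(label_features: List[float]) -> List[float]:
    """Toy stand-in for an embedding layer: scales each feature by 2."""
    return [2.0 * x for x in label_features]


# Features for every possible label, one row per label (hypothetical values).
all_label_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Embed every label once ...
all_labels_embed = [embed_label(features) for features in all_label_features]

# ... then look up the embeddings needed for a batch by label index.
batch_label_ids = [2, 0, 2]
gathered = [all_labels_embed[i] for i in batch_label_ids]

# The recomputing alternative embeds the same label once per occurrence.
recomputed = [embed_label(all_label_features[i]) for i in batch_label_ids]

print(gathered == recomputed)
```

Both paths give the same result, but the gather variant runs the embedding once per distinct label instead of once per batch element.
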
Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@Ghostvv
Contributor

Ghostvv commented Jul 31, 2020

I think UserUttered.as_story_string() should be updated

@wochinge
Contributor

wochinge commented Sep 4, 2020

Training breaks for me with the rasa init project:

Traceback (most recent call last):
  File "/Users/tobias/.pyenv/versions/rasa2.0/bin/rasa", line 11, in <module>
    load_entry_point('rasa', 'console_scripts', 'rasa')()
  File "/Users/tobias/Workspace/stack/rasa/__main__.py", line 109, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/Users/tobias/Workspace/stack/rasa/cli/train.py", line 77, in train
    nlu_additional_arguments=extract_nlu_additional_arguments(args),
  File "/Users/tobias/Workspace/stack/rasa/train.py", line 53, in train
    nlu_additional_arguments=nlu_additional_arguments,
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/Users/tobias/Workspace/stack/rasa/train.py", line 109, in train_async
    nlu_additional_arguments=nlu_additional_arguments,
  File "/Users/tobias/Workspace/stack/rasa/train.py", line 202, in _train_async_internal
    old_model_zip_path=old_model,
  File "/Users/tobias/Workspace/stack/rasa/train.py", line 258, in _do_training
    or _interpreter_from_previous_model(old_model_zip_path),
  File "/Users/tobias/Workspace/stack/rasa/train.py", line 404, in _train_core_with_validated_data
    interpreter=interpreter,
  File "/Users/tobias/Workspace/stack/rasa/core/train.py", line 66, in train
    agent.train(training_data, **additional_arguments)
  File "/Users/tobias/Workspace/stack/rasa/core/agent.py", line 718, in train
    training_trackers, self.domain, interpreter=self.interpreter, **kwargs
  File "/Users/tobias/Workspace/stack/rasa/core/policies/ensemble.py", line 203, in train
    trackers_to_train, domain, interpreter=interpreter, **kwargs
  File "/Users/tobias/Workspace/stack/rasa/core/policies/ted_policy.py", line 323, in train
    training_trackers, domain, interpreter, **kwargs
  File "/Users/tobias/Workspace/stack/rasa/core/policies/policy.py", line 151, in featurize_for_training
    training_trackers, domain, interpreter
  File "/Users/tobias/Workspace/stack/rasa/core/featurizers/tracker_featurizers.py", line 125, in featurize_trackers
    tracker_state_features = self._featurize_states(trackers_as_states, interpreter)
  File "/Users/tobias/Workspace/stack/rasa/core/featurizers/tracker_featurizers.py", line 58, in _featurize_states
    for tracker_states in trackers_as_states
  File "/Users/tobias/Workspace/stack/rasa/core/featurizers/tracker_featurizers.py", line 58, in <listcomp>
    for tracker_states in trackers_as_states
  File "/Users/tobias/Workspace/stack/rasa/core/featurizers/tracker_featurizers.py", line 56, in <listcomp>
    for state in tracker_states
  File "/Users/tobias/Workspace/stack/rasa/core/featurizers/single_state_featurizer.py", line 196, in encode_state
    self._extract_state_features(sub_state, interpreter, sparse=True)
  File "/Users/tobias/Workspace/stack/rasa/core/featurizers/single_state_featurizer.py", line 164, in _extract_state_features
    parsed_message = interpreter.featurize_message(message)
  File "/Users/tobias/Workspace/stack/rasa/core/interpreter.py", line 295, in featurize_message
    result = self.interpreter.featurize_message(message)
  File "/Users/tobias/Workspace/stack/rasa/nlu/model.py", line 416, in featurize_message
    component.process(message, **self.context)
  File "/Users/tobias/Workspace/stack/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 549, in process
    attribute, [message_tokens]
  File "/Users/tobias/Workspace/stack/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 429, in _create_features
    seq_vec = self.vectorizers[attribute].transform(tokens)
  File "/Users/tobias/.pyenv/versions/3.7.8/envs/rasa2.0/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 1247, in transform
    self._check_vocabulary()
  File "/Users/tobias/.pyenv/versions/3.7.8/envs/rasa2.0/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 467, in _check_vocabulary
    raise NotFittedError("Vocabulary not fitted or provided")
sklearn.exceptions.NotFittedError: Vocabulary not fitted or provided

@evgeniiaraz
Contributor Author

@wochinge this is because we haven't reverted the reading of action_names yet, right?

@Ghostvv
Contributor

Ghostvv commented Sep 4, 2020

seems like the right error: CVF (CountVectorsFeaturizer) was not trained for action_name
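
The failure mode in the traceback can be reproduced in miniature with one vectorizer per attribute, where one of them never sees training data. This is a toy sketch: `ToyCountVectorizer` and the local `NotFittedError` are hypothetical stand-ins written for this example, not the scikit-learn or Rasa classes.

```python
from typing import Dict, List


class NotFittedError(Exception):
    """Toy stand-in for sklearn.exceptions.NotFittedError."""


class ToyCountVectorizer:
    """Minimal bag-of-words vectorizer that must be fitted before use."""

    def __init__(self) -> None:
        self.vocabulary: Dict[str, int] = {}

    def fit(self, texts: List[str]) -> None:
        # Assign each new token the next free column index.
        for text in texts:
            for token in text.split():
                self.vocabulary.setdefault(token, len(self.vocabulary))

    def transform(self, text: str) -> List[int]:
        if not self.vocabulary:
            raise NotFittedError("Vocabulary not fitted or provided")
        counts = [0] * len(self.vocabulary)
        for token in text.split():
            if token in self.vocabulary:
                counts[self.vocabulary[token]] += 1
        return counts


# One vectorizer per attribute; only "text" is fitted here.
vectorizers = {"text": ToyCountVectorizer(), "action_name": ToyCountVectorizer()}
vectorizers["text"].fit(["hello there", "good bye"])

text_vector = vectorizers["text"].transform("hello hello bye")

try:
    vectorizers["action_name"].transform("utter_greet")
    error_message = None
except NotFittedError as exc:
    error_message = str(exc)

print(text_vector, error_message)
```

Featurizing an attribute whose vectorizer was never fitted fails at transform time, which matches the point at which the traceback ends.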

@wochinge
Contributor

wochinge commented Sep 4, 2020

I updated my PR with the latest state of this PR. feel free to merge once the tests pass. Sorry, completely forgot about the PR.

@wochinge
Contributor

wochinge commented Sep 4, 2020

Btw, model size and training time seem comparable 👍

@Ghostvv
Contributor

Ghostvv commented Sep 4, 2020

we didn't roll out the big TED yet, so the difference model-wise is the addition of sparse-to-dense layers
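
As a rough illustration of what a sparse-to-dense layer does, the sketch below projects a sparse feature vector through a dense weight matrix while touching only the rows of non-zero inputs. The function name, the index-to-value dictionary encoding and the weights are assumptions for this example and do not reflect the actual TED implementation.

```python
from typing import Dict, List


def sparse_to_dense(sparse: Dict[int, float], weights: List[List[float]]) -> List[float]:
    """Project a sparse vector (index -> value) with a dense weight matrix.

    `weights` has one row per input feature and one column per output unit.
    Only the rows of non-zero input features contribute.
    """
    dense = [0.0] * len(weights[0])
    for index, value in sparse.items():
        for j, weight in enumerate(weights[index]):
            dense[j] += value * weight
    return dense


# 4 input features projected to 2 output units (made-up weights).
weights = [[0.5, 0.0], [0.25, 1.0], [0.0, 2.0], [1.0, 1.0]]
sparse_input = {0: 1.0, 3: 1.0}  # features 0 and 3 are active
dense_output = sparse_to_dense(sparse_input, weights)
print(dense_output)
```

The extra parameters are just this weight matrix per sparse attribute, which is consistent with model size and training time staying comparable.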

@Ghostvv
Contributor

Ghostvv commented Sep 5, 2020

@wochinge could it be that test_train.py fails because of that (rasa train doesn't work, due to importer)?

@Ghostvv
Contributor

Ghostvv commented Sep 7, 2020

the failing windows test seems to be unrelated to this PR, so I'll merge it

@Ghostvv Ghostvv merged commit e6cbce5 into e2e Sep 7, 2020
@Ghostvv Ghostvv deleted the new_core_featurization branch September 7, 2020 09:18

7 participants