sklearn.exceptions.NotFittedError: Vocabulary not fitted or provided #6919

Closed
tabergma opened this issue Oct 6, 2020 · 4 comments · Fixed by #6931
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@tabergma
Contributor

tabergma commented Oct 6, 2020

Rasa version: 2.0.0rc4

Issue:
Core training fails on the demo bot (rasa init) after changing the config to the one below.

Error (including full traceback):

(rasa) ➜  ~/Repositories/rasa/demo_bot git:(core-entity-roles-groups) ✔ rasa train
The configuration for policies was chosen automatically. It was written into the config file at 'config.yml'.
Training NLU model...
2020-10-06 09:34:09 INFO     rasa.shared.nlu.training_data.training_data  - Training data stats:
2020-10-06 09:34:09 INFO     rasa.shared.nlu.training_data.training_data  - Number of intent examples: 69 (7 distinct intents)

2020-10-06 09:34:09 INFO     rasa.shared.nlu.training_data.training_data  -   Found intents: 'deny', 'mood_great', 'mood_unhappy', 'affirm', 'greet', 'bot_challenge', 'goodbye'
2020-10-06 09:34:09 INFO     rasa.shared.nlu.training_data.training_data  - Number of response examples: 0 (6 distinct responses)
2020-10-06 09:34:09 INFO     rasa.shared.nlu.training_data.training_data  - Number of entity examples: 0 (0 distinct entities)
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component WhitespaceTokenizer
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component RegexEntityExtractor
/home/tanja/Repositories/rasa/rasa/shared/utils/io.py:88: UserWarning: No lookup tables or regexes defined in the training data that have a name equal to any entity in the training data. In order for this component to work you need to define valid lookup tables or regexes in the training data.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component LexicalSyntacticFeaturizer
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
/home/tanja/.virtualenv/rasa/lib/python3.6/site-packages/sklearn/feature_extraction/text.py:501: UserWarning: The parameter 'token_pattern' will not be used since 'analyzer' != 'word'
  warnings.warn("The parameter 'token_pattern' will not be used"
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component DIETClassifier
/home/tanja/Repositories/rasa/rasa/shared/utils/io.py:88: UserWarning: You specified 'DIET' to train entities, but no entities are present in the training data. Skip training of entities.
Epochs: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:18<00:00,  5.46it/s, t_loss=1.504, i_acc=0.986]
2020-10-06 09:34:36 INFO     rasa.utils.tensorflow.models  - Finished training.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Starting to train component ResponseSelector
2020-10-06 09:34:36 INFO     rasa.nlu.selectors.response_selector  - Retrieval intent parameter was left to its default value. This response selector will be trained on training examples combining all retrieval intents.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Starting to train component FallbackClassifier
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Successfully saved model into '/tmp/tmp1hnllvoo/nlu'
NLU model training completed.
Training Core model...
Processed story blocks: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 2265.97it/s, # trackers=1]
Processed story blocks: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 798.51it/s, # trackers=3]
Processed story blocks: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 269.55it/s, # trackers=12]
Processed story blocks: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 83.32it/s, # trackers=39]
Processed rules: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2965.22it/s, # trackers=1]
Processed trackers: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 911.94it/s, # actions=12]
Processed actions: 12it [00:00, 9763.66it/s, # examples=12]
Processed trackers: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [00:00<00:00, 230.98it/s, # actions=30]
Traceback (most recent call last):
  File "/home/tanja/.virtualenv/rasa/bin/rasa", line 11, in <module>
    load_entry_point('rasa', 'console_scripts', 'rasa')()
  File "/home/tanja/Repositories/rasa/rasa/__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/tanja/Repositories/rasa/rasa/cli/train.py", line 90, in train
    nlu_additional_arguments=extract_nlu_additional_arguments(args),
  File "/home/tanja/Repositories/rasa/rasa/train.py", line 55, in train
    loop,
  File "/home/tanja/Repositories/rasa/rasa/utils/common.py", line 300, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/tanja/Repositories/rasa/rasa/train.py", line 110, in train_async
    nlu_additional_arguments=nlu_additional_arguments,
  File "/home/tanja/Repositories/rasa/rasa/train.py", line 207, in _train_async_internal
    old_model_zip_path=old_model,
  File "/home/tanja/Repositories/rasa/rasa/train.py", line 263, in _do_training
    or _interpreter_from_previous_model(old_model_zip_path),
  File "/home/tanja/Repositories/rasa/rasa/train.py", line 409, in _train_core_with_validated_data
    interpreter=interpreter,
  File "/home/tanja/Repositories/rasa/rasa/core/train.py", line 67, in train
    agent.train(training_data, **additional_arguments)
  File "/home/tanja/Repositories/rasa/rasa/core/agent.py", line 724, in train
    training_trackers, self.domain, interpreter=self.interpreter, **kwargs
  File "/home/tanja/Repositories/rasa/rasa/core/policies/ensemble.py", line 189, in train
    trackers_to_train, domain, interpreter=interpreter, **kwargs
  File "/home/tanja/Repositories/rasa/rasa/core/policies/ted_policy.py", line 332, in train
    training_trackers, domain, interpreter, **kwargs
  File "/home/tanja/Repositories/rasa/rasa/core/policies/policy.py", line 165, in featurize_for_training
    training_trackers, domain, interpreter
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/tracker_featurizers.py", line 140, in featurize_trackers
    tracker_state_features = self._featurize_states(trackers_as_states, interpreter)
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/tracker_featurizers.py", line 73, in _featurize_states
    for tracker_states in trackers_as_states
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/tracker_featurizers.py", line 73, in <listcomp>
    for tracker_states in trackers_as_states
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/tracker_featurizers.py", line 71, in <listcomp>
    for state in tracker_states
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/single_state_featurizer.py", line 201, in encode_state
    self._extract_state_features(sub_state, interpreter, sparse=True)
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/single_state_featurizer.py", line 169, in _extract_state_features
    parsed_message = interpreter.featurize_message(message)
  File "/home/tanja/Repositories/rasa/rasa/core/interpreter.py", line 158, in featurize_message
    result = self.interpreter.featurize_message(message)
  File "/home/tanja/Repositories/rasa/rasa/nlu/model.py", line 418, in featurize_message
    component.process(message, **self.context)
  File "/home/tanja/Repositories/rasa/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 562, in process
    attribute, [message_tokens]
  File "/home/tanja/Repositories/rasa/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 438, in _create_features
    seq_vec = self.vectorizers[attribute].transform(tokens)
  File "/home/tanja/.virtualenv/rasa/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 1247, in transform
    self._check_vocabulary()
  File "/home/tanja/.virtualenv/rasa/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 467, in _check_vocabulary
    raise NotFittedError("Vocabulary not fitted or provided")
sklearn.exceptions.NotFittedError: Vocabulary not fitted or provided
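The last frames of the traceback come from scikit-learn. `CountVectorizer.transform` calls `_check_vocabulary`, which raises `NotFittedError("Vocabulary not fitted or provided")` when the vectorizer was neither fitted nor given a fixed `vocabulary`. The sketch below, assuming a recent scikit-learn install, reproduces only that sklearn behavior in isolation. It does not reproduce the Rasa pipeline bug itself, and `try_transform` is a hypothetical helper, not part of Rasa or scikit-learn.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import CountVectorizer


def try_transform(vectorizer, texts):
    """Hypothetical helper: return the sparse feature matrix,
    or None if the vectorizer has no vocabulary yet."""
    try:
        return vectorizer.transform(texts)
    except NotFittedError as err:
        # Same exception and message as at the bottom of the traceback.
        print(f"NotFittedError: {err}")
        return None


# A vectorizer that was never fitted fails exactly like the Core featurization step.
unfitted = CountVectorizer()
print(try_transform(unfitted, ["hello there"]))

# Once fitted, transform works; the default token pattern keeps
# "hello", "there" and "goodbye" as the vocabulary.
fitted = CountVectorizer().fit(["hello there", "goodbye"])
features = try_transform(fitted, ["hello there"])
print(features.shape)  # (1, 3)
```

In the report, the unfitted vectorizer is the one `CountVectorsFeaturizer` looks up for some attribute (`self.vectorizers[attribute]`) while Core featurizes tracker states.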

Command or request that led to error:

rasa train

Content of configuration file (config.yml) (if relevant):

language: en

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexEntityExtractor
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

policies:

Content of domain file (domain.yml) (if relevant):

@tabergma tabergma added type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Oct 6, 2020
@tabergma
Contributor Author

tabergma commented Oct 6, 2020

If you move the RegexFeaturizer after the CountVectorsFeaturizers, core training works.

@hsm207
Contributor

hsm207 commented Oct 6, 2020

> If you move the RegexFeaturizer after the CountVectorsFeaturizers, core training works.

You mean RegexEntityExtractor right?

@tabergma
Contributor Author

tabergma commented Oct 6, 2020

Yes, if you move the RegexEntityExtractor after the CountVectorsFeaturizers, core training works.
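The workaround amounts to a reordering of the `pipeline` list from the reported `config.yml`. The sketch below represents that pipeline as plain Python dicts and moves `RegexEntityExtractor` directly after the last `CountVectorsFeaturizer`. `move_after_last` is a hypothetical helper written for illustration, not a Rasa API, and this only reproduces the reported order that avoided the error; it is not the fix that landed in #6931.

```python
from typing import Dict, List

# The pipeline from the reported config.yml, as a list of component dicts.
PIPELINE: List[Dict] = [
    {"name": "WhitespaceTokenizer"},
    {"name": "RegexEntityExtractor"},
    {"name": "LexicalSyntacticFeaturizer"},
    {"name": "CountVectorsFeaturizer"},
    {"name": "CountVectorsFeaturizer", "analyzer": "char_wb", "min_ngram": 1, "max_ngram": 4},
    {"name": "DIETClassifier", "epochs": 100},
    {"name": "EntitySynonymMapper"},
    {"name": "ResponseSelector", "epochs": 100},
    {"name": "FallbackClassifier", "threshold": 0.3, "ambiguity_threshold": 0.1},
]


def move_after_last(pipeline: List[Dict], component: str, anchor: str) -> List[Dict]:
    """Hypothetical helper: return a copy of `pipeline` with every `component`
    entry moved directly after the last `anchor` entry (unchanged if either is missing)."""
    moved = [c for c in pipeline if c["name"] == component]
    rest = [c for c in pipeline if c["name"] != component]
    anchor_positions = [i for i, c in enumerate(rest) if c["name"] == anchor]
    if not moved or not anchor_positions:
        return list(pipeline)
    insert_at = anchor_positions[-1] + 1
    return rest[:insert_at] + moved + rest[insert_at:]


reordered = move_after_last(PIPELINE, "RegexEntityExtractor", "CountVectorsFeaturizer")
print([c["name"] for c in reordered])
```

Writing the reordered list back into `config.yml` gives the pipeline order reported above to let Core training complete.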

@tabergma
Contributor Author

tabergma commented Oct 6, 2020

4 participants