sklearn.exceptions.NotFittedError: Vocabulary not fitted or provided #6919

tabergma · 2020-10-06T07:38:42Z

Rasa version: 2.0.0rc4

Issue:
Core training fails on the demo bot (rasa init) after changing the config to the one below.

Error (including full traceback):

(rasa) ➜  ~/Repositories/rasa/demo_bot git:(core-entity-roles-groups) ✔ rasa train
The configuration for policies was chosen automatically. It was written into the config file at 'config.yml'.
Training NLU model...
2020-10-06 09:34:09 INFO     rasa.shared.nlu.training_data.training_data  - Training data stats:
2020-10-06 09:34:09 INFO     rasa.shared.nlu.training_data.training_data  - Number of intent examples: 69 (7 distinct intents)

2020-10-06 09:34:09 INFO     rasa.shared.nlu.training_data.training_data  -   Found intents: 'deny', 'mood_great', 'mood_unhappy', 'affirm', 'greet', 'bot_challenge', 'goodbye'
2020-10-06 09:34:09 INFO     rasa.shared.nlu.training_data.training_data  - Number of response examples: 0 (6 distinct responses)
2020-10-06 09:34:09 INFO     rasa.shared.nlu.training_data.training_data  - Number of entity examples: 0 (0 distinct entities)
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component WhitespaceTokenizer
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component RegexEntityExtractor
/home/tanja/Repositories/rasa/rasa/shared/utils/io.py:88: UserWarning: No lookup tables or regexes defined in the training data that have a name equal to any entity in the training data. In order for this component to work you need to define valid lookup tables or regexes in the training data.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component LexicalSyntacticFeaturizer
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
/home/tanja/.virtualenv/rasa/lib/python3.6/site-packages/sklearn/feature_extraction/text.py:501: UserWarning: The parameter 'token_pattern' will not be used since 'analyzer' != 'word'
  warnings.warn("The parameter 'token_pattern' will not be used"
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:09 INFO     rasa.nlu.model  - Starting to train component DIETClassifier
/home/tanja/Repositories/rasa/rasa/shared/utils/io.py:88: UserWarning: You specified 'DIET' to train entities, but no entities are present in the training data. Skip training of entities.
Epochs: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:18<00:00,  5.46it/s, t_loss=1.504, i_acc=0.986]
2020-10-06 09:34:36 INFO     rasa.utils.tensorflow.models  - Finished training.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Starting to train component ResponseSelector
2020-10-06 09:34:36 INFO     rasa.nlu.selectors.response_selector  - Retrieval intent parameter was left to its default value. This response selector will be trained on training examples combining all retrieval intents.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Starting to train component FallbackClassifier
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Finished training component.
2020-10-06 09:34:36 INFO     rasa.nlu.model  - Successfully saved model into '/tmp/tmp1hnllvoo/nlu'
NLU model training completed.
Training Core model...
Processed story blocks: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 2265.97it/s, # trackers=1]
Processed story blocks: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 798.51it/s, # trackers=3]
Processed story blocks: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 269.55it/s, # trackers=12]
Processed story blocks: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 83.32it/s, # trackers=39]
Processed rules: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2965.22it/s, # trackers=1]
Processed trackers: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 911.94it/s, # actions=12]
Processed actions: 12it [00:00, 9763.66it/s, # examples=12]
Processed trackers: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [00:00<00:00, 230.98it/s, # actions=30]
Traceback (most recent call last):
  File "/home/tanja/.virtualenv/rasa/bin/rasa", line 11, in <module>
    load_entry_point('rasa', 'console_scripts', 'rasa')()
  File "/home/tanja/Repositories/rasa/rasa/__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/tanja/Repositories/rasa/rasa/cli/train.py", line 90, in train
    nlu_additional_arguments=extract_nlu_additional_arguments(args),
  File "/home/tanja/Repositories/rasa/rasa/train.py", line 55, in train
    loop,
  File "/home/tanja/Repositories/rasa/rasa/utils/common.py", line 300, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/tanja/Repositories/rasa/rasa/train.py", line 110, in train_async
    nlu_additional_arguments=nlu_additional_arguments,
  File "/home/tanja/Repositories/rasa/rasa/train.py", line 207, in _train_async_internal
    old_model_zip_path=old_model,
  File "/home/tanja/Repositories/rasa/rasa/train.py", line 263, in _do_training
    or _interpreter_from_previous_model(old_model_zip_path),
  File "/home/tanja/Repositories/rasa/rasa/train.py", line 409, in _train_core_with_validated_data
    interpreter=interpreter,
  File "/home/tanja/Repositories/rasa/rasa/core/train.py", line 67, in train
    agent.train(training_data, **additional_arguments)
  File "/home/tanja/Repositories/rasa/rasa/core/agent.py", line 724, in train
    training_trackers, self.domain, interpreter=self.interpreter, **kwargs
  File "/home/tanja/Repositories/rasa/rasa/core/policies/ensemble.py", line 189, in train
    trackers_to_train, domain, interpreter=interpreter, **kwargs
  File "/home/tanja/Repositories/rasa/rasa/core/policies/ted_policy.py", line 332, in train
    training_trackers, domain, interpreter, **kwargs
  File "/home/tanja/Repositories/rasa/rasa/core/policies/policy.py", line 165, in featurize_for_training
    training_trackers, domain, interpreter
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/tracker_featurizers.py", line 140, in featurize_trackers
    tracker_state_features = self._featurize_states(trackers_as_states, interpreter)
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/tracker_featurizers.py", line 73, in _featurize_states
    for tracker_states in trackers_as_states
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/tracker_featurizers.py", line 73, in <listcomp>
    for tracker_states in trackers_as_states
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/tracker_featurizers.py", line 71, in <listcomp>
    for state in tracker_states
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/single_state_featurizer.py", line 201, in encode_state
    self._extract_state_features(sub_state, interpreter, sparse=True)
  File "/home/tanja/Repositories/rasa/rasa/core/featurizers/single_state_featurizer.py", line 169, in _extract_state_features
    parsed_message = interpreter.featurize_message(message)
  File "/home/tanja/Repositories/rasa/rasa/core/interpreter.py", line 158, in featurize_message
    result = self.interpreter.featurize_message(message)
  File "/home/tanja/Repositories/rasa/rasa/nlu/model.py", line 418, in featurize_message
    component.process(message, **self.context)
  File "/home/tanja/Repositories/rasa/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 562, in process
    attribute, [message_tokens]
  File "/home/tanja/Repositories/rasa/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 438, in _create_features
    seq_vec = self.vectorizers[attribute].transform(tokens)
  File "/home/tanja/.virtualenv/rasa/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 1247, in transform
    self._check_vocabulary()
  File "/home/tanja/.virtualenv/rasa/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 467, in _check_vocabulary
    raise NotFittedError("Vocabulary not fitted or provided")
sklearn.exceptions.NotFittedError: Vocabulary not fitted or provided

Command or request that led to error:

rasa train

Content of configuration file (config.yml) (if relevant):

language: en

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexEntityExtractor
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

policies:

Content of domain file (domain.yml) (if relevant):


tabergma · 2020-10-06T07:41:54Z

If you move the RegexFeaturizer after the CountVectorsFeaturizers, core training works.

hsm207 · 2020-10-06T08:06:28Z

If you move the RegexFeaturizer after the CountVectorsFeaturizers, core training works.

You mean RegexEntityExtractor, right?

tabergma · 2020-10-06T08:16:31Z

Yes, if you move the RegexEntityExtractor after the CountVectorsFeaturizers, core training works.
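For anyone applying the workaround by hand, here is a minimal sketch of the reordering, with the pipeline written as a plain list of dicts that mirrors the config.yml above. The helper name `move_after_last` is hypothetical and not part of Rasa. Placing the extractor directly after the second CountVectorsFeaturizer is an assumption; the comment above only says "after the CountVectorsFeaturizers".

```python
from typing import Any, Dict, List


def move_after_last(
    pipeline: List[Dict[str, Any]], name: str, anchor: str
) -> List[Dict[str, Any]]:
    """Return a copy of `pipeline` with the entries named `name` placed
    directly after the last entry named `anchor` (hypothetical helper)."""
    moved = [c for c in pipeline if c["name"] == name]
    rest = [c for c in pipeline if c["name"] != name]
    anchor_positions = [i for i, c in enumerate(rest) if c["name"] == anchor]
    if not moved or not anchor_positions:
        return list(pipeline)  # nothing to reorder
    insert_at = anchor_positions[-1] + 1
    return rest[:insert_at] + moved + rest[insert_at:]


# Pipeline from the config.yml in this issue, as plain Python data.
pipeline = [
    {"name": "WhitespaceTokenizer"},
    {"name": "RegexEntityExtractor"},
    {"name": "LexicalSyntacticFeaturizer"},
    {"name": "CountVectorsFeaturizer"},
    {"name": "CountVectorsFeaturizer", "analyzer": "char_wb",
     "min_ngram": 1, "max_ngram": 4},
    {"name": "DIETClassifier", "epochs": 100},
    {"name": "EntitySynonymMapper"},
    {"name": "ResponseSelector", "epochs": 100},
    {"name": "FallbackClassifier", "threshold": 0.3,
     "ambiguity_threshold": 0.1},
]

reordered = move_after_last(
    pipeline, "RegexEntityExtractor", "CountVectorsFeaturizer"
)
for component in reordered:
    print(component["name"])
```

The reordered list can then be copied back into the `pipeline:` section of config.yml in the same order.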

tabergma · 2020-10-06T08:43:54Z

This line is causing the error: https://github.com/RasaHQ/rasa/blob/master/rasa/nlu/model.py#L204
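To illustrate the failure mode at the bottom of the traceback (not the Rasa code at the linked line), here is a toy sketch of a vectorizer whose `transform` is called before any vocabulary has been fitted. `ToyCountVectorizer` and the local `NotFittedError` are stand-ins written for this example; they are not scikit-learn's or Rasa's implementations, and the sketch makes no claim about why the vocabulary was left unfitted in Rasa.

```python
from collections import Counter
from typing import Dict, List, Optional


class NotFittedError(ValueError):
    """Local stand-in for sklearn.exceptions.NotFittedError."""


class ToyCountVectorizer:
    """Toy bag-of-words vectorizer (assumption: whitespace tokenization)."""

    def __init__(self) -> None:
        self.vocabulary_: Optional[Dict[str, int]] = None

    def fit(self, documents: List[str]) -> "ToyCountVectorizer":
        # Build the vocabulary from every token seen during training.
        tokens = sorted({tok for doc in documents for tok in doc.split()})
        self.vocabulary_ = {tok: idx for idx, tok in enumerate(tokens)}
        return self

    def transform(self, documents: List[str]) -> List[List[int]]:
        # Guard: without a fitted vocabulary there is nothing to count against.
        if self.vocabulary_ is None:
            raise NotFittedError("Vocabulary not fitted or provided")
        rows = []
        for doc in documents:
            counts = Counter(doc.split())
            rows.append([counts.get(tok, 0) for tok in self.vocabulary_])
        return rows


vectorizer = ToyCountVectorizer()
try:
    vectorizer.transform(["hello there"])  # fit() was never called
except NotFittedError as err:
    print(f"before fit: {err}")

vectorizer.fit(["hello there", "good bye"])
print(vectorizer.transform(["hello hello bye"]))
```

In the sketch, the same `transform` call works once `fit` has seen any training text. This matches the shape of the traceback, where `transform` is reached on a vectorizer that has no vocabulary.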

tabergma added the type:bug 🐛 and area:rasa-oss 🎡 labels Oct 6, 2020

tabergma self-assigned this Oct 6, 2020

This was referenced Oct 6, 2020

Do not filter training data in model.py but on component side #6930

Closed

Do not filter training data in model.py but on component side #6931

Merged

Ghostvv added this to the 2.0 Rasa Open Source milestone Oct 6, 2020

tmbo closed this as completed Oct 7, 2020
