Training core only model and testing on test stories with entity annotations causes evaluation to crash. #8386

Closed
kedz opened this issue Apr 7, 2021 · 10 comments
Labels: area:rasa-oss 🎡 · area:rasa-oss/model-testing · effort:atom-squad/2 · type:bug 🐛

Comments

kedz (Contributor) commented Apr 7, 2021

Rasa version: 2.4.2

Rasa SDK version (if used & relevant):

Rasa X version (if used & relevant):

Python version: 3.7.6

Operating system (windows, osx, ...): macos, Darwin-20.2.0-x86_64-i386-64bit

Issue:
Training a non-e2e, core-only model and evaluating it on test stories with entity annotations causes the evaluation to crash. We would like to be able to add starter-pack datasets like the insurance-demo to the model regression tests, but this bug prevents us from doing regression tests on core policies in isolation.

Error (including full traceback):

git clone https://github.com/RasaHQ/insurance-demo.git
cd insurance-demo
rasa train core
rasa test core -s tests/
2021-04-07 15:28:36 INFO     rasa.model  - Loading model models/core-20210407-152812.tar.gz...
/Users/kedz/projects2021/carbonbot/rasa/rasa/utils/train_utils.py:565: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss. It will be set to `True` by default, Rasa Open Source 3.0.0 onwards.
  category=UserWarning,
/Users/kedz/projects2021/carbonbot/rasa/rasa/utils/train_utils.py:537: UserWarning: model_confidence is set to `softmax`. It is recommended to try using `model_confidence=linear_norm` to make it easier to tune fallback thresholds.
  category=UserWarning,
No NLU model found. Using default 'RegexInterpreter' for end-to-end evaluation. If you added actual user messages to your test stories this will likely lead to the tests failing. In that case, you need to train a NLU model first, e.g. using `rasa train`.
/Users/kedz/projects2021/carbonbot/rasa/rasa/shared/utils/io.py:96: UserWarning: Issue found in 'tests/test_stories.yml':
Found intent 'get_a_quote' in stories which is not part of the domain.
  More info at https://rasa.com/docs/rasa/stories
/Users/kedz/projects2021/carbonbot/rasa/rasa/shared/utils/io.py:96: UserWarning: Issue found in 'tests/test_stories.yml':
Found intent 'make_payment' in stories which is not part of the domain.
  More info at https://rasa.com/docs/rasa/stories
/Users/kedz/projects2021/carbonbot/rasa/rasa/shared/utils/io.py:96: UserWarning: Issue found in 'tests/test_stories.yml':
Found intent 'greet' in stories which is not part of the domain.
  More info at https://rasa.com/docs/rasa/stories
/Users/kedz/projects2021/carbonbot/rasa/rasa/shared/utils/io.py:96: UserWarning: Issue found in 'tests/test_stories.yml':
Found intent 'inform' in stories which is not part of the domain.
  More info at https://rasa.com/docs/rasa/stories
/Users/kedz/projects2021/carbonbot/rasa/rasa/shared/utils/io.py:96: UserWarning: Issue found in 'tests/test_stories.yml':
Found intent 'deny' in stories which is not part of the domain.
  More info at https://rasa.com/docs/rasa/stories
/Users/kedz/projects2021/carbonbot/rasa/rasa/shared/utils/io.py:96: UserWarning: Issue found in 'tests/test_stories.yml':
Found intent 'new_id_card' in stories which is not part of the domain.
  More info at https://rasa.com/docs/rasa/stories
/Users/kedz/projects2021/carbonbot/rasa/rasa/shared/utils/io.py:96: UserWarning: Issue found in 'tests/test_stories.yml':
Found intent 'affirm' in stories which is not part of the domain.
  More info at https://rasa.com/docs/rasa/stories
Processed story blocks: 100%|██████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 975.31it/s, # trackers=1]
2021-04-07 15:28:48 INFO     rasa.core.test  - Evaluating 6 stories
Progress:
 17%|████████████████████▊                                                                                                        | 1/6 [00:00<00:00, 32.55it/s]
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/carbon/bin/rasa", line 5, in <module>
    main()
  File "/Users/kedz/projects2021/carbonbot/rasa/rasa/__main__.py", line 117, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/Users/kedz/projects2021/carbonbot/rasa/rasa/cli/test.py", line 112, in run_core_test
    use_conversation_test_files=args.e2e,
  File "/Users/kedz/projects2021/carbonbot/rasa/rasa/model_testing.py", line 191, in test_core
    **kwargs,
  File "/Users/kedz/projects2021/carbonbot/rasa/rasa/utils/common.py", line 309, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/Users/kedz/projects2021/carbonbot/rasa/rasa/core/test.py", line 845, in test
    completed_trackers, agent, fail_on_prediction_errors, e2e
  File "/Users/kedz/projects2021/carbonbot/rasa/rasa/core/test.py", line 753, in _collect_story_predictions
    tracker, agent, fail_on_prediction_errors, use_e2e
  File "/Users/kedz/projects2021/carbonbot/rasa/rasa/core/test.py", line 666, in _predict_tracker_actions
    circuit_breaker_tripped,
  File "/Users/kedz/projects2021/carbonbot/rasa/rasa/core/test.py", line 551, in _collect_action_executed_predictions
    processor, partial_tracker, prediction
  File "/Users/kedz/projects2021/carbonbot/rasa/rasa/core/test.py", line 520, in _get_e2e_entity_evaluation_result
    tokens = parsed_message.get(TOKENS_NAMES[TEXT])
AttributeError: 'NoneType' object has no attribute 'get'

@kedz kedz added type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Apr 7, 2021
@TyDunn TyDunn added the area:rasa-oss/model-testing Issues focused around testing models (e.g. via `rasa test`) label May 5, 2021
@kedz kedz changed the title from "Training core only model and testing on a test stories with entity annotations causes evaluation to crash." to "Training core only model and testing on test stories with entity annotations causes evaluation to crash." May 12, 2021
kedz (Contributor, Author) commented May 12, 2021

This also happens if you run on data with both the intent label and the user text annotated in the test stories. For example, in the retail-demo starter pack there are stories like this:

- story: faq
  steps:
  - intent: greet
    user: |-
      hi
  - action: utter_greet
  - intent: faq
    user: |
      what kind of payment you take?
  - action: utter_faq

If you remove the user fields, the tests run fine.

samsucik (Contributor) commented May 18, 2021

@TyDunn I'd like to lobby for this one because the cost/benefit ratio is, I think, very favourable. Hacking around the bug took me ~3 lines of code (though a clean solution can still be complex, who knows). Once fixed, this would unblock our Core regression testing on the big Sara test set (which is the best Core dataset that we've got right now).

JEM-Mosig (Contributor) commented:

I also just ran into this issue while working on a bot for IntentTED evaluation.

koernerfelicia (Contributor) commented:

Would also be helpful for sanity-checking carbon bot (not crucial, but nice to have).

@TyDunn TyDunn added the effort:atom-squad/2 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. label May 21, 2021
TyDunn (Contributor) commented May 21, 2021

We have estimated this and are picking it up next sprint.

ancalita (Member) commented:

Hi @wochinge, I started having a look into this issue - I think this has to do with the fact that processor.interpreter.featurize_message is implemented neither here nor in RegexInterpreter.

The question I have is whether the wrong method was used here, or whether an implementation is required for featurize_message. If the latter, I'd need some guidance on what this method does.

wochinge (Contributor) commented:

@joejuzl implemented that to allow testing the extraction of entities by end-to-end policies (TEDPolicy is currently the only one). It doesn't make much sense to do the entity evaluation with Core-only models at the moment. As the type annotation suggests, featurize_message is allowed to return None. I'd just add handling for None values here and not add an EntityEvaluationResult in that case.
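
A minimal sketch of the kind of None guard being suggested. The helper name, the EntityEvaluationResult fields, and the TOKENS_NAMES constant below are stand-ins inferred from the traceback, not the actual Rasa source:

from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical stand-ins so the sketch is self-contained; the real definitions live in Rasa.
TEXT = "text"
TOKENS_NAMES = {TEXT: "text_tokens"}


@dataclass
class EntityEvaluationResult:
    entity_targets: List[Dict]
    entity_predictions: List[Dict]
    tokens: List[Any]
    message: str


def entity_evaluation_or_none(
    parsed_message: Optional[Dict[str, Any]],
    entity_targets: List[Dict],
    entity_predictions: List[Dict],
) -> Optional[EntityEvaluationResult]:
    """Skip the entity evaluation entirely when the message could not be featurized."""
    if parsed_message is None:
        # Core-only model: featurize_message() returned None, so there are no
        # tokens to align entities against - don't add an EntityEvaluationResult.
        return None
    tokens = parsed_message.get(TOKENS_NAMES[TEXT], [])
    return EntityEvaluationResult(
        entity_targets, entity_predictions, tokens, parsed_message.get(TEXT, "")
    )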

ancalita (Member) commented:

@joejuzl what is still unclear to me: if the body of featurize_message is pass, it always returns None, and therefore L555-L558 would never execute. When would featurize_message return a Message? 🤔

joejuzl (Contributor) commented May 25, 2021

@joejuzl what is still unclear to me: if the body of featurize_message is pass, it always returns None, and therefore L555-L558 would never execute. When would featurize_message return a Message? 🤔

The body is only pass in NaturalLanguageInterpreter, which is a kind of abstract base class for the interpreter. RegexInterpreter is a subclass, but it does not override this method because it's a simple default interpreter that is mainly used for testing.
However, in Interpreter in rasa/nlu/model.py you can see that featurize_message is implemented: it loops through all the components in the NLU pipeline, adding details to the message along the way. As the tokenizer will be present in the pipeline, the tokens will be added.
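
Roughly, the shape being described: a base interpreter whose featurize_message body is just pass (so it returns None), versus a pipeline interpreter that runs each component over the message, with a tokenizer adding the tokens. This is a simplified, self-contained sketch with made-up component and message classes, not the actual code from rasa/nlu/model.py:

from typing import List, Optional


class Message:
    """Simplified message container; real Rasa messages carry many more fields."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.data: dict = {}

    def set(self, key: str, value: object) -> None:
        self.data[key] = value

    def get(self, key: str, default=None):
        return self.data.get(key, default)


class NaturalLanguageInterpreter:
    """Base class: no pipeline, so nothing can be featurized."""

    def featurize_message(self, message: Message) -> Optional[Message]:
        pass  # implicitly returns None, which is what the core-only evaluation runs into


class WhitespaceTokenizer:
    """Toy component: adds tokens to the message, like a real tokenizer in the pipeline."""

    def process(self, message: Message) -> None:
        message.set("text_tokens", message.text.split())


class Interpreter(NaturalLanguageInterpreter):
    """Pipeline interpreter: each component adds details to the message in turn."""

    def __init__(self, pipeline: List) -> None:
        self.pipeline = pipeline

    def featurize_message(self, message: Message) -> Optional[Message]:
        for component in self.pipeline:
            component.process(message)
        return message


# With a tokenizer in the pipeline, the tokens end up on the message.
parsed = Interpreter([WhitespaceTokenizer()]).featurize_message(Message("what kind of payment do you take?"))
print(parsed.get("text_tokens"))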

ancalita (Member) commented:

Bugfix PR merged 😃
