Utilize specific retrieval intents in story testing #8459

mvielkind · 2021-04-13T13:53:16Z

Description of Problem:
When testing stories that contain retrieval intents, only the base retrieval intent (e.g. chitchat or faq) is used. If you define retrieval sub-intents (e.g. chitchat/ask_name or chitchat/ask_weather) in your story, testing ignores the added detail of those sub-intents, which can produce confusing and unexpected test results.

Overview of the Solution:
In testing, the output of the ResponseSelector should be used to determine the sub-intent of the retrieval intent, instead of using only the top-level intent. The example below shows the current behavior alongside the expected behavior.

Examples (if relevant):
This is the test story:

- story: response selector
  steps:
  - user: |
      what is your name?
    intent: chitchat/ask_name
  - action: utter_chitchat/ask_name

After running rasa test, the story above produces the following output:

  steps:
  - intent: chitchat
  - action: utter_chitchat/ask_name  # predicted: utter_chitchat
  - action: action_listen  # predicted: action_default_fallback

The test story specifies retrieval intents at a finer level of detail than the test results do: the test appears to ignore the specific retrieval intent named in the story.

The expected output for the same test would be the following, where the test takes the specific retrieval intents in the test story into account.

  steps:
  - intent: chitchat/ask_name [specificity in intent step for response selector]
  - action: utter_chitchat/ask_name  # predicted: utter_chitchat/goodbye [specificity in prediction]
  - action: action_listen  # predicted: action_default_fallback

Definition of Done:

Story tests utilize the ResponseSelector output

alwx · 2021-06-07T11:04:19Z

After some investigation I found out that there is no easy way of doing this.
The problem is that the prediction is correct: MessageProcessor.predict_next_action only returns the name of the predicted action, which is indeed utter_chitchat in this particular case:

  - action: utter_chitchat/ask_name  # predicted: utter_chitchat

Since there are no retrieval intents in the domain, the only way to understand that utter_chitchat/ask_name should be used is by executing the utter_chitchat action — in this case, MessageProcessor._run_action will be called, and when executing the action, we also execute ResponseSelector.process method which makes the correct prediction and executes the right action.
However, we don't run actions when doing testing — which is exactly what we're trying to solve here: #8691

The verdict is that we cannot fix this issue before we update how we work with test stories.

wochinge · 2021-06-08T07:28:33Z

@alwx I think we should know the predicted intent and the details of any selected responses from this line, shouldn't we?

> However, we don't run actions when doing testing

I agree that we currently don't run the ActionRetrieveResponse, but maybe we could run it, or extract the logic into a function which we can then re-use? It should really just be a few lines.

> in this case, MessageProcessor._run_action will be called, and when executing the action, we also execute ResponseSelector.process method which makes the correct prediction and executes the right action.

This isn't true. The NLU pipeline should add all required information to the UserUttered event. The action simply retrieves that information from the event and selects the correct action then. These are the magic lines.
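A minimal sketch of that lookup, for illustration only. It assumes the parse data attached to the user message has a `response_selector` entry, keyed by retrieval intent or `default`, whose `response` carries an `intent_response_key`. The helper name `resolve_utter_action` and the sample dictionary are hypothetical and are not taken from the Rasa codebase:

```python
from typing import Any, Dict

UTTER_PREFIX = "utter_"


def resolve_utter_action(action_name: str, parse_data: Dict[str, Any]) -> str:
    """Expand e.g. 'utter_chitchat' to 'utter_chitchat/ask_name' when possible.

    Hypothetical helper: the parse-data layout is an assumption.
    """
    if not action_name.startswith(UTTER_PREFIX):
        return action_name

    retrieval_intent = action_name[len(UTTER_PREFIX):]
    selectors = parse_data.get("response_selector", {})

    # Prefer the selector keyed by this retrieval intent, else the default one.
    selected = selectors.get(retrieval_intent) or selectors.get("default")
    if not selected:
        return action_name

    key = selected.get("response", {}).get("intent_response_key")

    # Only expand when the selected response belongs to this retrieval intent.
    if key and key.split("/")[0] == retrieval_intent:
        return UTTER_PREFIX + key
    return action_name


# Hand-written sample, not real model output.
sample_parse_data = {
    "intent": {"name": "chitchat", "confidence": 0.98},
    "response_selector": {
        "default": {
            "response": {
                "intent_response_key": "chitchat/ask_name",
                "confidence": 0.95,
            }
        }
    },
}

print(resolve_utter_action("utter_chitchat", sample_parse_data))
# utter_chitchat/ask_name
```

Because the function only reads data that is already on the message, nothing has to be executed to obtain the full action name, which is the point being made about the event carrying the required information.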

tmbo · 2021-09-09T08:00:19Z

I don't think this issue is in progress so it shouldn't be in that column on the CSE board @mvielkind

jupyterjazz · 2021-09-17T08:18:26Z

  steps:
  - intent: chitchat
  - action: utter_chitchat/ask_name  # predicted: utter_chitchat
  - action: action_listen  # predicted: action_default_fallback

Just to clarify the second line: it's not failing because the model retrieved some other response (e.g. utter_chitchat/goodbye) but because we weren't even considering retrieval intents for predicted actions (during test stories). So basically it was comparing utter_chitchat and utter_chitchat/ask_name which gave us this mismatch.
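The mismatch can be illustrated with a small sketch. It is not how #9657 implements the fix; the function names are hypothetical, and the selected response key is passed in by hand rather than read from a model:

```python
from typing import Optional


def matches_naive(expected: str, predicted: str) -> bool:
    """Compare action names as-is, ignoring retrieval sub-intents."""
    return expected == predicted


def matches_with_retrieval(
    expected: str, predicted: str, selected_key: Optional[str] = None
) -> bool:
    """Expand the predicted base action with the selected response key first.

    `selected_key` stands in for the response selector's choice,
    e.g. 'chitchat/ask_name' (an assumption for this sketch).
    """
    if selected_key:
        base_action = "utter_" + selected_key.split("/")[0]
        if predicted == base_action:
            predicted = "utter_" + selected_key
    return expected == predicted


expected_action = "utter_chitchat/ask_name"
predicted_action = "utter_chitchat"

# Old behavior: the base action never equals the full retrieval action.
print(matches_naive(expected_action, predicted_action))

# With the selected response taken into account, a correct selection matches
# and a wrong one (e.g. chitchat/goodbye) is still reported as a mismatch.
print(matches_with_retrieval(expected_action, predicted_action, "chitchat/ask_name"))
print(matches_with_retrieval(expected_action, predicted_action, "chitchat/goodbye"))
```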

This problem will be fixed in #9657

TyDunn closed this as completed Apr 28, 2021

TyDunn reopened this Apr 28, 2021

TyDunn added the type:bug 🐛 label and removed the type:enhancement ✨ label May 28, 2021

wochinge added the effort:atom-squad/2 label May 28, 2021

alwx self-assigned this Jun 1, 2021

TyDunn unassigned alwx Jun 9, 2021

TyDunn added the priority:normal label Jul 15, 2021

dakshvar22 assigned jupyterjazz Sep 16, 2021

jupyterjazz mentioned this issue Sep 17, 2021: Utilizing retrieval intents in test stories #9657 (merged)

jupyterjazz closed this as completed Sep 21, 2021
