Utilize specific retrieval intents in story testing #8459

Closed
1 task
mvielkind opened this issue Apr 13, 2021 · 4 comments

Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/model-testing Issues focused around testing models (e.g. via `rasa test`) cse-issues effort:atom-squad/2 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@mvielkind
Contributor

mvielkind commented Apr 13, 2021

Description of Problem:
When testing stories that contain retrieval intents, only the base retrieval intent (e.g. chitchat or faq) is used. If you define retrieval sub-intents (e.g. chitchat/ask_name or chitchat/ask_weather) in your story, the test ignores that added detail, which can produce confusing and unexpected test results.

Overview of the Solution:
During testing, the output of the ResponseSelector should be used to determine the sub-intent of a retrieval intent, instead of using only the top-level intent. The example below shows the current behavior alongside the expected behavior.

Examples (if relevant):
This is the test story:

- story: response selector
  steps:
  - user: |
      what is your name?
    intent: chitchat/ask_name
  - action: utter_chitchat/ask_name

After running `rasa test`, the story above produces the following output:

  steps:
  - intent: chitchat
  - action: utter_chitchat/ask_name  # predicted: utter_chitchat
  - action: action_listen  # predicted: action_default_fallback

The test story and the test results differ in specificity: the story names the specific retrieval intent, but the test appears to ignore it and reports only the base intent.

The expected output for the same test would be the following, where the test takes the specific retrieval intents from the test story into account.

  steps:
  - intent: chitchat/ask_name  # specificity in intent step for response selector
  - action: utter_chitchat/ask_name  # predicted: utter_chitchat/goodbye (specificity in prediction)
  - action: action_listen  # predicted: action_default_fallback

Definition of Done:

  • Story tests utilize the ResponseSelector output
@mvielkind mvielkind added type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/model-testing Issues focused around testing models (e.g. via `rasa test`) cse-issues labels Apr 13, 2021
@TyDunn TyDunn closed this as completed Apr 28, 2021
@TyDunn TyDunn reopened this Apr 28, 2021
@TyDunn TyDunn added type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. and removed type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR labels May 28, 2021
@wochinge wochinge added the effort:atom-squad/2 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. label May 28, 2021
@alwx alwx self-assigned this Jun 1, 2021
@alwx
Contributor

alwx commented Jun 7, 2021

After some investigation I found out that there is no easy way of doing this.
The problem is that the prediction is correct. MessageProcessor.predict_next_action only returns the name of the predicted action, which is indeed utter_chitchat in this particular case:

  - action: utter_chitchat/ask_name  # predicted: utter_chitchat

Since there are no retrieval intents in the domain, the only way to understand that utter_chitchat/ask_name should be used is by executing the utter_chitchat action — in this case, MessageProcessor._run_action will be called, and when executing the action, we also execute ResponseSelector.process method which makes the correct prediction and executes the right action.
However, we don't run actions when doing testing — which is exactly what we're trying to solve here: #8691

The verdict is that we cannot fix this issue before we update how we work with test stories.

@wochinge
Contributor

wochinge commented Jun 8, 2021

@alwx I think we should know the predicted intent and details about any potential selected responses from this line, shouldn't we?

> However, we don't run actions when doing testing

I agree that we currently don't run the ActionRetrieveResponse, but maybe we could run it, or extract the logic into a function that we can then re-use? It should really just be a few lines.

> in this case, MessageProcessor._run_action will be called, and when executing the action, we also execute ResponseSelector.process method which makes the correct prediction and executes the right action.

This isn't true. The NLU pipeline should add all required information to the UserUttered event. The action simply retrieves that information from the event and selects the correct action then. These are the magic lines.
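
For illustration, here is a minimal sketch of what such a re-usable function could look like. It is not the actual Rasa implementation: the helper name `full_retrieval_action_name` is hypothetical, and the shape of `parse_data` (the `response_selector` and `intent_response_key` keys) is an assumption about what the NLU pipeline attaches to the UserUttered event.

```python
from typing import Any, Dict

UTTER_PREFIX = "utter_"

# Hypothetical parse data for "what is your name?". The key names below are
# assumptions made for this sketch, not a verified copy of Rasa's output.
parse_data: Dict[str, Any] = {
    "intent": {"name": "chitchat", "confidence": 0.98},
    "response_selector": {
        "chitchat": {
            "response": {
                "intent_response_key": "chitchat/ask_name",
                "confidence": 0.95,
            }
        }
    },
}


def full_retrieval_action_name(action_name: str, parse_data: Dict[str, Any]) -> str:
    """Refine a base retrieval action name using the selector output.

    Returns the input name unchanged when it is not an utter_ action or
    when no selector output exists for the corresponding retrieval intent.
    """
    if not action_name.startswith(UTTER_PREFIX):
        return action_name

    # "utter_chitchat" -> "chitchat"
    retrieval_intent = action_name[len(UTTER_PREFIX):]
    selector_output = parse_data.get("response_selector", {}).get(retrieval_intent)
    if not selector_output:
        return action_name

    key = selector_output.get("response", {}).get("intent_response_key")
    return f"{UTTER_PREFIX}{key}" if key else action_name


print(full_retrieval_action_name("utter_chitchat", parse_data))
```

With a helper like this, the test code could refine the predicted utter_chitchat before comparing it to the action in the test story, without having to execute the action itself.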

@tmbo
Member

tmbo commented Sep 9, 2021

I don't think this issue is in progress so it shouldn't be in that column on the CSE board @mvielkind

@jupyterjazz
Contributor

jupyterjazz commented Sep 17, 2021

  steps:
  - intent: chitchat
  - action: utter_chitchat/ask_name  # predicted: utter_chitchat
  - action: action_listen  # predicted: action_default_fallback

Just to clarify the second line: it's not failing because the model retrieved some other response (e.g. utter_chitchat/goodbye), but because we weren't even considering retrieval intents for predicted actions when running test stories. So basically it was comparing utter_chitchat with utter_chitchat/ask_name, which gave us this mismatch.

This problem will be fixed in #9657
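
The mismatch described in this comment can be reproduced with a small standalone sketch. The function names here are hypothetical and only illustrate the comparison; they are not taken from the Rasa code base.

```python
def base_action_name(action_name: str) -> str:
    """Drop the sub-intent part: 'utter_chitchat/ask_name' -> 'utter_chitchat'."""
    return action_name.split("/", 1)[0]


def classify_comparison(expected: str, predicted: str) -> str:
    """Classify how an expected and a predicted action name relate.

    'match'       - the names are identical
    'granularity' - only the base names agree, so one side lacks the sub-intent
    'different'   - the names disagree even at the base level
    """
    if expected == predicted:
        return "match"
    if base_action_name(expected) == base_action_name(predicted):
        return "granularity"
    return "different"


# The story expects the specific action, but the test only sees the base one.
print(classify_comparison("utter_chitchat/ask_name", "utter_chitchat"))
```

A plain string comparison reports the first case in the issue as a failure even though the base names agree, which is why the output shows "# predicted: utter_chitchat" rather than a different response such as utter_chitchat/goodbye.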
