Attention weight logging #5673

Merged
merged 145 commits into from
Jan 26, 2021
7861b93
Early implementation of attention weight logging
Apr 20, 2020
d0f6585
Merge branch 'master' into johannes-73
Aug 25, 2020
ffc19af
Merge branch 'master' into johannes-73
Sep 4, 2020
3364f34
Move tensorboard out of layer
Sep 4, 2020
55b6ce4
Merge branch 'master' into johannes-73
Oct 12, 2020
f773a16
Implement png output for DIET
Oct 12, 2020
1add963
Add --diagnostics option
Oct 16, 2020
28a2f7a
Merge branch 'master' into johannes-73
Oct 16, 2020
ba30d9b
Move constants
Oct 16, 2020
12bfc4c
Use tensorboard config instead of flag
Oct 16, 2020
1816857
Remove arg again
Oct 16, 2020
aec6d43
Merge branch 'master' into johannes-73
Oct 21, 2020
8d4667c
Add _with_diagnostics methods
Oct 21, 2020
0057f68
Remove `plot_attention_weights`
Oct 21, 2020
e19451c
Fix return without model
Oct 21, 2020
ad1b44c
Remove comment
Oct 21, 2020
f55aed2
Apply BLACK formatting
Oct 21, 2020
f6decb4
Fix formatting
Oct 21, 2020
fbd93fe
Remove plot references
Oct 21, 2020
f47d96c
Merge branch 'master' into johannes-73
Oct 26, 2020
c7ee1cf
Merge branch 'master' into johannes-73
Oct 26, 2020
88a1d34
Move constants out of shared
Oct 26, 2020
6588ac5
Merge branch 'master' into johannes-73
Oct 27, 2020
1e78dbc
Merge branch 'master' into johannes-73
Oct 29, 2020
2501dea
Fix formatting
Oct 29, 2020
9e2beea
Fix _prepare_transformer_layer
Oct 29, 2020
b16a55e
Merge branch 'master' into johannes-73
Nov 3, 2020
bf7f08a
Return diagnostics with `process`
Nov 3, 2020
d30123c
Convert diagnostics to numpy
Nov 3, 2020
5044721
Add test for DIETClassifier
Nov 3, 2020
66120bc
Add tests and fix TEDPolicy
Nov 3, 2020
e1ed713
Merge branch 'master' into johannes-73
Nov 16, 2020
06319dd
Apply BLACK formatting
Nov 16, 2020
08b9f71
Add doc-string
Nov 16, 2020
9c393ee
Fix tests
Nov 16, 2020
c3c8bde
Add changelog
Nov 16, 2020
2d4776d
Add `text_transformed` diagnostic
Nov 16, 2020
44d33e8
Remove trailing whitespaces
Nov 16, 2020
891fdcd
Code formatting
Nov 17, 2020
21b83de
Merge branch 'master' into johannes-73
Nov 17, 2020
871e765
Don't compare diagnostic data for eq
Nov 17, 2020
09eae2a
Fix equality for `diagnostic_data`
Nov 17, 2020
04e529f
Add a docstring
Nov 17, 2020
18631ac
Merge branch 'master' into johannes-73
Nov 17, 2020
93fc7b8
Merge branch 'master' into johannes-73
Nov 17, 2020
693781f
Fix response selector
Nov 18, 2020
c6be551
Merge branch 'johannes-73' of github.com:RasaHQ/rasa into johannes-73
Nov 18, 2020
c880170
Merge branch 'master' into johannes-73
Nov 18, 2020
9504e83
Remove newline after doc string
Nov 18, 2020
085fec3
Merge branch 'master' into johannes-73
Nov 25, 2020
448b12f
Add diagnostic_data to message
Nov 25, 2020
00c5cf0
Fix test
Nov 25, 2020
759c306
Merge branch 'master' into johannes-73
Nov 25, 2020
12f50da
Handle diagnostic_data from multiple components
Nov 26, 2020
5dd09b7
Add doc string
Nov 26, 2020
4593e3a
Move message stuff to shared
Nov 26, 2020
490280a
Add doc string
Nov 26, 2020
ae9811c
Remove trailing whitespace
Nov 26, 2020
d7f4c85
Add `unique_name` property
Nov 26, 2020
86bd3ec
Fix `test_set_attr_on_component`
Nov 26, 2020
7e1e02d
Apply BLACK formatting
Nov 26, 2020
3ee79e2
Fix formatting
Nov 26, 2020
af7627e
Add newline
Nov 27, 2020
e797e40
Remove whitespace
Nov 27, 2020
5a849eb
Update rasa/nlu/featurizers/featurizer.py
Nov 27, 2020
4225b84
Update rasa/core/policies/policy.py
Nov 27, 2020
c2c59d0
Update rasa/nlu/components.py
Dec 1, 2020
834ef6c
Update rasa/core/policies/policy.py
Dec 1, 2020
a7d3297
Update rasa/nlu/config.py
Dec 1, 2020
0757363
Update rasa/nlu/config.py
Dec 1, 2020
ac38f06
Update rasa/shared/nlu/training_data/message.py
Dec 1, 2020
b5e1104
Move `values_to_numpy` out of `shared`
Dec 1, 2020
74ac0d9
Avoid comparing diagnostic data for prediction
Dec 1, 2020
a3d46e6
Update doc strings and fix import
Dec 1, 2020
109378d
Remove spaces
Dec 1, 2020
01d342c
Update changelog
Dec 1, 2020
1db2d94
Add minimal examples
Dec 1, 2020
cf88309
Remove path
Dec 1, 2020
b9bc467
Remove comment
Dec 1, 2020
b8bb3db
Add blank line
Dec 1, 2020
6ae03fa
Lint
Dec 1, 2020
8c5e607
Add `python` declaration
Dec 1, 2020
611c464
Avoid angle brackets
Dec 1, 2020
87067e4
Merge remote-tracking branch 'github/master' into johannes-73
Dec 1, 2020
77ad137
Fix tests
Dec 1, 2020
ac70a72
Merge branch 'master' into johannes-73
koaning Dec 4, 2020
e0856f6
Merge branch 'master' into johannes-73
koaning Dec 4, 2020
7f43fb9
Update rasa/nlu/classifiers/diet_classifier.py
Dec 10, 2020
bfc67a8
Update rasa/nlu/components.py
Dec 10, 2020
0b7aaa0
Update rasa/nlu/components.py
Dec 10, 2020
c92998b
Update rasa/nlu/components.py
Dec 10, 2020
4116616
Update rasa/nlu/components.py
Dec 10, 2020
3468331
Update rasa/shared/nlu/training_data/message.py
Dec 10, 2020
87f2418
Update rasa/utils/tensorflow/tf_to_numpy.py
Dec 10, 2020
10035b7
Update tests/nlu/classifiers/test_diet_classifier.py
Dec 10, 2020
1246d8b
Update tests/nlu/selectors/test_selectors.py
Dec 10, 2020
9ab4a63
Make minor changes
Dec 10, 2020
e2ef5a9
Merge branch 'master' into johannes-73
Dec 10, 2020
3b9b596
Move TEDPolicy tests to separate module
Dec 10, 2020
28db2e7
Merge branch 'master' into johannes-73
Dec 14, 2020
03f9e79
Remove blanks
Dec 14, 2020
bfc4f7e
Improve some doc strings
Dec 14, 2020
a147359
Merge branch 'master' into johannes-73
Dec 14, 2020
38c52d8
Merge branch 'master' into johannes-73
Dec 15, 2020
e49fec7
Add test and minor improvements
Dec 15, 2020
f729e8e
Merge branch 'master' into johannes-73
Dec 15, 2020
78ee2c7
Update relative paths
Dec 15, 2020
214956a
Update relative paths
Dec 15, 2020
644199c
Update relative paths
Dec 15, 2020
498c6a5
Merge branch 'master' into johannes-73
Dec 18, 2020
9d14542
Merge branch 'master' into johannes-73
Jan 4, 2021
26a43b0
Draft component index fingerprint bug fix
Jan 5, 2021
5fd02ca
Fix component index problem
Jan 6, 2021
65e7fd1
Update relative paths
Jan 6, 2021
a97c97d
fix changelog links
m-vdb Jan 6, 2021
511c9d8
Fix relative paths again
Jan 6, 2021
8853801
Another relative paths fix
Jan 6, 2021
b600da7
Fix typos
Jan 6, 2021
501cf0b
Merge branch 'master' into johannes-73
Jan 11, 2021
9d93f54
Add a test
Jan 11, 2021
d9faf5f
Merge branch 'master' into johannes-73
Jan 12, 2021
3d3014e
Merge branch 'master' into johannes-73
Jan 15, 2021
4a0bd38
Add documentation
Jan 15, 2021
69b564e
Re-use trained model for test
Jan 15, 2021
9c1378f
Move documentation under common headline
Jan 15, 2021
a136ab9
Merge branch 'master' into johannes-73
Jan 20, 2021
3b25e2b
Merge branch 'master' into johannes-73
Jan 20, 2021
bf21598
Merge branch 'master' into johannes-73
Jan 21, 2021
dc4037f
Move example code to ghist
Jan 21, 2021
75155c5
Use fixture for ResponseSelector test
Jan 22, 2021
d023f9f
Merge branch 'main' into johannes-73
Jan 22, 2021
244fae0
Change exception type
Jan 22, 2021
71b4253
Merge branch 'johannes-73' of github.com:RasaHQ/rasa into johannes-73
Jan 22, 2021
b0d3dbc
Import less
Jan 22, 2021
dbc332f
Remove f from f-string
Jan 22, 2021
95fdabc
Merge branch 'main' into johannes-73
Jan 22, 2021
644ec37
Rename `test_process_gives_diagnostic_data`
Jan 22, 2021
b8940c2
Merge branch 'johannes-73' of github.com:RasaHQ/rasa into johannes-73
Jan 22, 2021
8a41088
Merge branch 'main' into johannes-73
Jan 22, 2021
977d662
Merge branch 'johannes-73' of github.com:RasaHQ/rasa into johannes-73
Jan 22, 2021
e57575d
Change test_values_to_numpy
Jan 26, 2021
3316a9b
Merge branch 'main' into johannes-73
Jan 26, 2021
f2fe222
Merge branch 'johannes-73' of github.com:RasaHQ/rasa into johannes-73
Jan 26, 2021
1c83a74
Simplify test_values_to_numpy
Jan 26, 2021
b20a5e0
Merge branch 'main' into johannes-73
Jan 26, 2021
8 changes: 8 additions & 0 deletions changelog/5673.improvement.md
@@ -0,0 +1,8 @@
Expose diagnostic data for action and NLU predictions.

Add a `diagnostic_data` field to the [Message](./reference/rasa/shared/nlu/training_data/message.md#message-objects)
and [Prediction](./reference/rasa/core/policies/policy.md#policyprediction-objects) objects. The field holds
attention weights and other intermediate results of the inference computation.
This information can be used for debugging and fine-tuning, e.g. with [RasaLit](https://github.com/RasaHQ/rasalit).

For examples of how to access the diagnostic data, see [here](https://gist.github.com/JEM-Mosig/c6e15b81ee70561cb72e361aff310d7e).
37 changes: 36 additions & 1 deletion docs/docs/tuning-your-model.mdx
@@ -293,7 +293,9 @@ Here is a summary of the available extractors and what they are best used for:
|`MitieEntityExtractor` |MITIE |structured SVM |good for training custom entities |
|`EntitySynonymMapper` |existing entities |N/A |maps known synonyms |

-## Handling Class Imbalance
+## Improving Performance
+
+### Handling Class Imbalance

Classification algorithms often do not perform well if there is a large class imbalance,
for example if you have a lot of training data for some intents and very little training data for others.
@@ -312,6 +314,39 @@ pipeline:
batch_strategy: sequence
```

### Accessing Diagnostic Data

To gain a better understanding of what your models do, you can access intermediate results of the prediction process.
To do this, read the `diagnostic_data` field of the [Message](./reference/rasa/shared/nlu/training_data/message.md#message-objects)
and [Prediction](./reference/rasa/core/policies/policy.md#policyprediction-objects) objects. The field holds
attention weights and other intermediate results of the inference computation.
You can use this information for debugging and fine-tuning, e.g. with [RasaLit](https://github.com/RasaHQ/rasalit).

After you've [trained a model](./command-line-interface.mdx#rasa-train), you can access diagnostic data for DIET,
given a processed message, like this:

```python
from rasa.shared.constants import DIAGNOSTIC_DATA

# `message` is assumed to be a Message that the NLU pipeline has already processed.
nlu_diagnostic_data = message.as_dict()[DIAGNOSTIC_DATA]

for component_name, diagnostic_data in nlu_diagnostic_data.items():
    attention_weights = diagnostic_data["attention_weights"]
    print(f"attention_weights for {component_name}:")
    print(attention_weights)

    text_transformed = diagnostic_data["text_transformed"]
    print(f"\ntext_transformed for {component_name}:")
    print(text_transformed)
```
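
As a rough illustration of the nested layout that the loop above walks over, here is a stand-in that does not import Rasa. `FakeMessage`, the component name, the array contents, and the string value of `DIAGNOSTIC_DATA` are assumptions made for this sketch; only the `add_diagnostic_data(origin, data)` call shape and the `as_dict()[DIAGNOSTIC_DATA]` lookup mirror what this PR shows.

```python
from typing import Any, Dict

# Assumed value for illustration; the real constant lives in rasa.shared.constants.
DIAGNOSTIC_DATA = "diagnostic_data"


class FakeMessage:
    """Hypothetical stand-in for Rasa's Message, for illustration only."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def add_diagnostic_data(self, origin: str, data: Dict[str, Any]) -> None:
        # Store each component's data under its own name so that several
        # components can contribute without overwriting one another.
        self.data.setdefault(DIAGNOSTIC_DATA, {})[origin] = data

    def as_dict(self) -> Dict[str, Any]:
        return dict(self.data)


message = FakeMessage()
message.add_diagnostic_data(
    "component_1_DIETClassifier",  # hypothetical unique component name
    {
        "attention_weights": [[0.7, 0.3], [0.4, 0.6]],  # made-up numbers
        "text_transformed": [[0.1, 0.2]],  # made-up numbers
    },
)

for component_name, diagnostic_data in message.as_dict()[DIAGNOSTIC_DATA].items():
    print(component_name, sorted(diagnostic_data))
```

Keying by component name is what lets a pipeline with more than one DIET-like component report its data side by side.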

And you can access diagnostic data for TED like this:

```python
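# `policy` (a trained TEDPolicy), `GREET_RULE` (the tracker), and `domain`
# are assumed to be defined already.
prediction = policy.predict_action_probabilities(
    GREET_RULE, domain, RegexInterpreter()
)
print(prediction.diagnostic_data.get("attention_weights"))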
```


## Configuring Tensorflow

9 changes: 9 additions & 0 deletions rasa/core/policies/policy.py
@@ -236,6 +236,7 @@ def _prediction(
events: Optional[List[Event]] = None,
optional_events: Optional[List[Event]] = None,
is_end_to_end_prediction: bool = False,
diagnostic_data: Optional[Dict[Text, Any]] = None,
) -> "PolicyPrediction":
return PolicyPrediction(
probabilities,
@@ -244,6 +245,7 @@ def _prediction(
events,
optional_events,
is_end_to_end_prediction,
diagnostic_data,
)

def _metadata(self) -> Optional[Dict[Text, Any]]:
@@ -400,6 +402,7 @@ def __init__(
events: Optional[List[Event]] = None,
optional_events: Optional[List[Event]] = None,
is_end_to_end_prediction: bool = False,
diagnostic_data: Optional[Dict[Text, Any]] = None,
) -> None:
"""Creates a `PolicyPrediction`.

@@ -417,13 +420,17 @@ def __init__(
you return as they can potentially influence the conversation flow.
is_end_to_end_prediction: `True` if the prediction used the text of the
user message instead of the intent.
diagnostic_data: Intermediate results or other information that is not
necessary for Rasa to function, but intended for debugging and
fine-tuning purposes.
"""
self.probabilities = probabilities
self.policy_name = policy_name
self.policy_priority = (policy_priority,)
self.events = events or []
self.optional_events = optional_events or []
self.is_end_to_end_prediction = is_end_to_end_prediction
self.diagnostic_data = diagnostic_data or {}

@staticmethod
def for_action_name(
@@ -466,6 +473,8 @@ def __eq__(self, other: Any) -> bool:
and self.events == other.events
and self.optional_events == other.optional_events
and self.is_end_to_end_prediction == other.is_end_to_end_prediction
# We do not compare `diagnostic_data`, because it has no effect on the
# action prediction.
)
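
The comment above says that `diagnostic_data` is deliberately left out of the equality check. As a small, self-contained illustration of the same idea (not the PR's code, which writes `__eq__` by hand), a dataclass can exclude a field from comparison with `field(compare=False)`. `SketchPrediction` and its reduced set of fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchPrediction:
    """Hypothetical, simplified stand-in for PolicyPrediction."""

    probabilities: List[float]
    policy_name: str
    # compare=False keeps this field out of the generated __eq__.
    diagnostic_data: Dict[str, Any] = field(default_factory=dict, compare=False)


a = SketchPrediction([0.9, 0.1], "TEDPolicy", {"attention_weights": [[1.0]]})
b = SketchPrediction([0.9, 0.1], "TEDPolicy")

# The two predictions differ only in diagnostic data, so they compare equal.
print(a == b)
```

Either way, two predictions that choose the same action are treated as equal even when one of them carries debugging output and the other does not.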

@property
34 changes: 27 additions & 7 deletions rasa/core/policies/ted_policy.py
@@ -38,6 +38,7 @@
from rasa.shared.nlu.interpreter import NaturalLanguageInterpreter
from rasa.core.policies.policy import Policy, PolicyPrediction
from rasa.core.constants import DEFAULT_POLICY_PRIORITY, DIALOGUE
from rasa.shared.constants import DIAGNOSTIC_DATA
from rasa.shared.core.constants import ACTIVE_LOOP, SLOTS, ACTION_LISTEN_NAME
from rasa.shared.core.trackers import DialogueStateTracker
from rasa.shared.core.generator import TrackerWithCachedStates
@@ -50,6 +51,7 @@
Data,
)
from rasa.utils.tensorflow.model_data_utils import convert_to_data_format
import rasa.utils.tensorflow.numpy
from rasa.utils.tensorflow.constants import (
LABEL,
IDS,
@@ -632,6 +634,9 @@ def predict_action_probabilities(
confidence.tolist(),
is_end_to_end_prediction=is_e2e_prediction,
optional_events=optional_events,
diagnostic_data=rasa.utils.tensorflow.numpy.values_to_numpy(
output.get(DIAGNOSTIC_DATA)
),
)

def _create_optional_event_for_entities(
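
The body of `values_to_numpy` is not part of this excerpt. A minimal sketch of what such a helper might do, assuming it converts every tensor-like value in a dictionary through its `.numpy()` method and passes `None` through unchanged (which matters here because `output.get(DIAGNOSTIC_DATA)` can return `None`). `values_to_numpy_sketch` and `FakeTensor` are hypothetical names, and the duck-typed check replaces whatever type test the real helper uses.

```python
from typing import Any, Dict, Optional


def values_to_numpy_sketch(
    data: Optional[Dict[Any, Any]],
) -> Optional[Dict[Any, Any]]:
    """Converts tensor-like values via `.numpy()`; leaves everything else as is."""
    if not data:
        # Covers both `None` and an empty dictionary.
        return data
    return {
        key: value.numpy() if callable(getattr(value, "numpy", None)) else value
        for key, value in data.items()
    }


class FakeTensor:
    """Hypothetical stand-in for an eager tensor exposing `.numpy()`."""

    def __init__(self, values):
        self._values = values

    def numpy(self):
        return list(self._values)


converted = values_to_numpy_sketch(
    {"attention_weights": FakeTensor([0.2, 0.8]), "note": "kept"}
)
print(converted)
```

Converting at the boundary means callers of `predict_action_probabilities` can work with the diagnostic data without needing TensorFlow themselves.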
@@ -1050,14 +1055,23 @@ def _embed_dialogue(
self,
dialogue_in: tf.Tensor,
tf_batch_data: Dict[Text, Dict[Text, List[tf.Tensor]]],
-) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor]:
-"""Create dialogue level embedding and mask."""
+) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, Optional[tf.Tensor]]:
+"""Creates dialogue level embedding and mask.
+
+Args:
+dialogue_in: The encoded dialogue.
+tf_batch_data: Batch in model data format.
+
+Returns:
+The dialogue embedding, the mask, and (for diagnostic purposes)
+also the attention weights.
+"""
dialogue_lengths = tf.cast(tf_batch_data[DIALOGUE][LENGTH][0], tf.int32)
mask = self._compute_mask(dialogue_lengths)

-dialogue_transformed = self._tf_layers[f"transformer.{DIALOGUE}"](
-dialogue_in, 1 - mask, self._training
-)
+dialogue_transformed, attention_weights = self._tf_layers[
+f"transformer.{DIALOGUE}"
+](dialogue_in, 1 - mask, self._training)
dialogue_transformed = tfa.activations.gelu(dialogue_transformed)

if self.use_only_last_dialogue_turns:
@@ -1069,7 +1083,7 @@ def _embed_dialogue(

dialogue_embed = self._tf_layers[f"embed.{DIALOGUE}"](dialogue_transformed)

-return dialogue_embed, mask, dialogue_transformed
+return dialogue_embed, mask, dialogue_transformed, attention_weights

def _encode_features_per_attribute(
self, tf_batch_data: Dict[Text, Dict[Text, List[tf.Tensor]]], attribute: Text
@@ -1615,6 +1629,7 @@ def batch_loss(
dialogue_embed,
dialogue_mask,
dialogue_transformer_output,
_,
) = self._embed_dialogue(dialogue_in, tf_batch_data)
dialogue_mask = tf.squeeze(dialogue_mask, axis=-1)

@@ -1686,6 +1701,7 @@ def batch_predict(
dialogue_embed,
dialogue_mask,
dialogue_transformer_output,
attention_weights,
) = self._embed_dialogue(dialogue_in, tf_batch_data)
dialogue_mask = tf.squeeze(dialogue_mask, axis=-1)
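
In this PR, `batch_loss` discards the extra return value with `_` while `batch_predict` keeps it. A minimal sketch of that pattern, using a toy single-head scaled dot-product attention in plain Python. This is not Rasa's transformer layer; `attend`, `softmax`, and the input values are hypothetical and only show a layer handing back its weights alongside its output.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention that also returns its weights."""
    scale = math.sqrt(len(keys[0]))
    # One row of weights per query; each row sums to 1.
    weights = [
        softmax([sum(q_i * k_i for q_i, k_i in zip(q, k)) / scale for k in keys])
        for q in queries
    ]
    # Each output row is the weighted average of the value rows.
    output = [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(len(values[0]))]
        for row in weights
    ]
    return output, weights


x = [[1.0, 0.0], [0.0, 1.0]]

output, _ = attend(x, x, x)  # training-style call: weights are discarded
output, attention_weights = attend(x, x, x)  # prediction-style call: weights are kept
print(attention_weights)
```

Returning the weights unconditionally keeps one code path for both callers, at the cost of every call site having to unpack one more value.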

@@ -1698,7 +1714,11 @@ def batch_predict(
scores = self._tf_layers[f"loss.{LABEL}"].confidence_from_sim(
sim_all, self.config[SIMILARITY_TYPE]
)
-predictions = {"action_scores": scores, "similarities": sim_all}
+predictions = {
+"action_scores": scores,
+"similarities": sim_all,
+DIAGNOSTIC_DATA: {"attention_weights": attention_weights},
+}

if (
self.config[ENTITY_RECOGNITION]
19 changes: 16 additions & 3 deletions rasa/nlu/classifiers/diet_classifier.py
@@ -13,6 +13,8 @@
import rasa.shared.utils.io
import rasa.utils.io as io_utils
import rasa.nlu.utils.bilou_utils as bilou_utils
import rasa.utils.tensorflow.numpy
from rasa.shared.constants import DIAGNOSTIC_DATA
from rasa.nlu.featurizers.featurizer import Featurizer
from rasa.nlu.components import Component
from rasa.nlu.classifiers.classifier import IntentClassifier
@@ -914,7 +916,7 @@ def _predict_entities(
return entities

def process(self, message: Message, **kwargs: Any) -> None:
-"""Return the most likely label and its similarity to the input."""
+"""Augments the message with intents, entities, and diagnostic data."""
out = self._predict(message)

if self.component_config[INTENT_CLASSIFICATION]:
@@ -928,12 +930,17 @@ def process(self, message: Message, **kwargs: Any) -> None:

message.set(ENTITIES, entities, add_to_output=True)

if out and DIAGNOSTIC_DATA in out:
message.add_diagnostic_data(
self.unique_name,
rasa.utils.tensorflow.numpy.values_to_numpy(out.get(DIAGNOSTIC_DATA)),
)

def persist(self, file_name: Text, model_dir: Text) -> Dict[Text, Any]:
"""Persist this model into the passed directory.

Return the metadata necessary to load the model again.
"""

if self.model is None:
return {"file": None}

@@ -1420,6 +1427,7 @@ def batch_loss(
text_in,
text_seq_ids,
lm_mask_bool_text,
_,
) = self._create_sequence(
tf_batch_data[TEXT][SEQUENCE],
tf_batch_data[TEXT][SENTENCE],
@@ -1569,7 +1577,7 @@ def batch_predict(

mask = self._compute_mask(sequence_lengths)

-text_transformed, _, _, _ = self._create_sequence(
+text_transformed, _, _, _, attention_weights = self._create_sequence(
tf_batch_data[TEXT][SEQUENCE],
tf_batch_data[TEXT][SENTENCE],
mask_sequence_text,
@@ -1579,6 +1587,11 @@ def batch_predict(

predictions: Dict[Text, tf.Tensor] = {}

predictions[DIAGNOSTIC_DATA] = {
"attention_weights": attention_weights,
"text_transformed": text_transformed,
}

if self.config[INTENT_CLASSIFICATION]:
predictions.update(
self._batch_predict_intents(sequence_lengths, text_transformed)