Merge pull request #9572 from RasaHQ/issue-8930
Dense features and CRFEntityExtractor - docs update
tttthomasssss authored Sep 17, 2021
2 parents af04d86 + 04bc60a commit 954107f
Showing 2 changed files with 30 additions and 21 deletions.
1 change: 1 addition & 0 deletions changelog/8930.doc.md
@@ -0,0 +1 @@
Adds documentation on how to use `CRFEntityExtractor` with features from a dense featurizer (e.g. `LanguageModelFeaturizer`).
50 changes: 29 additions & 21 deletions docs/docs/components.mdx
@@ -1710,7 +1710,8 @@ The `SpacyEntityExtractor` extractor does not provide a `confidence` level and w
If you want to pass custom features, such as pre-trained word embeddings, to `CRFEntityExtractor`, you can
-add any dense featurizer to the pipeline before the `CRFEntityExtractor`.
+add any dense featurizer to the pipeline before the `CRFEntityExtractor` and subsequently configure
+`CRFEntityExtractor` to make use of the dense features by adding `"text_dense_features"` to its feature configuration.
`CRFEntityExtractor` automatically finds the additional dense features and checks whether they form an
iterable of length `len(tokens)`, where each entry is a vector.
A warning will be shown in case the check fails.
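The shape check described above can be sketched as follows. This is a minimal illustration and not Rasa's implementation. The helper names `dense_features_match_tokens` and `_is_vector` are hypothetical, and a "vector" is approximated as any sized, non-string object, such as a list, a tuple, or a NumPy row.

```python
import warnings
from typing import Any, List


def _is_vector(entry: Any) -> bool:
    # Hypothetical approximation: any sized, non-string object counts as a vector.
    return hasattr(entry, "__len__") and not isinstance(entry, (str, bytes))


def dense_features_match_tokens(tokens: List[str], dense_features: Any) -> bool:
    """Return True if dense_features holds exactly one vector per token.

    Hypothetical helper: it mirrors the documented check (an iterable of
    length len(tokens) whose entries are vectors) and warns when it fails.
    """
    try:
        rows = list(dense_features)
    except TypeError:
        rows = None  # not iterable at all
    ok = (
        rows is not None
        and len(rows) == len(tokens)
        and all(_is_vector(row) for row in rows)
    )
    if not ok:
        warnings.warn("Dense features do not contain one vector per token.")
    return ok
```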
@@ -1727,26 +1728,27 @@ The `SpacyEntityExtractor` extractor does not provide a `confidence` level and w
The following features are available:
```
-============== ==========================================================================================
-Feature Name   Description
-============== ==========================================================================================
-low            Checks if the token is lower case.
-upper          Checks if the token is upper case.
-title          Checks if the token starts with an uppercase character and all remaining characters are
-               lowercased.
-digit          Checks if the token contains just digits.
-prefix5        Take the first five characters of the token.
-prefix2        Take the first two characters of the token.
-suffix5        Take the last five characters of the token.
-suffix3        Take the last three characters of the token.
-suffix2        Take the last two characters of the token.
-suffix1        Take the last character of the token.
-pos            Take the Part-of-Speech tag of the token (``SpacyTokenizer`` required).
-pos2           Take the first two characters of the Part-of-Speech tag of the token
-               (``SpacyTokenizer`` required).
-pattern        Take the patterns defined by ``RegexFeaturizer``.
-bias           Add an additional "bias" feature to the list of features.
-============== ==========================================================================================
+=================== ==========================================================================================
+Feature Name        Description
+=================== ==========================================================================================
+low                 Checks if the token is lower case.
+upper               Checks if the token is upper case.
+title               Checks if the token starts with an uppercase character and all remaining characters are
+                    lowercased.
+digit               Checks if the token contains just digits.
+prefix5             Take the first five characters of the token.
+prefix2             Take the first two characters of the token.
+suffix5             Take the last five characters of the token.
+suffix3             Take the last three characters of the token.
+suffix2             Take the last two characters of the token.
+suffix1             Take the last character of the token.
+pos                 Take the Part-of-Speech tag of the token (``SpacyTokenizer`` required).
+pos2                Take the first two characters of the Part-of-Speech tag of the token
+                    (``SpacyTokenizer`` required).
+pattern             Take the patterns defined by ``RegexFeaturizer``.
+bias                Add an additional "bias" feature to the list of features.
+text_dense_features Adds additional features from a dense featurizer.
+=================== ==========================================================================================
```
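The string-based features in the table can be sketched in plain Python. This sketch follows the table's descriptions rather than Rasa's source, and the function name `token_features` is hypothetical. `pos`, `pos2`, `pattern` and `text_dense_features` are left out because they depend on other pipeline components.

```python
from typing import Dict, Union


def token_features(token: str) -> Dict[str, Union[bool, str]]:
    """Compute the table's string-based features for a single token.

    Hypothetical helper based on the table descriptions; pos/pos2 need
    SpacyTokenizer, pattern needs RegexFeaturizer and text_dense_features
    needs a dense featurizer, so they are omitted here.
    """
    return {
        "low": token.islower(),
        "upper": token.isupper(),
        # First character uppercase, every remaining character lowercase.
        "title": token[:1].isupper() and token[1:] == token[1:].lower(),
        "digit": token.isdigit(),
        "prefix5": token[:5],
        "prefix2": token[:2],
        "suffix5": token[-5:],
        "suffix3": token[-3:],
        "suffix2": token[-2:],
        "suffix1": token[-1:],
        # Constant feature, so the model can learn a per-label bias weight.
        "bias": True,
    }
```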
As the featurizer is moving over the tokens in a user message with a sliding window, you can define features for
@@ -1779,6 +1781,7 @@ The `SpacyEntityExtractor` extractor does not provide a `confidence` level and w
"pattern",
],
["low", "title", "upper"],
+["text_dense_features"]
]
# The maximum number of iterations for optimization algorithms.
"max_iterations": 50
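The sliding window can be illustrated with a hypothetical sketch. In it, the middle entry of the `features` list applies to the current token, and the outer entries apply to its neighbours. The `offset:name` key format, the `WINDOW` and `EXTRACTORS` names, and the choice to skip positions outside the message are assumptions made for illustration. They are not `CRFEntityExtractor` internals.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical three-slot window: previous token, current token, next token.
WINDOW: List[List[str]] = [
    ["low", "title", "upper"],
    ["bias", "low", "suffix3", "upper", "title", "digit"],
    ["low", "title", "upper"],
]

# Simplified extractors for the feature names used in WINDOW.
EXTRACTORS: Dict[str, Callable[[str], object]] = {
    "low": lambda t: t.islower(),
    "upper": lambda t: t.isupper(),
    "title": lambda t: t[:1].isupper() and t[1:] == t[1:].lower(),
    "digit": lambda t: t.isdigit(),
    "suffix3": lambda t: t[-3:],
    "bias": lambda t: True,
}


def window_features(
    tokens: Sequence[str], window: List[List[str]]
) -> List[Dict[str, object]]:
    """Build one feature dict per token from the tokens around it.

    Assumes an odd number of window entries, so the middle entry applies to
    the current token; neighbours outside the message are simply skipped.
    """
    half = len(window) // 2
    sentence: List[Dict[str, object]] = []
    for i in range(len(tokens)):
        feats: Dict[str, object] = {}
        for slot, names in enumerate(window):
            offset = slot - half  # -1, 0, +1 for a three-slot window
            j = i + offset
            if 0 <= j < len(tokens):
                for name in names:
                    feats[f"{offset}:{name}"] = EXTRACTORS[name](tokens[j])
        sentence.append(feats)
    return sentence
```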
@@ -1805,6 +1808,11 @@ The `SpacyEntityExtractor` extractor does not provide a `confidence` level and w
:::
+:::note
+If `text_dense_features` is used, your pipeline must contain a dense featurizer (e.g. `LanguageModelFeaturizer`).
+:::
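The requirement in the note above can be checked before training. The following is a hypothetical validation sketch, not a Rasa API. The pipeline is modelled as a list of dicts that mirror `config.yml`, `check_pipeline` is an invented helper, and `DENSE_FEATURIZERS` is an illustrative, non-exhaustive set of dense featurizer names. It also requires the dense featurizer to appear before `CRFEntityExtractor`, as the pipeline-order guidance earlier in this section describes.

```python
from typing import Any, Dict, List

# Illustrative, non-exhaustive set of Rasa dense featurizer names (assumption).
DENSE_FEATURIZERS = {
    "LanguageModelFeaturizer",
    "SpacyFeaturizer",
    "MitieFeaturizer",
    "ConveRTFeaturizer",
}


def uses_dense_features(component: Dict[str, Any]) -> bool:
    # True if any window slot of the component lists "text_dense_features".
    return any("text_dense_features" in names for names in component.get("features", []))


def check_pipeline(pipeline: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the order looks consistent."""
    problems: List[str] = []
    seen_dense = False
    for component in pipeline:
        name = component.get("name")
        if name in DENSE_FEATURIZERS:
            seen_dense = True
        elif name == "CRFEntityExtractor" and uses_dense_features(component) and not seen_dense:
            problems.append(
                "CRFEntityExtractor uses text_dense_features but no dense featurizer precedes it"
            )
    return problems
```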
### DucklingEntityExtractor