159 changes: 104 additions & 55 deletions docs/reference/analysis/tokenfilters/predicate-tokenfilter.asciidoc
<titleabbrev>Predicate script</titleabbrev>
++++

Removes tokens that don't match a provided predicate script. The filter supports
inline {painless}/index.html[Painless] scripts only. Scripts are evaluated in
the {painless}/painless-analysis-predicate-context.html[analysis predicate
context].

[[analysis-predicatefilter-tokenfilter-analyze-ex]]
==== Example

The following <<indices-analyze,analyze API>> request uses the
`predicate_token_filter` filter to output only tokens longer than three
characters from `the fox jumps the lazy dog`.

[source,console]
----
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "predicate_token_filter",
      "script": {
        "source": """
          token.term.length() > 3
        """
      }
    }
  ],
  "text": "the fox jumps the lazy dog"
}
----

The filter produces the following tokens.

[source,text]
----
[ jumps, lazy ]
----
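Outside of {es}, the effect of the predicate can be sketched in plain Python.
This is a hypothetical illustration of the whitespace tokenizer plus the
length check, not the real Painless runtime; the helper names are invented:

```python
# Hypothetical sketch: mimic a whitespace tokenizer followed by a
# predicate filter that keeps only tokens the predicate accepts.
def whitespace_tokenize(text):
    return text.split()

def predicate_token_filter(tokens, predicate):
    # Keep only tokens for which the predicate returns True,
    # discarding the rest.
    return [t for t in tokens if predicate(t)]

tokens = whitespace_tokenize("the fox jumps the lazy dog")
kept = predicate_token_filter(tokens, lambda term: len(term) > 3)
print(kept)  # ['jumps', 'lazy']
```

`the`, `fox`, and `dog` are exactly three characters long, so only `jumps`
and `lazy` survive the predicate.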

The API response contains the position and offsets of each output token. Note
the `predicate_token_filter` filter does not change the tokens' original
positions or offsets.

.*Response*
[%collapsible]
====
[source,console-result]
----
{
  "tokens" : [
    {
      "token" : "jumps",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "lazy",
      "start_offset" : 18,
      "end_offset" : 22,
      "type" : "word",
      "position" : 4
    }
  ]
}
----
====
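The offset-preserving behavior can be illustrated with a small Python sketch
(an illustration under assumed whitespace tokenization, not {es} internals):
positions and character offsets are assigned when the text is tokenized, and
dropping tokens afterwards does not renumber the survivors.

```python
import re

def tokenize_with_offsets(text):
    # Assign position and character offsets at tokenization time,
    # the way a whitespace tokenizer would.
    return [
        {"token": m.group(), "start_offset": m.start(),
         "end_offset": m.end(), "position": pos}
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]

tokens = tokenize_with_offsets("the fox jumps the lazy dog")
# Filtering keeps each surviving token's original position and offsets.
kept = [t for t in tokens if len(t["token"]) > 3]
print(kept)
```

`jumps` keeps position `2` and offsets `8`..`13`, and `lazy` keeps position
`4` and offsets `18`..`22`, matching the API response above even though the
tokens before and between them were removed.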

[[analysis-predicatefilter-tokenfilter-configure-parms]]
==== Configurable parameters

`script`::
(Required, <<modules-scripting-using,script object>>)
Script containing a condition used to filter incoming tokens. Only tokens that
match this script are included in the output.
+
This parameter supports inline {painless}/index.html[Painless] scripts only. The
script is evaluated in the
{painless}/painless-analysis-predicate-context.html[analysis predicate context].

[[analysis-predicatefilter-tokenfilter-customize]]
==== Customize and add to an analyzer

To customize the `predicate_token_filter` filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.

The following <<indices-create-index,create index API>> request
configures a new <<analysis-custom-analyzer,custom analyzer>> using a custom
`predicate_token_filter` filter, `my_script_filter`.

The `my_script_filter` filter removes tokens of any type other than
`ALPHANUM`.

[source,console]
----
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_script_filter"
          ]
        }
      },
      "filter": {
        "my_script_filter": {
          "type": "predicate_token_filter",
          "script": {
            "source": """
              token.type.contains("ALPHANUM")
            """
          }
        }
      }
    }
  }
}
----
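The type check can likewise be sketched in Python. The token records below
are hypothetical, but the type strings follow the convention of the standard
tokenizer, which emits types such as `<ALPHANUM>` and `<NUM>`:

```python
# Hypothetical sketch of the type-based predicate: keep only tokens
# whose type string contains "ALPHANUM", mirroring the Painless
# condition token.type.contains("ALPHANUM").
tokens = [
    {"term": "foo",  "type": "<ALPHANUM>"},
    {"term": "2020", "type": "<NUM>"},
    {"term": "bar",  "type": "<ALPHANUM>"},
]
kept = [t["term"] for t in tokens if "ALPHANUM" in t["type"]]
print(kept)  # ['foo', 'bar']
```

Numeric tokens such as `2020` carry the `<NUM>` type and are therefore
dropped by the filter.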