
Commit 8b6e310

[DOCS] Reformat predicate_token_filter tokenfilter (#57705)
1 parent 448bcba commit 8b6e310

File tree

1 file changed: +104 -55 lines changed


docs/reference/analysis/tokenfilters/predicate-tokenfilter.asciidoc

@@ -4,76 +4,125 @@
 <titleabbrev>Predicate script</titleabbrev>
 ++++
 
-The predicate_token_filter token filter takes a predicate script, and removes tokens that do
-not match the predicate.
+Removes tokens that don't match a provided predicate script. The filter supports
+inline {painless}/index.html[Painless] scripts only. Scripts are evaluated in
+the {painless}/painless-analysis-predicate-context.html[analysis predicate
+context].
 
-[float]
-=== Options
-[horizontal]
-script:: a predicate script that determines whether or not the current token will
-be emitted. Note that only inline scripts are supported.
+[[analysis-predicatefilter-tokenfilter-analyze-ex]]
+==== Example
 
-[float]
-=== Settings example
-
-You can set it up like:
+The following <<indices-analyze,analyze API>> request uses the
+`predicate_token_filter` filter to only output tokens longer than three
+characters from `the fox jumps the lazy dog`.
 
 [source,console]
---------------------------------------------------
-PUT /condition_example
+----
+GET /_analyze
 {
-  "settings" : {
-    "analysis" : {
-      "analyzer" : {
-        "my_analyzer" : {
-          "tokenizer" : "standard",
-          "filter" : [ "my_script_filter" ]
-        }
-      },
-      "filter" : {
-        "my_script_filter" : {
-          "type" : "predicate_token_filter",
-          "script" : {
-            "source" : "token.getTerm().length() > 5" <1>
-          }
-        }
-      }
-    }
+  "tokenizer": "whitespace",
+  "filter": [
+    {
+      "type": "predicate_token_filter",
+      "script": {
+        "source": """
+        token.term.length() > 3
+        """
+      }
     }
+  ],
+  "text": "the fox jumps the lazy dog"
 }
---------------------------------------------------
+----
 
-<1> This will emit tokens that are more than 5 characters long
+The filter produces the following tokens.
 
-And test it like:
-
-[source,console]
---------------------------------------------------
-POST /condition_example/_analyze
-{
-  "analyzer" : "my_analyzer",
-  "text" : "What Flapdoodle"
-}
---------------------------------------------------
-// TEST[continued]
+[source,text]
+----
+[ jumps, lazy ]
+----
 
-And it'd respond:
+The API response contains the position and offsets of each output token. Note
+the `predicate_token_filter` filter does not change the tokens' original
+positions or offsets.
 
+.*Response*
+[%collapsible]
+====
 [source,console-result]
---------------------------------------------------
+----
 {
-  "tokens": [
+  "tokens" : [
+    {
+      "token" : "jumps",
+      "start_offset" : 8,
+      "end_offset" : 13,
+      "type" : "word",
+      "position" : 2
+    },
     {
-      "token": "Flapdoodle", <1>
-      "start_offset": 5,
-      "end_offset": 15,
-      "type": "<ALPHANUM>",
-      "position": 1 <2>
+      "token" : "lazy",
+      "start_offset" : 18,
+      "end_offset" : 22,
+      "type" : "word",
+      "position" : 4
     }
   ]
 }
---------------------------------------------------
+----
+====
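
The behavior the new example documents can be sketched outside Elasticsearch in a few lines of Python: whitespace tokenization, then a predicate that drops short tokens while leaving each surviving token's original position and offsets untouched. This is a conceptual sketch, not Elasticsearch code; the helper names are illustrative.

```python
# Standalone sketch (not Elasticsearch code) of what predicate_token_filter
# does in the example above: whitespace-tokenize, then keep only tokens whose
# term is longer than 3 characters. Surviving tokens keep their original
# position and offset values.

def whitespace_tokenize(text):
    tokens, pos, offset = [], 0, 0
    for word in text.split(" "):
        tokens.append({
            "token": word,
            "start_offset": offset,
            "end_offset": offset + len(word),
            "type": "word",
            "position": pos,
        })
        pos += 1
        offset += len(word) + 1  # +1 for the space separator
    return tokens

def predicate_filter(tokens, predicate):
    # Tokens failing the predicate are dropped; the rest pass through unchanged.
    return [t for t in tokens if predicate(t)]

tokens = whitespace_tokenize("the fox jumps the lazy dog")
kept = predicate_filter(tokens, lambda t: len(t["token"]) > 3)
print([t["token"] for t in kept])                          # ['jumps', 'lazy']
print([(t["position"], t["start_offset"]) for t in kept])  # [(2, 8), (4, 18)]
```

The printed positions (2 and 4) and offsets (8-13 and 18-22) match the API response above, illustrating that the filter removes tokens without renumbering the stream.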
+
+[[analysis-predicatefilter-tokenfilter-configure-parms]]
+==== Configurable parameters
+
+`script`::
+(Required, <<modules-scripting-using,script object>>)
+Script containing a condition used to filter incoming tokens. Only tokens that
+match this script are included in the output.
++
+This parameter supports inline {painless}/index.html[Painless] scripts only. The
+script is evaluated in the
+{painless}/painless-analysis-predicate-context.html[analysis predicate context].

-<1> The token 'What' has been removed from the tokenstream because it does not
-match the predicate.
-<2> The position and offset values are unaffected by the removal of earlier tokens
+[[analysis-predicatefilter-tokenfilter-customize]]
+==== Customize and add to an analyzer
+
+To customize the `predicate_token_filter` filter, duplicate it to create the basis
+for a new custom token filter. You can modify the filter using its configurable
+parameters.
+
+The following <<indices-create-index,create index API>> request
+configures a new <<analysis-custom-analyzer,custom analyzer>> using a custom
+`predicate_token_filter` filter, `my_script_filter`.
+
+The `my_script_filter` filter removes tokens of any type other than
+`ALPHANUM`.
+
+[source,console]
+----
+PUT /my_index
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "my_analyzer": {
+          "tokenizer": "standard",
+          "filter": [
+            "my_script_filter"
+          ]
+        }
+      },
+      "filter": {
+        "my_script_filter": {
+          "type": "predicate_token_filter",
+          "script": {
+            "source": """
+            token.type.contains("ALPHANUM")
+            """
+          }
+        }
+      }
+    }
+  }
+}
+----
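
The type predicate in the new `my_script_filter` example can likewise be sketched conceptually. The token list below is an illustrative stand-in, not real tokenizer output; the `standard` tokenizer does label tokens with types such as `<ALPHANUM>` and `<NUM>`, which is why a substring check on the type works.

```python
# Conceptual sketch (not Elasticsearch code) of the my_script_filter predicate:
# keep only tokens whose type contains "ALPHANUM". The Painless expression
# token.type.contains("ALPHANUM") corresponds to a substring check in Python.
tokens = [
    {"token": "lazy", "type": "<ALPHANUM>"},  # kept
    {"token": "2", "type": "<NUM>"},          # dropped: type lacks "ALPHANUM"
    {"token": "dogs", "type": "<ALPHANUM>"},  # kept
]

kept = [t for t in tokens if "ALPHANUM" in t["type"]]
print([t["token"] for t in kept])  # ['lazy', 'dogs']
```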
