|
4 | 4 | <titleabbrev>Predicate script</titleabbrev> |
5 | 5 | ++++ |
6 | 6 |
|
7 | | -The predicate_token_filter token filter takes a predicate script, and removes tokens that do |
8 | | -not match the predicate. |
| 7 | +Removes tokens that don't match a provided predicate script. The filter supports |
| 8 | +inline {painless}/index.html[Painless] scripts only. Scripts are evaluated in |
| 9 | +the {painless}/painless-analysis-predicate-context.html[analysis predicate |
| 10 | +context]. |
9 | 11 |
|
10 | | -[float] |
11 | | -=== Options |
12 | | -[horizontal] |
13 | | -script:: a predicate script that determines whether or not the current token will |
14 | | -be emitted. Note that only inline scripts are supported. |
| 12 | +[[analysis-predicatefilter-tokenfilter-analyze-ex]] |
| 13 | +==== Example |
15 | 14 |
|
16 | | -[float] |
17 | | -=== Settings example |
18 | | - |
19 | | -You can set it up like: |
| 15 | +The following <<indices-analyze,analyze API>> request uses the |
| 16 | +`predicate_token_filter` filter to only output tokens longer than three |
| 17 | +characters from `the fox jumps the lazy dog`. |
20 | 18 |
|
21 | 19 | [source,console] |
22 | | --------------------------------------------------- |
23 | | -PUT /condition_example |
| 20 | +---- |
| 21 | +GET /_analyze |
24 | 22 | { |
25 | | - "settings" : { |
26 | | - "analysis" : { |
27 | | - "analyzer" : { |
28 | | - "my_analyzer" : { |
29 | | - "tokenizer" : "standard", |
30 | | - "filter" : [ "my_script_filter" ] |
31 | | - } |
32 | | - }, |
33 | | - "filter" : { |
34 | | - "my_script_filter" : { |
35 | | - "type" : "predicate_token_filter", |
36 | | - "script" : { |
37 | | - "source" : "token.getTerm().length() > 5" <1> |
38 | | - } |
39 | | - } |
40 | | - } |
41 | | - } |
| 23 | + "tokenizer": "whitespace", |
| 24 | + "filter": [ |
| 25 | + { |
| 26 | + "type": "predicate_token_filter", |
| 27 | + "script": { |
| 28 | + "source": """ |
| 29 | + token.term.length() > 3 |
| 30 | + """ |
| 31 | + } |
42 | 32 | } |
| 33 | + ], |
| 34 | + "text": "the fox jumps the lazy dog" |
43 | 35 | } |
44 | | --------------------------------------------------- |
| 36 | +---- |
45 | 37 |
|
46 | | -<1> This will emit tokens that are more than 5 characters long |
| 38 | +The filter produces the following tokens. |
47 | 39 |
|
48 | | -And test it like: |
49 | | - |
50 | | -[source,console] |
51 | | --------------------------------------------------- |
52 | | -POST /condition_example/_analyze |
53 | | -{ |
54 | | - "analyzer" : "my_analyzer", |
55 | | - "text" : "What Flapdoodle" |
56 | | -} |
57 | | --------------------------------------------------- |
58 | | -// TEST[continued] |
| 40 | +[source,text] |
| 41 | +---- |
| 42 | +[ jumps, lazy ] |
| 43 | +---- |
59 | 44 |
|
60 | | -And it'd respond: |
| 45 | +The API response contains the position and offsets of each output token. Note |
| 46 | +the `predicate_token_filter` filter does not change the tokens' original |
| 47 | +positions or offsets.
61 | 48 |
|
| 49 | +.*Response* |
| 50 | +[%collapsible] |
| 51 | +==== |
62 | 52 | [source,console-result] |
63 | | --------------------------------------------------- |
| 53 | +---- |
64 | 54 | { |
65 | | - "tokens": [ |
| 55 | + "tokens" : [ |
| 56 | + { |
| 57 | + "token" : "jumps", |
| 58 | + "start_offset" : 8, |
| 59 | + "end_offset" : 13, |
| 60 | + "type" : "word", |
| 61 | + "position" : 2 |
| 62 | + }, |
66 | 63 | { |
67 | | - "token": "Flapdoodle", <1> |
68 | | - "start_offset": 5, |
69 | | - "end_offset": 15, |
70 | | - "type": "<ALPHANUM>", |
71 | | - "position": 1 <2> |
| 64 | + "token" : "lazy", |
| 65 | + "start_offset" : 18, |
| 66 | + "end_offset" : 22, |
| 67 | + "type" : "word", |
| 68 | + "position" : 4 |
72 | 69 | } |
73 | 70 | ] |
74 | 71 | } |
75 | | --------------------------------------------------- |
| 72 | +---- |
| 73 | +==== |
| 74 | + |
| 75 | +[[analysis-predicatefilter-tokenfilter-configure-parms]] |
| 76 | +==== Configurable parameters |
| 77 | + |
| 78 | +`script`:: |
| 79 | +(Required, <<modules-scripting-using,script object>>) |
| 80 | +Script containing a condition used to filter incoming tokens. Only tokens that |
| 81 | +match this script are included in the output. |
| 82 | ++ |
| 83 | +This parameter supports inline {painless}/index.html[Painless] scripts only. The |
| 84 | +script is evaluated in the |
| 85 | +{painless}/painless-analysis-predicate-context.html[analysis predicate context]. |
76 | 86 |
|
77 | | -<1> The token 'What' has been removed from the tokenstream because it does not |
78 | | -match the predicate. |
79 | | -<2> The position and offset values are unaffected by the removal of earlier tokens |
| 87 | +[[analysis-predicatefilter-tokenfilter-customize]] |
| 88 | +==== Customize and add to an analyzer |
| 89 | + |
| 90 | +To customize the `predicate_token_filter` filter, duplicate it to create the basis |
| 91 | +for a new custom token filter. You can modify the filter using its configurable |
| 92 | +parameters. |
| 93 | + |
| 94 | +The following <<indices-create-index,create index API>> request |
| 95 | +configures a new <<analysis-custom-analyzer,custom analyzer>> using a custom |
| 96 | +`predicate_token_filter` filter, `my_script_filter`. |
| 97 | + |
| 98 | +The `my_script_filter` filter removes tokens of any type other than
| 99 | +`ALPHANUM`. |
| 100 | + |
| 101 | +[source,console] |
| 102 | +---- |
| 103 | +PUT /my_index |
| 104 | +{ |
| 105 | + "settings": { |
| 106 | + "analysis": { |
| 107 | + "analyzer": { |
| 108 | + "my_analyzer": { |
| 109 | + "tokenizer": "standard", |
| 110 | + "filter": [ |
| 111 | + "my_script_filter" |
| 112 | + ] |
| 113 | + } |
| 114 | + }, |
| 115 | + "filter": { |
| 116 | + "my_script_filter": { |
| 117 | + "type": "predicate_token_filter", |
| 118 | + "script": { |
| 119 | + "source": """ |
| 120 | + token.type.contains("ALPHANUM") |
| 121 | + """ |
| 122 | + } |
| 123 | + } |
| 124 | + } |
| 125 | + } |
| 126 | + } |
| 127 | +} |
| 128 | +---- |
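| 129 | +
| 130 | +To see the custom filter in action, you could run an <<indices-analyze,analyze
| 131 | +API>> request against the new index. This is an illustrative check (the text
| 132 | +value is an arbitrary example), assuming the index above has been created.
| 133 | +
| 134 | +[source,console]
| 135 | +----
| 136 | +GET /my_index/_analyze
| 137 | +{
| 138 | +  "analyzer": "my_analyzer",
| 139 | +  "text": "2 fast 2 furious"
| 140 | +}
| 141 | +----
| 142 | +
| 143 | +Because the `standard` tokenizer types numeric tokens such as `2` as `<NUM>`
| 144 | +rather than `<ALPHANUM>`, the filter would remove them and output only `fast`
| 145 | +and `furious`.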