1 1 [[analysis-asciifolding-tokenfilter]]
2- === ASCII Folding Token Filter
2+ === ASCII folding token filter
3+ ++++
4+ <titleabbrev>ASCII folding</titleabbrev>
5+ ++++
3 6
4- A token filter of type `asciifolding` that converts alphabetic, numeric,
5- and symbolic Unicode characters which are not in the first 127 ASCII
6- characters (the "Basic Latin" Unicode block) into their ASCII
7- equivalents, if one exists. Example:
7+ Converts alphabetic, numeric, and symbolic characters that are not in the Basic
8+ Latin Unicode block (the 128 ASCII characters) to their ASCII equivalent, if
9+ one exists. For example, the filter changes `à` to `a`.
10+
11+ This filter uses Lucene's
12+ https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].
13+
14+ [[analysis-asciifolding-tokenfilter-analyze-ex]]
15+ ==== Example
16+
17+ The following <<indices-analyze,analyze API>> request uses the `asciifolding`
18+ filter to drop the diacritical marks in `açaí à la carte`:
19+
20+ [source,console]
21+ --------------------------------------------------
22+ GET /_analyze
23+ {
24+ "tokenizer" : "standard",
25+ "filter" : ["asciifolding"],
26+ "text" : "açaí à la carte"
27+ }
28+ --------------------------------------------------
29+
30+ The filter produces the following tokens:
31+
32+ [source,text]
33+ --------------------------------------------------
34+ [ acai, a, la, carte ]
35+ --------------------------------------------------
36+
37+ /////////////////////
38+ [source,console-result]
39+ --------------------------------------------------
40+ {
41+ "tokens" : [
42+ {
43+ "token" : "acai",
44+ "start_offset" : 0,
45+ "end_offset" : 4,
46+ "type" : "<ALPHANUM>",
47+ "position" : 0
48+ },
49+ {
50+ "token" : "a",
51+ "start_offset" : 5,
52+ "end_offset" : 6,
53+ "type" : "<ALPHANUM>",
54+ "position" : 1
55+ },
56+ {
57+ "token" : "la",
58+ "start_offset" : 7,
59+ "end_offset" : 9,
60+ "type" : "<ALPHANUM>",
61+ "position" : 2
62+ },
63+ {
64+ "token" : "carte",
65+ "start_offset" : 10,
66+ "end_offset" : 15,
67+ "type" : "<ALPHANUM>",
68+ "position" : 3
69+ }
70+ ]
71+ }
72+ --------------------------------------------------
73+ /////////////////////
74+
75+ [[analysis-asciifolding-tokenfilter-analyzer-ex]]
76+ ==== Add to an analyzer
77+
78+ The following <<indices-create-index,create index API>> request uses the
79+ `asciifolding` filter to configure a new
80+ <<analysis-custom-analyzer,custom analyzer>>.
8 81
9 82 [source,console]
10 83 --------------------------------------------------
@@ -13,7 +86,7 @@ PUT /asciifold_example
13 86 "settings" : {
14 87 "analysis" : {
15 88 "analyzer" : {
16- "default" : {
89+ "standard_asciifolding" : {
17 90 "tokenizer" : "standard",
18 91 "filter" : ["asciifolding"]
19 92 }
@@ -23,9 +96,23 @@ PUT /asciifold_example
23 96 }
24 97 --------------------------------------------------
25 98
26- Accepts `preserve_original` setting which defaults to false but if true
27- will keep the original token as well as emit the folded token. For
28- example:
99+ [[analysis-asciifolding-tokenfilter-configure-parms]]
100+ ==== Configurable parameters
101+
102+ `preserve_original`::
103+ (Optional, boolean)
104+ If `true`, emit both original tokens and folded tokens.
105+ Defaults to `false`.
106+
107+ [[analysis-asciifolding-tokenfilter-customize]]
108+ ==== Customize
109+
110+ To customize the `asciifolding` filter, duplicate it to create the basis
111+ for a new custom token filter. You can modify the filter using its configurable
112+ parameters.
113+
114+ For example, the following request creates a custom `asciifolding` filter with
115+ `preserve_original` set to `true`:
29 116
30 117 [source,console]
31 118 --------------------------------------------------
@@ -34,7 +121,7 @@ PUT /asciifold_example
34 121 "settings" : {
35 122 "analysis" : {
36 123 "analyzer" : {
37- "default" : {
124+ "standard_asciifolding" : {
38 125 "tokenizer" : "standard",
39 126 "filter" : ["my_ascii_folding"]
40 127 }
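
To make the effect of `preserve_original` concrete, here is a minimal Python sketch of the behavior the new docs describe. It is an approximation, not Lucene's implementation: `fold_to_ascii` and `asciifold_tokens` are hypothetical helper names, and the folding relies on Unicode NFKD decomposition plus removal of combining marks rather than the mapping table in `ASCIIFoldingFilter`. Characters with no canonical decomposition (for example `ø` or `ß`) are therefore left unchanged here, which may differ from what the real filter does. The sketch assumes the folded token is emitted first, followed by the original when the two differ.

```python
import unicodedata


def fold_to_ascii(token: str) -> str:
    """Approximate ASCII folding: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def asciifold_tokens(tokens, preserve_original=False):
    """Fold each token; optionally keep the original next to its folded form."""
    out = []
    for tok in tokens:
        folded = fold_to_ascii(tok)
        out.append(folded)
        # Assumption: the original is only re-emitted when folding changed it.
        if preserve_original and folded != tok:
            out.append(tok)
    return out


# Whitespace splitting stands in for the `standard` tokenizer in this sketch.
tokens = "açaí à la carte".split()
print(asciifold_tokens(tokens))
print(asciifold_tokens(tokens, preserve_original=True))
```

With `preserve_original=False` the sketch yields the same four tokens as the analyze example in the diff. With `preserve_original=True` each changed token is followed by its unfolded form, so a query for either `acai` or `açaí` could match.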