 [[analysis-asciifolding-tokenfilter]]
-=== ASCII Folding Token Filter
+=== ASCII folding token filter
+++++
+<titleabbrev>ASCII folding</titleabbrev>
+++++
 
-A token filter of type `asciifolding` that converts alphabetic, numeric,
-and symbolic Unicode characters which are not in the first 127 ASCII
-characters (the "Basic Latin" Unicode block) into their ASCII
-equivalents, if one exists. Example:
+Converts alphabetic, numeric, and symbolic characters that are not in the Basic
+Latin Unicode block (first 127 ASCII characters) to their ASCII equivalent, if
+one exists. For example, the filter changes `à` to `a`.
+
+This filter uses Lucene's
+https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].
+
+[[analysis-asciifolding-tokenfilter-analyze-ex]]
+==== Example
+
+The following <<indices-analyze,analyze API>> request uses the `asciifolding`
+filter to drop the diacritical marks in `açaí à la carte`:
+
+[source,console]
+--------------------------------------------------
+GET /_analyze
+{
+  "tokenizer" : "standard",
+  "filter" : ["asciifolding"],
+  "text" : "açaí à la carte"
+}
+--------------------------------------------------
+
+The filter produces the following tokens:
+
+[source,text]
+--------------------------------------------------
+[ acai, a, la, carte ]
+--------------------------------------------------
+
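As a rough illustration of what the request above does, the following Python sketch approximates ASCII folding with Unicode NFKD decomposition. It is a stand-in under stated assumptions, not Lucene's `ASCIIFoldingFilter`: the helper name `fold_to_ascii` is hypothetical, whitespace splitting stands in for the `standard` tokenizer, and decomposition does not cover every character the real filter maps.

```python
import unicodedata


def fold_to_ascii(token: str) -> str:
    """Approximate ASCII folding (hypothetical helper, not Lucene's mapping table).

    Decomposes each character (NFKD) and drops the combining marks,
    so that "à" becomes "a" and "ç" becomes "c".
    """
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Assumption: a plain whitespace split stands in for the standard tokenizer.
tokens = [fold_to_ascii(t) for t in "açaí à la carte".split()]
print(tokens)  # ['acai', 'a', 'la', 'carte']
```

Characters without a canonical decomposition pass through this sketch unchanged, so it should not be read as a replacement for the filter itself.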
+/////////////////////
+[source,console-result]
+--------------------------------------------------
+{
+  "tokens" : [
+    {
+      "token" : "acai",
+      "start_offset" : 0,
+      "end_offset" : 4,
+      "type" : "<ALPHANUM>",
+      "position" : 0
+    },
+    {
+      "token" : "a",
+      "start_offset" : 5,
+      "end_offset" : 6,
+      "type" : "<ALPHANUM>",
+      "position" : 1
+    },
+    {
+      "token" : "la",
+      "start_offset" : 7,
+      "end_offset" : 9,
+      "type" : "<ALPHANUM>",
+      "position" : 2
+    },
+    {
+      "token" : "carte",
+      "start_offset" : 10,
+      "end_offset" : 15,
+      "type" : "<ALPHANUM>",
+      "position" : 3
+    }
+  ]
+}
+--------------------------------------------------
+/////////////////////
+
+[[analysis-asciifolding-tokenfilter-analyzer-ex]]
+==== Add to an analyzer
+
+The following <<indices-create-index,create index API>> request uses the
+`asciifolding` filter to configure a new
+<<analysis-custom-analyzer,custom analyzer>>.
 
 [source,js]
 --------------------------------------------------
@@ -13,7 +86,7 @@ PUT /asciifold_example
     "settings" : {
         "analysis" : {
             "analyzer" : {
-                "default" : {
+                "standard_asciifolding" : {
                     "tokenizer" : "standard",
                     "filter" : ["asciifolding"]
                 }
@@ -24,9 +97,23 @@ PUT /asciifold_example
 --------------------------------------------------
 // CONSOLE
 
-Accepts `preserve_original` setting which defaults to false but if true
-will keep the original token as well as emit the folded token. For
-example:
+[[analysis-asciifolding-tokenfilter-configure-parms]]
+==== Configurable parameters
+
+`preserve_original`::
+(Optional, boolean)
+If `true`, emit both original tokens and folded tokens.
+Defaults to `false`.
+
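To make the effect of `preserve_original` concrete, here is a minimal Python sketch. The names `fold` and `asciifold` are hypothetical, and folding is approximated with NFKD decomposition rather than Lucene's mapping table. Emitting the original at the same position right after the folded token, and only when folding changed it, is an assumption about the filter's behavior rather than something this page states.

```python
import unicodedata


def fold(token):
    """Approximate folding: NFKD-decompose, then drop combining marks."""
    nfkd = unicodedata.normalize("NFKD", token)
    return "".join(c for c in nfkd if not unicodedata.combining(c))


def asciifold(tokens, preserve_original=False):
    """Return (token, position) pairs for a list of input tokens.

    Assumption: when preserve_original is true and folding changed the
    token, the original is emitted at the same position as the folded form.
    """
    out = []
    for position, token in enumerate(tokens):
        folded = fold(token)
        out.append((folded, position))
        if preserve_original and folded != token:
            out.append((token, position))
    return out


print(asciifold(["açaí", "la"]))
print(asciifold(["açaí", "la"], preserve_original=True))
```

With `preserve_original=True`, the sketch returns both `acai` and `açaí` at position 0 and only `la` at position 1, so a query can match either the accented or the unaccented spelling.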
+[[analysis-asciifolding-tokenfilter-customize]]
+==== Customize
+
+To customize the `asciifolding` filter, duplicate it to create the basis
+for a new custom token filter. You can modify the filter using its configurable
+parameters.
+
+For example, the following request creates a custom `asciifolding` filter with
+`preserve_original` set to true:
 
 [source,js]
 --------------------------------------------------
@@ -35,7 +122,7 @@ PUT /asciifold_example
     "settings" : {
         "analysis" : {
             "analyzer" : {
-                "default" : {
+                "standard_asciifolding" : {
                     "tokenizer" : "standard",
                     "filter" : ["my_ascii_folding"]
                 }