Merged
Changes from 3 commits
@@ -8,7 +8,7 @@ Strips all characters after an apostrophe, including the apostrophe itself.

This filter is included in {es}'s built-in <<turkish-analyzer,Turkish language
analyzer>>. It uses Lucene's
https://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
which was built for the Turkish language.


104 changes: 94 additions & 10 deletions docs/reference/analysis/tokenfilters/asciifolding-tokenfilter.asciidoc
@@ -1,10 +1,83 @@
[[analysis-asciifolding-tokenfilter]]
=== ASCII Folding Token Filter
=== ASCII folding token filter
++++
<titleabbrev>ASCII folding</titleabbrev>
++++

A token filter of type `asciifolding` that converts alphabetic, numeric,
and symbolic Unicode characters which are not in the first 127 ASCII
characters (the "Basic Latin" Unicode block) into their ASCII
equivalents, if one exists. Example:
Converts alphabetic, numeric, and symbolic Unicode characters that are not in
the "Basic Latin" Unicode block (the 128 ASCII characters) into their ASCII
equivalents, if they exist. For example, the filter changes `à` to `a`.

This filter uses Lucene's
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].
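
To make the idea of folding concrete, the following Python sketch approximates
it outside of {es}. It is not Lucene's implementation: it assumes that Unicode
NFKD decomposition followed by dropping combining marks is a close enough
stand-in, and it uses whitespace splitting in place of the `standard`
tokenizer. The function name `fold_to_ascii` is hypothetical. Lucene's filter
relies on its own mapping table, so results can differ for characters that have
no Unicode decomposition.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Approximate ASCII folding (assumption: NFKD + strip combining marks)."""
    # Split accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Whitespace splitting stands in for the `standard` tokenizer here.
tokens = "açaí à la carte".split()
folded = [fold_to_ascii(token) for token in tokens]
print(folded)  # ['acai', 'a', 'la', 'carte']
```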

[[analysis-asciifolding-tokenfilter-analyze-ex]]
==== Example

The following <<indices-analyze,analyze API>> request demonstrates how the
ASCII folding token filter works.

[source,console]
--------------------------------------------------
GET /_analyze
{
  "tokenizer" : "standard",
  "filter" : ["asciifolding"],
  "text" : "açaí à la carte"
}
--------------------------------------------------

The filter produces the following tokens:

[source,text]
--------------------------------------------------
[ acai, a, la, carte ]
--------------------------------------------------

/////////////////////
[source,console-result]
--------------------------------------------------
{
  "tokens" : [
    {
      "token" : "acai",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "a",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "la",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "carte",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}
--------------------------------------------------
/////////////////////

[[analysis-asciifolding-tokenfilter-analyzer-ex]]
==== Add to an analyzer

The following <<indices-create-index,create index API>> request uses the
ASCII folding token filter to configure a new
<<analysis-custom-analyzer,custom analyzer>>.

[source,console]
--------------------------------------------------
@@ -13,7 +86,7 @@ PUT /asciifold_example
"settings" : {
"analysis" : {
"analyzer" : {
"default" : {
"standard_asciifolding" : {
"tokenizer" : "standard",
"filter" : ["asciifolding"]
}
@@ -23,9 +96,20 @@ PUT /asciifold_example
}
--------------------------------------------------

Accepts `preserve_original` setting which defaults to false but if true
will keep the original token as well as emit the folded token. For
example:
[[analysis-asciifolding-tokenfilter-configure-parms]]
==== Configurable parameters

`preserve_original`::
(Optional, boolean)
If `true`, emit both the original token and the folded token.
Defaults to `false`.
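
The effect of `preserve_original` can be illustrated with a small Python
sketch. This is a rough model under stated assumptions, not the filter's actual
code: `fold_to_ascii` and `fold_tokens` are hypothetical names, folding is
approximated with Unicode NFKD decomposition, and the order of the output
tokens is illustrative only.

```python
import unicodedata


def fold_to_ascii(token: str) -> str:
    """Approximate ASCII folding (assumption: NFKD + strip combining marks)."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_tokens(tokens, preserve_original=False):
    """Fold each token; optionally keep the original next to its folded form."""
    output = []
    for token in tokens:
        folded = fold_to_ascii(token)
        output.append(folded)
        # Only keep the original when folding actually changed the token.
        if preserve_original and folded != token:
            output.append(token)
    return output


tokens = ["açaí", "à", "la", "carte"]
print(fold_tokens(tokens))                          # folded tokens only
print(fold_tokens(tokens, preserve_original=True))  # folded and original tokens
```

Keeping the original token lets a search match both the accented and the
unaccented spelling, at the cost of a larger token stream.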

[[analysis-asciifolding-tokenfilter-customize]]
==== Customize

To customize the ASCII folding token filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.

[source,console]
--------------------------------------------------
@@ -34,7 +118,7 @@ PUT /asciifold_example
"settings" : {
"analysis" : {
"analyzer" : {
"default" : {
"standard_asciifolding" : {
"tokenizer" : "standard",
"filter" : ["my_ascii_folding"]
}