Commit 4dfc07c

[DOCS] Reformat lowercase token filter docs (#49935)
1 parent 997b96a commit 4dfc07c

1 file changed

+122
-12
lines changed

docs/reference/analysis/tokenfilters/lowercase-tokenfilter.asciidoc

Lines changed: 122 additions & 12 deletions
@@ -1,25 +1,135 @@
11
[[analysis-lowercase-tokenfilter]]
2-
=== Lowercase Token Filter
2+
=== Lowercase token filter
3+
++++
4+
<titleabbrev>Lowercase</titleabbrev>
5+
++++
36

4-
A token filter of type `lowercase` that normalizes token text to lower
5-
case.
7+
Changes token text to lowercase. For example, you can use the `lowercase` filter
8+
to change `THE Lazy DoG` to `the lazy dog`.
69

7-
Lowercase token filter supports Greek, Irish, and Turkish lowercase token
8-
filters through the `language` parameter. Below is a usage example in a
9-
custom analyzer
10+
In addition to a default filter, the `lowercase` token filter provides access to
11+
Lucene's language-specific lowercase filters for Greek, Irish, and Turkish.
12+
13+
[[analysis-lowercase-tokenfilter-analyze-ex]]
14+
==== Example
15+
16+
The following <<indices-analyze,analyze API>> request uses the default
17+
`lowercase` filter to change `THE Quick FoX JUMPs` to lowercase:
18+
19+
[source,console]
20+
--------------------------------------------------
21+
GET _analyze
22+
{
23+
"tokenizer" : "standard",
24+
"filter" : ["lowercase"],
25+
"text" : "THE Quick FoX JUMPs"
26+
}
27+
--------------------------------------------------
28+
29+
The filter produces the following tokens:
30+
31+
[source,text]
32+
--------------------------------------------------
33+
[ the, quick, fox, jumps ]
34+
--------------------------------------------------
35+
36+
/////////////////////
37+
[source,console-result]
38+
--------------------------------------------------
39+
{
40+
"tokens" : [
41+
{
42+
"token" : "the",
43+
"start_offset" : 0,
44+
"end_offset" : 3,
45+
"type" : "<ALPHANUM>",
46+
"position" : 0
47+
},
48+
{
49+
"token" : "quick",
50+
"start_offset" : 4,
51+
"end_offset" : 9,
52+
"type" : "<ALPHANUM>",
53+
"position" : 1
54+
},
55+
{
56+
"token" : "fox",
57+
"start_offset" : 10,
58+
"end_offset" : 13,
59+
"type" : "<ALPHANUM>",
60+
"position" : 2
61+
},
62+
{
63+
"token" : "jumps",
64+
"start_offset" : 14,
65+
"end_offset" : 19,
66+
"type" : "<ALPHANUM>",
67+
"position" : 3
68+
}
69+
]
70+
}
71+
--------------------------------------------------
72+
/////////////////////
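As a rough illustration of the analyze request above, the following Python sketch approximates the `standard` tokenizer with a simple `\w+` regular expression and then applies default lowercasing. The `analyze` function is hypothetical and is not part of Elasticsearch. The real tokenizer follows Unicode text segmentation rules, so results may differ on more complex input.

```python
import re


def analyze(text):
    """Split text into word tokens and lowercase each one (simplified sketch)."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append(
            {
                "token": match.group().lower(),  # default lowercasing
                "start_offset": match.start(),   # offsets refer to the original text
                "end_offset": match.end(),
                "position": position,
            }
        )
    return tokens


tokens = analyze("THE Quick FoX JUMPs")
print([t["token"] for t in tokens])  # ['the', 'quick', 'fox', 'jumps']
```

For this input, the offsets and positions the sketch computes match the ones in the hidden `console-result` block, for example `jumps` at offsets 14 to 19 and position 3.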
73+
74+
[[analysis-lowercase-tokenfilter-analyzer-ex]]
75+
==== Add to an analyzer
76+
77+
The following <<indices-create-index,create index API>> request uses the
78+
`lowercase` filter to configure a new
79+
<<analysis-custom-analyzer,custom analyzer>>.
1080

1181
[source,console]
1282
--------------------------------------------------
13-
PUT /lowercase_example
83+
PUT lowercase_example
84+
{
85+
"settings" : {
86+
"analysis" : {
87+
"analyzer" : {
88+
"whitespace_lowercase" : {
89+
"tokenizer" : "whitespace",
90+
"filter" : ["lowercase"]
91+
}
92+
}
93+
}
94+
}
95+
}
96+
--------------------------------------------------
97+
98+
[[analysis-lowercase-tokenfilter-configure-parms]]
99+
==== Configurable parameters
100+
101+
`language`::
102+
+
103+
--
104+
(Optional, string)
105+
Language-specific lowercase token filter to use. Valid values include:
106+
107+
`greek`::: Uses Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/el/GreekLowerCaseFilter.html[GreekLowerCaseFilter]
108+
109+
`irish`::: Uses Lucene's http://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/ga/IrishLowerCaseFilter.html[IrishLowerCaseFilter]
110+
111+
`turkish`::: Uses Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/tr/TurkishLowerCaseFilter.html[TurkishLowerCaseFilter]
112+
113+
If not specified, defaults to Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html[LowerCaseFilter].
114+
--
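To see why a language-specific filter can matter, consider Turkish, where uppercase `I` lowercases to dotless `ı` and dotted `İ` lowercases to `i`. The sketch below is a simplified Python approximation of that rule, not Lucene's TurkishLowerCaseFilter. `turkish_lower` is a hypothetical helper, and it ignores decomposed forms such as `I` followed by a combining dot.

```python
def turkish_lower(token: str) -> str:
    """Lowercase a token using the two Turkish-specific mappings (simplified)."""
    # Handle the Turkish capitals before falling back to default lowercasing,
    # because str.lower() maps "I" to "i" and "İ" to "i" plus a combining dot.
    return token.replace("İ", "i").replace("I", "ı").lower()


print("ISPARTA".lower())          # default rule: isparta
print(turkish_lower("ISPARTA"))   # Turkish rule: ısparta
print(turkish_lower("İSTANBUL"))  # Turkish rule: istanbul
```

With default lowercasing, a Turkish query for `ısparta` would not match a token indexed from `ISPARTA`, which is the kind of mismatch the `language` parameter is meant to avoid.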
115+
116+
[[analysis-lowercase-tokenfilter-customize]]
117+
==== Customize
118+
119+
To customize the `lowercase` filter, duplicate it to create the basis
120+
for a new custom token filter. You can modify the filter using its configurable
121+
parameters.
122+
123+
For example, the following request creates a custom `lowercase` filter for the
124+
Greek language:
125+
126+
[source,console]
127+
--------------------------------------------------
128+
PUT custom_lowercase_example
14129
{
15130
"settings": {
16131
"analysis": {
17132
"analyzer": {
18-
"standard_lowercase_example": {
19-
"type": "custom",
20-
"tokenizer": "standard",
21-
"filter": ["lowercase"]
22-
},
23133
"greek_lowercase_example": {
24134
"type": "custom",
25135
"tokenizer": "standard",
