67 changes: 31 additions & 36 deletions docs/reference/analysis/analyzers/simple-analyzer.asciidoc
<titleabbrev>Simple</titleabbrev>
++++

The `simple` analyzer breaks text into tokens at any non-letter character, such
as numbers, spaces, hyphens and apostrophes, discards non-letter characters,
and changes uppercase to lowercase.

[[analysis-simple-analyzer-ex]]
==== Example

[source,console]
----
POST _analyze
{
"analyzer": "simple",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
----

////
[source,console-result]
----
{
"tokens": [
{
      ...
}
]
}
----
////

The `simple` analyzer parses the sentence and produces the following
tokens:

[source,text]
----
[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
----
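The splitting rule above can be sketched in plain Python. This is an illustrative approximation only, not the actual Lucene implementation: it matches ASCII letters, whereas the real lowercase tokenizer is Unicode-aware.

```python
import re

def simple_analyze(text):
    """Approximate the `simple` analyzer: split at every non-letter
    character, discard the separators, and lowercase each token.
    ASCII-only sketch; the real tokenizer also handles Unicode letters."""
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]

print(simple_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
# → ['the', 'quick', 'brown', 'foxes', 'jumped', 'over', 'the', 'lazy', 'dog', 's', 'bone']
```

Note how `2`, the hyphen, the apostrophe, and the trailing period all act as token boundaries and are dropped, matching the output above.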

[[analysis-simple-analyzer-definition]]
==== Definition

The `simple` analyzer is defined by one tokenizer:

Tokenizer::
* <<analysis-lowercase-tokenizer,Lowercase Tokenizer>>

[[analysis-simple-analyzer-customize]]
==== Customize

To customize the `simple` analyzer, duplicate it to create the basis for
a custom analyzer. This custom analyzer can be modified as required, usually by
adding token filters.

[source,console]
----
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_simple_analyzer": {
"tokenizer": "lowercase",
"filter": [ <1>
]
}
}
}
}
}
----
<1> Add token filters here.
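As a sketch of the usual next step, the settings above could be extended with a token filter — here the built-in `stop` filter. The index and analyzer names are just this page's examples, and the filter choice is an assumption for illustration.

```python
import json

# Sketch: the custom-analyzer settings from this page, extended with
# Elasticsearch's built-in `stop` token filter (illustrative choice).
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_simple_analyzer": {
                    "tokenizer": "lowercase",
                    "filter": ["stop"],  # token filters go here
                }
            }
        }
    }
}

print(json.dumps(settings, indent=2))
```

With `stop` in place, common English words such as `the` would be dropped from the token stream after tokenization.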