
Commit 1256503

[DOCS] Add concepts section to analysis topic (#50801)
This helps the topic better match the structure of our machine learning docs, e.g. https://www.elastic.co/guide/en/machine-learning/7.5/ml-concepts.html This PR only includes the 'Anatomy of an analyzer' page as a 'Concepts' child page, but I plan to add other concepts, such as 'Index time vs. search time', with later PRs.
1 parent d42eb5a · commit 1256503

3 files changed, 17 insertions(+), 14 deletions(-)

docs/reference/analysis.asciidoc (1 addition, 1 deletion)

@@ -144,7 +144,7 @@ looking for:
 
 include::analysis/overview.asciidoc[]
 
-include::analysis/anatomy.asciidoc[]
+include::analysis/concepts.asciidoc[]
 
 include::analysis/testing.asciidoc[]
 
docs/reference/analysis/anatomy.asciidoc (5 additions, 13 deletions)

@@ -1,5 +1,5 @@
 [[analyzer-anatomy]]
-== Anatomy of an analyzer
+=== Anatomy of an analyzer
 
 An _analyzer_ -- whether built-in or custom -- is just a package which
 contains three lower-level building blocks: _character filters_,
@@ -10,8 +10,7 @@ blocks into analyzers suitable for different languages and types of text.
 Elasticsearch also exposes the individual building blocks so that they can be
 combined to define new <<analysis-custom-analyzer,`custom`>> analyzers.
 
-[float]
-=== Character filters
+==== Character filters
 
 A _character filter_ receives the original text as a stream of characters and
 can transform the stream by adding, removing, or changing characters. For
@@ -22,8 +21,7 @@ elements like `<b>` from the stream.
 An analyzer may have *zero or more* <<analysis-charfilters,character filters>>,
 which are applied in order.
 
-[float]
-=== Tokenizer
+==== Tokenizer
 
 A _tokenizer_ receives a stream of characters, breaks it up into individual
 _tokens_ (usually individual words), and outputs a stream of _tokens_. For
@@ -37,9 +35,7 @@ the term represents.
 
 An analyzer must have *exactly one* <<analysis-tokenizers,tokenizer>>.
 
-
-[float]
-=== Token filters
+==== Token filters
 
 A _token filter_ receives the token stream and may add, remove, or change
 tokens. For example, a <<analysis-lowercase-tokenfilter,`lowercase`>> token
@@ -53,8 +49,4 @@ Token filters are not allowed to change the position or character offsets of
 each token.
 
 An analyzer may have *zero or more* <<analysis-tokenfilters,token filters>>,
-which are applied in order.
-
-
-
-
+which are applied in order.
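
For reference, the relocated page describes how an analyzer packages together character filters, a tokenizer, and token filters. A minimal sketch of a custom analyzer that exercises all three building blocks; the index name `my-index` and analyzer name `my_custom_analyzer` are hypothetical placeholders, while `html_strip`, `standard`, `lowercase`, and `asciifolding` are built-in components:

[source,console]
----
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],          // zero or more character filters, applied in order
          "tokenizer": "standard",                  // exactly one tokenizer
          "filter": [ "lowercase", "asciifolding" ] // zero or more token filters, applied in order
        }
      }
    }
  }
}
----

Exercising it with the `_analyze` API, a request such as

[source,console]
----
GET /my-index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Is this <b>déjà vu</b>?"
}
----

should produce the tokens `[is, this, deja, vu]`.
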
docs/reference/analysis/concepts.asciidoc (11 additions, 0 deletions; new file)

@@ -0,0 +1,11 @@
+[[analysis-concepts]]
+== Text analysis concepts
+++++
+<titleabbrev>Concepts</titleabbrev>
+++++
+
+This section explains the fundamental concepts of text analysis in {es}.
+
+* <<analyzer-anatomy>>
+
+include::anatomy.asciidoc[]
