@jureaky (Contributor) commented Mar 7, 2020

While reading the documentation for the Keyword Tokenizer, I was a bit confused about why the Keyword Tokenizer is needed.

The Keyword Tokenizer is so simple that its example is also simple. However, since the documentation mentions another use case (lower-casing email addresses) that would help users better understand the necessity of the Keyword Tokenizer, I added a corresponding example.

This is AS-IS:

[screenshot: kwt-as-is]

This is the added example:

[screenshot: kwt-added-example]

@cbuescher added the :Search Relevance/Analysis (How text is split into tokens) and >docs (General docs changes) labels on Mar 9, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-search (:Search/Analysis)

@elasticmachine (Collaborator)

Pinging @elastic/es-docs (>docs)

Comment on lines 55 to 56

Changing the spacing to be more consistent with our other examples.

Let's also use example.com as the placeholder domain.

Suggested change
-"filter": ["lowercase"],
-"text": "john.SMITH@global-international.COM"
+"filter": [ "lowercase" ],
+"text": "john.SMITH@example.COM"

Comment on lines 67 to 69

Suggested change
-"token": "john.smith@global-international.com",
-"start_offset": 0,
-"end_offset": 35,
+"token": "john.smith@example.com",
+"start_offset": 0,
+"end_offset": 22,


Suggested change
-[ john.smith@global-inetrnational.com ]
+[ john.smith@example.com ]


Suggested change
-The above sentence would produce the following term:
+The request produces the following token:

@jrodewig (Contributor) left a comment

Thanks for the PR @jureaky.

The snippet looks good. I left some suggestions to fix typos and add some text to provide context.

I'd like to take another look after you've had a chance to go through those.

@jrodewig (Contributor)

@elasticmachine test this please

Comment on lines 47 to 48 (@jrodewig, Mar 24, 2020)

Looks like GitHub ate my original suggestion. I'd add some lead-in text here to provide context. The original heading also contained a typo.

Suggested change
-[float]
-=== Exmaple output(Lower-casing email address)
+[discrete]
+[[analysis-keyword-tokenizer-token-filters]]
+=== Combine with token filters
+
+You can combine the `keyword` tokenizer with token filters to normalise
+structured data, such as product IDs or email addresses.
+
+For example, the following <<indices-analyze,analyze API>> request uses the
+`keyword` tokenizer and <<analysis-lowercase-tokenfilter,`lowercase`>> filter to
+convert an email address to lowercase.
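
Assembled from the suggestions above, the doc section this lead-in introduces would read roughly as follows (a sketch, not necessarily the exact merged snippet):

POST _analyze
{
  "tokenizer": "keyword",
  "filter": [ "lowercase" ],
  "text": "john.SMITH@example.COM"
}

The request produces the following token:

[ john.smith@example.com ]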

@jureaky force-pushed the add-example-to-keyword-tokenizer-doc branch from 353b743 to e6059d6 on March 28, 2020 06:24
@jureaky force-pushed the add-example-to-keyword-tokenizer-doc branch from e6059d6 to 519f82d on March 28, 2020 06:30
@jureaky (Contributor, Author) commented Mar 28, 2020

@jrodewig I made all the changes you suggested and force-pushed after a rebase. PTAL!

@jrodewig (Contributor) left a comment

LGTM. Thanks @jureaky. I'll get this merged and backported.

@jrodewig (Contributor)

@elasticmachine test this please

@jrodewig (Contributor)

Backport commits

master 4fe8ad3
7.x 21f362a
7.7 0813ec4
7.6 8983bc3

@jureaky deleted the add-example-to-keyword-tokenizer-doc branch March 30, 2020 15:16