22 changes: 22 additions & 0 deletions docs/en/stack/ml/anomaly-detection/categorization-data.asciidoc
@@ -0,0 +1,22 @@
[role="xpack"]
[[ml-datatypes-categorization]]
=== Data types and categorization

Categorization is a {ml} process that observes the static parts of the data,
clusters similar data together, and classifies them into categories. However,
categorization is not equally effective on all data types. It works
best on machine-written messages and application output, typically on data that
consists of repeated elements, for example, log messages used for system
troubleshooting. Log categorization groups unstructured log messages into
categories. You can then use {anomaly-detect} to model and identify rare or
unusual counts of log message categories. For more information about the
process, see
{ml-docs}/ml-configuring-categories.html[Categorizing log messages].

Categorization works best on data like log messages because these messages
have structural similarities that can be recognized easily by the {ml} model.
Complete sentences in human communication or literary text (for example, emails,
wiki pages, prose, or other human-generated content) can be extremely diverse in
structure. As a result, categorization may provide poor results on such data.
For example, the categorization job might create so many categories that they
cannot be handled effectively.
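
The contrast between log messages and free-form text can be illustrated with a
toy sketch. It is not the algorithm that {ml} categorization uses. It assumes,
purely for illustration, that any token containing a digit is variable and that
the remaining tokens form the static part of a message. The names
`toy_category` and `count_categories` and all sample messages are hypothetical.

```python
import re
from collections import Counter

# Assumption for illustration only: a token that contains a digit is treated
# as variable; everything else counts as the static part of the message.
VARIABLE_TOKEN = re.compile(r"\S*\d\S*")


def toy_category(message: str) -> str:
    """Replace each variable token with '*' and return the static skeleton."""
    return VARIABLE_TOKEN.sub("*", message)


def count_categories(messages):
    """Count how many messages share each static skeleton."""
    return Counter(toy_category(m) for m in messages)


# Hypothetical machine-written messages: repeated structure, varying values.
log_messages = [
    "Connection from 10.0.0.1 closed after 35 ms",
    "Connection from 10.0.0.7 closed after 120 ms",
    "Disk /dev/sda1 usage at 91 percent",
    "Disk /dev/sdb2 usage at 78 percent",
]

# Hypothetical human-written sentences: no shared structure to exploit.
prose = [
    "Thanks for the update, I will review it tomorrow.",
    "Could we move the meeting to next week?",
    "The weather was lovely, so we walked home.",
    "Please find the attached report for your records.",
]

log_categories = count_categories(log_messages)
prose_categories = count_categories(prose)

print(len(log_categories), "categories for", len(log_messages), "log messages")
print(len(prose_categories), "categories for", len(prose), "prose sentences")
```

In this sketch the four log messages collapse into two skeletons, while each
prose sentence ends up in a category of its own, which mirrors the category
growth described above.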
1 change: 1 addition & 0 deletions docs/en/stack/ml/anomaly-detection/overview.asciidoc
@@ -6,3 +6,4 @@ include::analyzing.asciidoc[]

include::forecasting.asciidoc[]

include::categorization-data.asciidoc[]