Merged
Changes from 6 commits
29 changes: 29 additions & 0 deletions docs/en/stack/ml/dataframes.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
[[ml-dataframes]]
=== Data frames
++++
<titleabbrev>Dataframes</titleabbrev>
++++

A {dataframe} is a transformation of a dataset according to rules that you
define when you create the {dataframe}. You can think of it as a spreadsheet or
a data table.

{es} datasets consist of individual documents, each of which has fields and a
value for each field. This structure makes it hard to run analyses that require
reorganized or summarized fields of the dataset. {ml-cap} analyses need clean
and transformed data. This is where {dataframe}s come into play.

To transform the data into a {dataframe}, you need to define a pivot. During
pivoting, you create a set of features that transform the dataset into a
different, more digestible format that simplifies calculations. Pivoting
results in a summary of your dataset; that summary is the {dataframe}.
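
The idea of pivoting can be sketched in a few lines of plain Python. This is
only an illustration of the concept, not the {es} API: the `orders` documents,
the field names, and the `pivot_by` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical source documents, one per order (not real data).
orders = [
    {"customer": "alice", "price": 12.0},
    {"customer": "bob", "price": 7.5},
    {"customer": "alice", "price": 48.0},
    {"customer": "bob", "price": 31.0},
]


def pivot_by(documents, group_field, value_field):
    """Return one summary row per distinct value of group_field."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[group_field]].append(doc[value_field])
    # Each row holds the features computed for one group.
    return [
        {
            group_field: key,
            "count": len(values),
            "total": sum(values),
            "average": sum(values) / len(values),
        }
        for key, values in sorted(groups.items())
    ]


summary = pivot_by(orders, "customer", "price")
```

Here `summary` holds one row per customer with the computed features, while
`orders` itself is left unchanged.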

When you define a pivot, you select one or more fields that your dataset will
be grouped by. You can select *categorical fields* (fields that contain strings
as values) for grouping. You can select *numerical fields* for grouping only as
histograms.
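
Grouping a numerical field as a histogram means that each value is first
mapped to a fixed-width bucket, and the bucket is then used as the group key.
A minimal Python sketch of that mapping follows; the `histogram_key` helper,
the sample prices, and the interval of 25 are assumptions made for
illustration, not part of any {es} API.

```python
from collections import Counter


def histogram_key(value, interval):
    """Map a number to the lower bound of its fixed-width bucket."""
    return (value // interval) * interval


# Hypothetical numerical field values.
prices = [12.0, 7.5, 48.0, 31.0, 35.0]

# Count how many values fall into each bucket of width 25.
buckets = Counter(histogram_key(price, 25) for price in prices)
```

With this interval, values from 0 up to but not including 25 share the key
`0.0`, and values from 25 up to but not including 50 share the key `25.0`.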

example - to do

IMPORTANT: Creating a {dataframe} leaves your dataset intact. It does not amend
the dataset itself.
1 change: 1 addition & 0 deletions docs/en/stack/ml/overview.asciidoc
Original file line number Diff line number Diff line change
@@ -11,3 +11,4 @@ include::buckets.asciidoc[]
include::calendars.asciidoc[]
include::rules.asciidoc[]
include::architecture.asciidoc[]
include::dataframes.asciidoc[]