[DOCS] Adds data frame eCommerce example #372
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,246 @@ | ||
| [role="xpack"] | ||
| [testenv="basic"] | ||
| [[ecommerce-dataframes]] | ||
| == Transforming your data with {dataframes} | ||
| ++++ | ||
| <titleabbrev>Transforming your data</titleabbrev> | ||
| ++++ | ||
|
|
||
| beta[] | ||
|
|
||
| <<ml-dataframes,{dataframes-cap}>> enable you to retrieve information from an | ||
| {es} index, transform it, and store it in another index. Let's use the | ||
| {kibana-ref}/add-sample-data.html[{kib} sample data] to demonstrate how you can | ||
| pivot and summarize your data with {dataframe-transforms}. | ||
|
|
||
|
|
||
| . If the {es} {security-features} are enabled, obtain a user ID with sufficient | ||
| privileges to complete these steps. | ||
| + | ||
| -- | ||
| You need `manage_data_frame_transforms` cluster privileges to preview and create | ||
| {dataframe-transforms}. Members of the built-in `data_frame_transforms_admin` | ||
| role have these privileges. | ||
|
|
||
| You also need `read` and `view_index_metadata` index privileges on the source | ||
| index and `read`, `create_index`, and `index` privileges on the destination | ||
| index. | ||
|
|
||
| For more information, see <<security-privileges>> and <<built-in-roles>>. | ||
| -- | ||
|
|
||
| . Choose your _source index_. | ||
| + | ||
| -- | ||
| In this example, we'll use the eCommerce orders sample data. If you're not | ||
| already familiar with the `kibana_sample_data_ecommerce` index, use the | ||
| *Revenue* dashboard in {kib} to explore the data. Consider what insights you | ||
| might want to derive from this eCommerce data. | ||
| -- | ||
|
|
||
| . Play with various options for grouping and aggregating the data. | ||
| + | ||
| -- | ||
| For example, you might want to group the data by product ID and calculate the | ||
| total number of sales for each product and its average price. Alternatively, you | ||
| might want to look at the behavior of individual customers and calculate how | ||
| much each customer spent in total and how many different categories of products | ||
| they purchased. Or you might want to take the currencies or geographies into | ||
| consideration. What are the most interesting ways you can transform and | ||
| interpret this data? | ||
|
|
||
| _Pivoting_ your data involves using at least one field to group it and applying | ||
| at least one aggregation. You can preview what the transformed data will look | ||
| like, so go ahead and play with it! | ||
|
|
||
| For example, go to *Machine Learning* > *Data Frames* in {kib} and use the | ||
| wizard to create a {dataframe}: | ||
|
|
||
| [role="screenshot"] | ||
| image::images/ecommerce-pivot1.jpg["Creating a simple {dataframe} in {kib}"] | ||
|
|
||
| In this case, we grouped the data by customer ID and calculated the total | ||
| quantity of products that each customer purchased. | ||
|
|
||
| Let's add some more aggregations to learn more about our customers' orders. For | ||
| example, let's calculate the total sum of their purchases, the maximum number of | ||
| products that they purchased in a single order, and their total number of orders. | ||
| We'll accomplish this by using the | ||
| {ref}/search-aggregations-metrics-sum-aggregation.html[`sum` aggregation] on the | ||
| `taxless_total_price` field, the | ||
| {ref}/search-aggregations-metrics-max-aggregation.html[`max` aggregation] on the | ||
| `total_quantity` field, and the | ||
| {ref}/search-aggregations-metrics-cardinality-aggregation.html[`cardinality` aggregation] | ||
| on the `order_id` field: | ||
|
|
||
| [role="screenshot"] | ||
| image::images/ecommerce-pivot2.jpg["Adding multiple aggregations to a {dataframe} in {kib}"] | ||
|
|
||
| TIP: If you're interested in a subset of the data, you can optionally include a | ||
| {ref}/search-request-query.html[query] element. In this example, we've filtered | ||
| the data so that we're only looking at orders with a `currency` of `EUR`. | ||
| Alternatively, we could group the data by that field too. | ||
|
|
||
| If you prefer, you can use the | ||
| {ref}/preview-data-frame-transform.html[preview {dataframe-transforms} API]: | ||
|
|
||
| [source,js] | ||
| -------------------------------------------------- | ||
| POST _data_frame/transforms/_preview | ||
| { | ||
| "source": { | ||
| "index": "kibana_sample_data_ecommerce", | ||
| "query": { | ||
| "query_string": { | ||
| "query": "currency : EUR", | ||
| "default_operator": "AND" | ||
| } | ||
| } | ||
| }, | ||
| "pivot": { | ||
| "group_by": { | ||
| "customer_id": { | ||
| "terms": { | ||
| "field": "customer_id" | ||
| } | ||
| } | ||
| }, | ||
| "aggregations": { | ||
| "total_quantity.sum": { | ||
| "sum": { | ||
| "field": "total_quantity" | ||
| } | ||
| }, | ||
| "taxless_total_price.sum": { | ||
| "sum": { | ||
| "field": "taxless_total_price" | ||
| } | ||
| }, | ||
| "total_quantity.max": { | ||
| "max": { | ||
| "field": "total_quantity" | ||
| } | ||
| }, | ||
| "order_id.cardinality": { | ||
| "cardinality": { | ||
| "field": "order_id" | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| -------------------------------------------------- | ||
| // CONSOLE | ||
| // TEST[skip:set up sample data] | ||
| -- | ||
|
|
||
| . When you are satisfied with what you see in the preview, create the | ||
| {dataframe-transform}. | ||
| + | ||
| -- | ||
| Supply a job ID and the name of the target (or _destination_) index. In {kib}, | ||
| you can choose to create and start the job, create it without starting it, or | ||
| copy it to the clipboard. | ||
|
|
||
| If you prefer, you can use the | ||
| {ref}/put-data-frame-transform.html[create {dataframe-transforms} API]. For | ||
| example: | ||
|
|
||
| [source,js] | ||
| -------------------------------------------------- | ||
| PUT _data_frame/transforms/ecommerce-customer-transform | ||
| { | ||
| "source": { | ||
| "index": [ | ||
| "kibana_sample_data_ecommerce" | ||
| ], | ||
| "query": { | ||
| "query_string": { | ||
| "query": "currency : EUR", | ||
| "default_operator": "AND" | ||
| } | ||
| } | ||
| }, | ||
| "dest": { | ||
| "index": "ecommerce-customers" | ||
| }, | ||
| "pivot": { | ||
| "group_by": { | ||
| "customer_id": { | ||
| "terms": { | ||
| "field": "customer_id" | ||
| } | ||
| } | ||
| }, | ||
| "aggregations": { | ||
| "total_quantity.sum": { | ||
| "sum": { | ||
| "field": "total_quantity" | ||
| } | ||
| }, | ||
| "taxless_total_price.sum": { | ||
| "sum": { | ||
| "field": "taxless_total_price" | ||
| } | ||
| }, | ||
| "total_quantity.max": { | ||
| "max": { | ||
| "field": "total_quantity" | ||
| } | ||
| }, | ||
| "order_id.cardinality": { | ||
| "cardinality": { | ||
| "field": "order_id" | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| -------------------------------------------------- | ||
| // CONSOLE | ||
| // TEST[skip:set up kibana sample data] | ||
| -- | ||
|
|
||
| . Start the {dataframe-transform}. | ||
| + | ||
| -- | ||
|
|
||
| TIP: Even though resource utilization is automatically adjusted based on the | ||
| cluster load, a {dataframe-transform} increases search and indexing load on your | ||
| cluster while it runs. When it reaches the end of the data in your index, it | ||
| stops automatically. If you're experiencing an excessive load, however, you can | ||
| stop it sooner. | ||
|
|
||
| You can start, stop, and manage {dataframe} jobs in {kib}: | ||
|
|
||
| [role="screenshot"] | ||
| image::images/dataframe-jobs.jpg["Managing {dataframe} jobs in {kib}"] | ||
|
|
||
| Alternatively, you can use the | ||
| {ref}/start-data-frame-transform.html[start {dataframe-transforms}] and | ||
| {ref}/stop-data-frame-transform.html[stop {dataframe-transforms}] APIs. For | ||
| example: | ||
|
|
||
| [source,js] | ||
| -------------------------------------------------- | ||
| POST _data_frame/transforms/ecommerce-customer-transform/_start | ||
| -------------------------------------------------- | ||
| // CONSOLE | ||
| // TEST[skip:set up kibana sample data] | ||
|
|
||
| -- | ||
|
|
||
| . Explore the data in your new index. | ||
| + | ||
| -- | ||
| For example, use the *Discover* application in {kib}: | ||
|
|
||
| [role="screenshot"] | ||
| image::images/ecommerce-results.jpg["Exploring the new index in {kib}"] | ||
|
|
||
| -- | ||
|
|
||
| TIP: If you do not want to keep the {dataframe-transform}, you can delete it in | ||
| {kib} or use the | ||
| {ref}/delete-data-frame-transform.html[delete {dataframe-transforms} API]. When | ||
| you delete a {dataframe-transform}, its destination index remains. | ||
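To make the pivot in this example concrete, here is a minimal Python sketch of the grouping and aggregations that the transform above describes. It is an illustration only, not how {es} executes the transform. The four orders are made-up values, the `pivot` helper is hypothetical, and the flat dotted keys in the output simply reuse the aggregation names from the request. The real `cardinality` aggregation is approximate, whereas this sketch counts distinct values exactly.

```python
from collections import defaultdict

# Hypothetical orders: field names mirror the example above, values are invented.
orders = [
    {"customer_id": "10", "order_id": "a1", "currency": "EUR",
     "total_quantity": 2, "taxless_total_price": 40.0},
    {"customer_id": "10", "order_id": "a2", "currency": "EUR",
     "total_quantity": 4, "taxless_total_price": 99.5},
    {"customer_id": "11", "order_id": "b1", "currency": "EUR",
     "total_quantity": 1, "taxless_total_price": 15.0},
    {"customer_id": "12", "order_id": "c1", "currency": "USD",
     "total_quantity": 3, "taxless_total_price": 30.0},
]


def pivot(docs):
    """Group EUR orders by customer_id and compute one summary row per customer."""
    groups = defaultdict(list)
    for doc in docs:
        # Stands in for the source query that keeps only EUR orders.
        if doc["currency"] == "EUR":
            groups[doc["customer_id"]].append(doc)

    rows = []
    for customer_id, members in sorted(groups.items()):
        rows.append({
            "customer_id": customer_id,
            "total_quantity.sum": sum(d["total_quantity"] for d in members),
            "taxless_total_price.sum": sum(d["taxless_total_price"] for d in members),
            "total_quantity.max": max(d["total_quantity"] for d in members),
            # Exact distinct count; the real aggregation is approximate.
            "order_id.cardinality": len({d["order_id"] for d in members}),
        })
    return rows


rows = pivot(orders)
for row in rows:
    print(row)
```

Each printed row corresponds to one document that the transform would write to the destination index: one per customer, with the USD order filtered out before grouping.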
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,3 @@ | ||
| include::dataframes.asciidoc[] | ||
| //include::{es-repo-dir}/data-frames/pivoting.asciidoc[] | ||
| include::ecommerce-example.asciidoc[] | ||
| include::api-quickref.asciidoc[] | ||
It would be good to make this an actual filter.
The same goes for the other queries used.
Thanks @benwtrent, I've updated the preview and create data frame transform API examples to use that filter. I've also updated the query-related tip to recommend that folks use saved searches for more complex queries in Kibana.
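For readers following the thread, the updated examples are not shown on this page, so the exact form of the filter is unknown. As a rough sketch of what "an actual filter" could look like, the Python snippet below builds a `source` section that swaps the `query_string` query for a `term` clause in a `bool` filter. The `eur_filter_source` helper is hypothetical, and the sketch assumes `currency` is mapped so that a `term` match on `EUR` works.

```python
import json


def eur_filter_source(index="kibana_sample_data_ecommerce"):
    """Build a transform `source` section that filters on currency == EUR.

    Assumption: a bool/filter/term query is one way to express the filter
    requested in review; the form actually committed is not shown here.
    """
    return {
        "index": index,
        "query": {
            "bool": {
                "filter": {
                    "term": {"currency": "EUR"}
                }
            }
        },
    }


source = eur_filter_source()
# Print the JSON that would replace the "source" object in the request body.
print(json.dumps(source, indent=2))
```

A filter clause does not contribute to relevance scoring, which fits a transform that only needs to include or exclude documents.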