Commit 4f2b092

[DOCS] Adds data frame eCommerce example (#372)
1 parent 9a4fbee commit 4f2b092

6 files changed: +253 -1 lines changed

Lines changed: 252 additions & 0 deletions
@@ -0,0 +1,252 @@
1+
[role="xpack"]
2+
[testenv="basic"]
3+
[[ecommerce-dataframes]]
4+
== Transforming your data with {dataframes}
5+
++++
6+
<titleabbrev>Transforming your data</titleabbrev>
7+
++++
8+
9+
beta[]
10+
11+
<<ml-dataframes,{dataframes-cap}>> enable you to retrieve information from an
12+
{es} index, transform it, and store it in another index. Let's use the
13+
{kibana-ref}/add-sample-data.html[{kib} sample data] to demonstrate how you can
14+
pivot and summarize your data with {dataframe-transforms}.
15+
16+
17+
. If the {es} {security-features} are enabled, obtain a user ID with sufficient
18+
privileges to complete these steps.
19+
+
20+
--
21+
You need `manage_data_frame_transforms` cluster privileges to preview and create
22+
{dataframe-transforms}. Members of the built-in `data_frame_transforms_admin`
23+
role have these privileges.
24+
25+
You also need `read` and `view_index_metadata` index privileges on the source
26+
index and `read`, `create_index`, and `index` privileges on the destination
27+
index.
28+
29+
For more information, see <<security-privileges>> and <<built-in-roles>>.
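
As a rough sketch, a custom role that grants the privileges listed above might
look like the following. The role name `ecommerce_transform_role` is
hypothetical, and `ecommerce-customers` is the destination index used later in
this example; adjust both to suit your environment.

[source,js]
--------------------------------------------------
PUT _security/role/ecommerce_transform_role
{
  "cluster": ["manage_data_frame_transforms"],
  "indices": [
    {
      "names": ["kibana_sample_data_ecommerce"],
      "privileges": ["read", "view_index_metadata"]
    },
    {
      "names": ["ecommerce-customers"],
      "privileges": ["read", "create_index", "index"]
    }
  ]
}
--------------------------------------------------
// CONSOLE
// TEST[skip:illustrative role definition]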
30+
--
31+
32+
. Choose your _source index_.
33+
+
34+
--
35+
In this example, we'll use the eCommerce orders sample data. If you're not
36+
already familiar with the `kibana_sample_data_ecommerce` index, use the
37+
*Revenue* dashboard in {kib} to explore the data. Consider what insights you
38+
might want to derive from this eCommerce data.
39+
--
40+
41+
. Play with various options for grouping and aggregating the data.
42+
+
43+
--
44+
For example, you might want to group the data by product ID and calculate the
45+
total number of sales for each product and its average price. Alternatively, you
46+
might want to look at the behavior of individual customers and calculate how
47+
much each customer spent in total and how many different categories of products
48+
they purchased. Or you might want to take the currencies or geographies into
49+
consideration. What are the most interesting ways you can transform and
50+
interpret this data?
51+
52+
_Pivoting_ your data involves using at least one field to group it and applying
53+
at least one aggregation. You can preview what the transformed data will look
54+
like, so go ahead and play with it!
55+
56+
For example, go to *Machine Learning* > *Data Frames* in {kib} and use the
57+
wizard to create a {dataframe}:
58+
59+
[role="screenshot"]
60+
image::images/ecommerce-pivot1.jpg["Creating a simple {dataframe} in {kib}"]
61+
62+
In this case, we grouped the data by customer ID and calculated the total quantity of
63+
products each customer purchased.
64+
65+
Let's add some more aggregations to learn more about our customers' orders. For
66+
example, let's calculate the total value of their purchases, the maximum number of
67+
products that they purchased in a single order, and their total number of orders.
68+
We'll accomplish this by using the
69+
{ref}/search-aggregations-metrics-sum-aggregation.html[`sum` aggregation] on the
70+
`taxless_total_price` field, the
71+
{ref}/search-aggregations-metrics-max-aggregation.html[`max` aggregation] on the
72+
`total_quantity` field, and the
73+
{ref}/search-aggregations-metrics-cardinality-aggregation.html[`cardinality` aggregation]
74+
on the `order_id` field:
75+
76+
[role="screenshot"]
77+
image::images/ecommerce-pivot2.jpg["Adding multiple aggregations to a {dataframe} in {kib}"]
78+
79+
TIP: If you're interested in a subset of the data, you can optionally include a
80+
{ref}/search-request-query.html[query] element. In this example, we've filtered
81+
the data so that we're only looking at orders with a `currency` of `EUR`.
82+
Alternatively, we could group the data by that field too. If you want to use
83+
more complex queries, you can create your {dataframe} from a
84+
{kibana-ref}/save-open-search.html[saved search].
85+
86+
If you prefer, you can use the
87+
{ref}/preview-data-frame-transform.html[preview {dataframe-transforms} API]:
88+
89+
[source,js]
90+
--------------------------------------------------
POST _data_frame/transforms/_preview
{
  "source": {
    "index": "kibana_sample_data_ecommerce",
    "query": {
      "bool": {
        "filter": {
          "term": {"currency": "EUR"}
        }
      }
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "total_quantity.sum": {
        "sum": {
          "field": "total_quantity"
        }
      },
      "taxless_total_price.sum": {
        "sum": {
          "field": "taxless_total_price"
        }
      },
      "total_quantity.max": {
        "max": {
          "field": "total_quantity"
        }
      },
      "order_id.cardinality": {
        "cardinality": {
          "field": "order_id"
        }
      }
    }
  }
}
135+
--------------------------------------------------
136+
// CONSOLE
137+
// TEST[skip:set up sample data]
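
The TIP above notes that you could group by `currency` instead of filtering on
it. A sketch of that variant, which drops the query and adds a second `terms`
group, might look like the following. The group name `currency` is our own
choice for illustration, and only one aggregation is kept for brevity.

[source,js]
--------------------------------------------------
POST _data_frame/transforms/_preview
{
  "source": {
    "index": "kibana_sample_data_ecommerce"
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": { "field": "customer_id" }
      },
      "currency": {
        "terms": { "field": "currency" }
      }
    },
    "aggregations": {
      "taxless_total_price.sum": {
        "sum": { "field": "taxless_total_price" }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:setup kibana sample data]

Each row in the preview then represents one combination of customer and
currency rather than one customer.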
138+
--
139+
140+
. When you are satisfied with what you see in the preview, create the
141+
{dataframe-transform}.
142+
+
143+
--
144+
Supply a job ID and the name of the target (or _destination_) index. In {kib},
145+
you can choose to create and start the job, create it without starting it, or copy it to the
146+
clipboard.
147+
148+
If you prefer, you can use the
149+
{ref}/put-data-frame-transform.html[create {dataframe-transforms} API]. For
150+
example:
151+
152+
[source,js]
153+
--------------------------------------------------
PUT _data_frame/transforms/ecommerce-customer-transform
{
  "source": {
    "index": [
      "kibana_sample_data_ecommerce"
    ],
    "query": {
      "bool": {
        "filter": {
          "term": {
            "currency": "EUR"
          }
        }
      }
    }
  },
  "dest": {
    "index": "ecommerce-customers"
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "total_quantity.sum": {
        "sum": {
          "field": "total_quantity"
        }
      },
      "taxless_total_price.sum": {
        "sum": {
          "field": "taxless_total_price"
        }
      },
      "total_quantity.max": {
        "max": {
          "field": "total_quantity"
        }
      },
      "order_id.cardinality": {
        "cardinality": {
          "field": "order_id"
        }
      }
    }
  }
}
205+
--------------------------------------------------
206+
// CONSOLE
207+
// TEST[skip:setup kibana sample data]
208+
--
209+
210+
. Start the {dataframe-transform}.
211+
+
212+
--
213+
214+
TIP: Even though resource utilization is automatically adjusted based on the
215+
cluster load, a {dataframe-transform} increases search and indexing load on your
216+
cluster while it runs. When it reaches the end of the data in your index, it
217+
stops automatically. If you're experiencing an excessive load, however, you can
218+
stop it sooner.
219+
220+
You can start, stop, and manage {dataframe} jobs in {kib}:
221+
222+
[role="screenshot"]
223+
image::images/dataframe-jobs.jpg["Managing {dataframe} jobs in {kib}"]
224+
225+
Alternatively, you can use the
226+
{ref}/start-data-frame-transform.html[start {dataframe-transforms}] and
227+
{ref}/stop-data-frame-transform.html[stop {dataframe-transforms}] APIs. For
228+
example:
229+
230+
[source,js]
231+
--------------------------------------------------
232+
POST _data_frame/transforms/ecommerce-customer-transform/_start
233+
--------------------------------------------------
234+
// CONSOLE
235+
// TEST[skip:setup kibana sample data]
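
Similarly, a sketch of stopping the same {dataframe-transform} and then checking
its progress with the get {dataframe-transform} statistics API (which this
example does not otherwise cover) might look like this:

[source,js]
--------------------------------------------------
POST _data_frame/transforms/ecommerce-customer-transform/_stop

GET _data_frame/transforms/ecommerce-customer-transform/_stats
--------------------------------------------------
// CONSOLE
// TEST[skip:setup kibana sample data]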
236+
237+
--
238+
239+
. Explore the data in your new index.
240+
+
241+
--
242+
For example, use the *Discover* application in {kib}:
243+
244+
[role="screenshot"]
245+
image::images/ecommerce-results.jpg["Exploring the new index in {kib}"]
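
If you prefer the search API, a query along the following lines is one way to
inspect the results. This is a sketch that assumes the {dataframe-transform}
above has run and populated the `ecommerce-customers` index; it returns the five
customers with the highest total spend.

[source,js]
--------------------------------------------------
GET ecommerce-customers/_search
{
  "size": 5,
  "sort": [
    { "taxless_total_price.sum": { "order": "desc" } }
  ]
}
--------------------------------------------------
// CONSOLE
// TEST[skip:setup kibana sample data]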
246+
247+
--
248+
249+
TIP: If you do not want to keep the {dataframe-transform}, you can delete it in
250+
{kib} or use the
251+
{ref}/delete-data-frame-transform.html[delete {dataframe-transforms} API]. When
252+
you delete a {dataframe-transform}, its destination index remains.
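
For example, assuming the {dataframe-transform} created above has already been
stopped (it must be stopped before it can be deleted), a cleanup might look like
the following sketch. The second request also removes the destination index and
is optional.

[source,js]
--------------------------------------------------
DELETE _data_frame/transforms/ecommerce-customer-transform

DELETE ecommerce-customers
--------------------------------------------------
// CONSOLE
// TEST[skip:setup kibana sample data]
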
Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
11
include::dataframes.asciidoc[]
2-
//include::{es-repo-dir}/data-frames/pivoting.asciidoc[]
2+
include::ecommerce-example.asciidoc[]
33
include::api-quickref.asciidoc[]