| 1 | +[role="xpack"] |
| 2 | +[testenv="basic"] |
| 3 | +[[ecommerce-dataframes]] |
| 4 | +== Transforming your data with {dataframes} |
| 5 | +++++ |
| 6 | +<titleabbrev>Transforming your data</titleabbrev> |
| 7 | +++++ |
| 8 | + |
| 9 | +beta[] |
| 10 | + |
| 11 | +<<ml-dataframes,{dataframes-cap}>> enable you to retrieve information from an |
| 12 | +{es} index, transform it, and store it in another index. Let's use the |
| 13 | +{kibana-ref}/add-sample-data.html[{kib} sample data] to demonstrate how you can |
| 14 | +pivot and summarize your data with {dataframe-transforms}. |
| 15 | + |
| 16 | + |
| 17 | +. If the {es} {security-features} are enabled, obtain a user ID with sufficient |
| 18 | +privileges to complete these steps. |
| 19 | ++ |
| 20 | +-- |
| 21 | +You need the `manage_data_frame_transforms` cluster privilege to preview and |
| 22 | +create {dataframe-transforms}. Members of the built-in |
| 23 | +`data_frame_transforms_admin` role have this privilege. |
| 24 | + |
| 25 | +You also need `read` and `view_index_metadata` index privileges on the source |
| 26 | +index and `read`, `create_index`, and `index` privileges on the destination |
| 27 | +index. |
| 28 | + |
| 29 | +For more information, see <<security-privileges>> and <<built-in-roles>>. |
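| | + |
| | +As a minimal sketch, a role that carries these privileges could be defined |
| | +with the create or update roles API. The role name `ecommerce_transform_user` |
| | +is hypothetical, and `ecommerce-customers` is the destination index used later |
| | +in this tutorial; adjust both for your environment: |
| | + |
| | +[source,js] |
| | +-------------------------------------------------- |
| | +PUT _security/role/ecommerce_transform_user |
| | +{ |
| | +  "cluster": ["manage_data_frame_transforms"], |
| | +  "indices": [ |
| | +    { |
| | +      "names": ["kibana_sample_data_ecommerce"], |
| | +      "privileges": ["read", "view_index_metadata"] |
| | +    }, |
| | +    { |
| | +      "names": ["ecommerce-customers"], |
| | +      "privileges": ["read", "create_index", "index"] |
| | +    } |
| | +  ] |
| | +} |
| | +-------------------------------------------------- |
| | +// CONSOLE |
| | +// TEST[skip:illustrative role definition] |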
| 30 | +-- |
| 31 | + |
| 32 | +. Choose your _source index_. |
| 33 | ++ |
| 34 | +-- |
| 35 | +In this example, we'll use the eCommerce orders sample data. If you're not |
| 36 | +already familiar with the `kibana_sample_data_ecommerce` index, use the |
| 37 | +*Revenue* dashboard in {kib} to explore the data. Consider what insights you |
| 38 | +might want to derive from this eCommerce data. |
| 39 | +-- |
| 40 | + |
| 41 | +. Play with various options for grouping and aggregating the data. |
| 42 | ++ |
| 43 | +-- |
| 44 | +For example, you might want to group the data by product ID and calculate the |
| 45 | +total number of sales for each product and its average price. Alternatively, you |
| 46 | +might want to look at the behavior of individual customers and calculate how |
| 47 | +much each customer spent in total and how many different categories of products |
| 48 | +they purchased. Or you might want to take the currencies or geographies into |
| 49 | +consideration. What are the most interesting ways you can transform and |
| 50 | +interpret this data? |
| 51 | + |
| 52 | +_Pivoting_ your data involves using at least one field to group it and applying |
| 53 | +at least one aggregation. You can preview what the transformed data will look |
| 54 | +like, so go ahead and play with it! |
| 55 | + |
| 56 | +For example, go to *Machine Learning* > *Data Frames* in {kib} and use the |
| 57 | +wizard to create a {dataframe}: |
| 58 | + |
| 59 | +[role="screenshot"] |
| 60 | +image::images/ecommerce-pivot1.jpg["Creating a simple {dataframe} in {kib}"] |
| 61 | + |
| 62 | +In this case, we grouped the data by customer ID and calculated the total |
| 63 | +quantity of products that each customer purchased. |
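| | + |
| | +Expressed as a request, this simple pivot might look like the following |
| | +sketch. It assumes the preview {dataframe-transforms} API that is described |
| | +later in this step, and it uses one `terms` group and one `sum` aggregation: |
| | + |
| | +[source,js] |
| | +-------------------------------------------------- |
| | +POST _data_frame/transforms/_preview |
| | +{ |
| | +  "source": { |
| | +    "index": "kibana_sample_data_ecommerce" |
| | +  }, |
| | +  "pivot": { |
| | +    "group_by": { |
| | +      "customer_id": { |
| | +        "terms": { "field": "customer_id" } |
| | +      } |
| | +    }, |
| | +    "aggregations": { |
| | +      "total_quantity.sum": { |
| | +        "sum": { "field": "total_quantity" } |
| | +      } |
| | +    } |
| | +  } |
| | +} |
| | +-------------------------------------------------- |
| | +// CONSOLE |
| | +// TEST[skip:set up sample data] |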
| 64 | + |
| 65 | +Let's add some more aggregations to learn more about our customers' orders. For |
| 66 | +example, let's calculate the total sum of their purchases, the maximum number of |
| 67 | +products that they purchased in a single order, and their total number of orders. |
| 68 | +We'll accomplish this by using the |
| 69 | +{ref}/search-aggregations-metrics-sum-aggregation.html[`sum` aggregation] on the |
| 70 | +`taxless_total_price` field, the |
| 71 | +{ref}/search-aggregations-metrics-max-aggregation.html[`max` aggregation] on the |
| 72 | +`total_quantity` field, and the |
| 73 | +{ref}/search-aggregations-metrics-cardinality-aggregation.html[`cardinality` aggregation] |
| 74 | +on the `order_id` field: |
| 75 | + |
| 76 | +[role="screenshot"] |
| 77 | +image::images/ecommerce-pivot2.jpg["Adding multiple aggregations to a {dataframe} in {kib}"] |
| 78 | + |
| 79 | +TIP: If you're interested in a subset of the data, you can optionally include a |
| 80 | +{ref}/search-request-query.html[query] element. In this example, we've filtered |
| 81 | +the data so that we're only looking at orders with a `currency` of `EUR`. |
| 82 | +Alternatively, we could group the data by that field too. If you want to use |
| 83 | +more complex queries, you can create your {dataframe} from a |
| 84 | +{kibana-ref}/save-open-search.html[saved search]. |
| 85 | + |
| 86 | +If you prefer, you can use the |
| 87 | +{ref}/preview-data-frame-transform.html[preview {dataframe-transforms} API]: |
| 88 | + |
| 89 | +[source,js] |
| 90 | +-------------------------------------------------- |
| 91 | +POST _data_frame/transforms/_preview |
| 92 | +{ |
| 93 | + "source": { |
| 94 | + "index": "kibana_sample_data_ecommerce", |
| 95 | + "query": { |
| 96 | + "bool": { |
| 97 | + "filter": { |
| 98 | + "term": {"currency": "EUR"} |
| 99 | + } |
| 100 | + } |
| 101 | + } |
| 102 | + }, |
| 103 | + "pivot": { |
| 104 | + "group_by": { |
| 105 | + "customer_id": { |
| 106 | + "terms": { |
| 107 | + "field": "customer_id" |
| 108 | + } |
| 109 | + } |
| 110 | + }, |
| 111 | + "aggregations": { |
| 112 | + "total_quantity.sum": { |
| 113 | + "sum": { |
| 114 | + "field": "total_quantity" |
| 115 | + } |
| 116 | + }, |
| 117 | + "taxless_total_price.sum": { |
| 118 | + "sum": { |
| 119 | + "field": "taxless_total_price" |
| 120 | + } |
| 121 | + }, |
| 122 | + "total_quantity.max": { |
| 123 | + "max": { |
| 124 | + "field": "total_quantity" |
| 125 | + } |
| 126 | + }, |
| 127 | + "order_id.cardinality": { |
| 128 | + "cardinality": { |
| 129 | + "field": "order_id" |
| 130 | + } |
| 131 | + } |
| 132 | + } |
| 133 | + } |
| 134 | +} |
| 135 | +-------------------------------------------------- |
| 136 | +// CONSOLE |
| 137 | +// TEST[skip:set up sample data] |
| 138 | +-- |
| 139 | + |
| 140 | +. When you are satisfied with what you see in the preview, create the |
| 141 | +{dataframe-transform}. |
| 142 | ++ |
| 143 | +-- |
| 144 | +Supply a job ID and the name of the target (or _destination_) index. In {kib}, |
| 145 | +you can create and start the job, create it without starting it, or copy it to |
| 146 | +the clipboard. |
| 147 | + |
| 148 | +If you prefer, you can use the |
| 149 | +{ref}/put-data-frame-transform.html[create {dataframe-transforms} API]. For |
| 150 | +example: |
| 151 | + |
| 152 | +[source,js] |
| 153 | +-------------------------------------------------- |
| 154 | +PUT _data_frame/transforms/ecommerce-customer-transform |
| 155 | +{ |
| 156 | + "source": { |
| 157 | + "index": [ |
| 158 | + "kibana_sample_data_ecommerce" |
| 159 | + ], |
| 160 | + "query": { |
| 161 | + "bool": { |
| 162 | + "filter": { |
| 163 | + "term": { |
| 164 | + "currency": "EUR" |
| 165 | + } |
| 166 | + } |
| 167 | + } |
| 168 | + } |
| 169 | + }, |
| 170 | + "dest": { |
| 171 | + "index": "ecommerce-customers" |
| 172 | + }, |
| 173 | + "pivot": { |
| 174 | + "group_by": { |
| 175 | + "customer_id": { |
| 176 | + "terms": { |
| 177 | + "field": "customer_id" |
| 178 | + } |
| 179 | + } |
| 180 | + }, |
| 181 | + "aggregations": { |
| 182 | + "total_quantity.sum": { |
| 183 | + "sum": { |
| 184 | + "field": "total_quantity" |
| 185 | + } |
| 186 | + }, |
| 187 | + "taxless_total_price.sum": { |
| 188 | + "sum": { |
| 189 | + "field": "taxless_total_price" |
| 190 | + } |
| 191 | + }, |
| 192 | + "total_quantity.max": { |
| 193 | + "max": { |
| 194 | + "field": "total_quantity" |
| 195 | + } |
| 196 | + }, |
| 197 | + "order_id.cardinality": { |
| 198 | + "cardinality": { |
| 199 | + "field": "order_id" |
| 200 | + } |
| 201 | + } |
| 202 | + } |
| 203 | + } |
| 204 | +} |
| 205 | +-------------------------------------------------- |
| 206 | +// CONSOLE |
| 207 | +// TEST[skip:setup kibana sample data] |
| 208 | +-- |
| 209 | + |
| 210 | +. Start the {dataframe-transform}. |
| 211 | ++ |
| 212 | +-- |
| 213 | + |
| 214 | +TIP: Even though resource utilization is automatically adjusted based on the |
| 215 | +cluster load, a {dataframe-transform} increases search and indexing load on your |
| 216 | +cluster while it runs. When it reaches the end of the data in your index, it |
| 217 | +stops automatically. If you're experiencing an excessive load, however, you can |
| 218 | +stop it sooner. |
| 219 | + |
| 220 | +You can start, stop, and manage {dataframe} jobs in {kib}: |
| 221 | + |
| 222 | +[role="screenshot"] |
| 223 | +image::images/dataframe-jobs.jpg["Managing {dataframe} jobs in {kib}"] |
| 224 | + |
| 225 | +Alternatively, you can use the |
| 226 | +{ref}/start-data-frame-transform.html[start {dataframe-transforms}] and |
| 227 | +{ref}/stop-data-frame-transform.html[stop {dataframe-transforms}] APIs. For |
| 228 | +example: |
| 229 | + |
| 230 | +[source,js] |
| 231 | +-------------------------------------------------- |
| 232 | +POST _data_frame/transforms/ecommerce-customer-transform/_start |
| 233 | +-------------------------------------------------- |
| 234 | +// CONSOLE |
| 235 | +// TEST[skip:setup kibana sample data] |
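| | + |
| | +If you want to check on progress or stop the {dataframe-transform} before it |
| | +finishes, requests like the following are one option. This sketch reuses the |
| | +`ecommerce-customer-transform` ID from the previous step: |
| | + |
| | +[source,js] |
| | +-------------------------------------------------- |
| | +GET _data_frame/transforms/ecommerce-customer-transform/_stats |
| | + |
| | +POST _data_frame/transforms/ecommerce-customer-transform/_stop |
| | +-------------------------------------------------- |
| | +// CONSOLE |
| | +// TEST[skip:setup kibana sample data] |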
| 236 | + |
| 237 | +-- |
| 238 | + |
| 239 | +. Explore the data in your new index. |
| 240 | ++ |
| 241 | +-- |
| 242 | +For example, use the *Discover* application in {kib}: |
| 243 | + |
| 244 | +[role="screenshot"] |
| 245 | +image::images/ecommerce-results.jpg["Exploring the new index in {kib}"] |
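| | + |
| | +You could also query the destination index directly. The following sketch |
| | +assumes that each aggregation result is stored under the name it was given in |
| | +the pivot, and it returns the five customers with the highest total spend: |
| | + |
| | +[source,js] |
| | +-------------------------------------------------- |
| | +GET ecommerce-customers/_search |
| | +{ |
| | +  "size": 5, |
| | +  "sort": [ |
| | +    { "taxless_total_price.sum": { "order": "desc" } } |
| | +  ] |
| | +} |
| | +-------------------------------------------------- |
| | +// CONSOLE |
| | +// TEST[skip:setup kibana sample data] |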
| 246 | + |
| 247 | +-- |
| 248 | + |
| 249 | +TIP: If you do not want to keep the {dataframe-transform}, you can delete it in |
| 250 | +{kib} or use the |
| 251 | +{ref}/delete-data-frame-transform.html[delete {dataframe-transforms} API]. When |
| 252 | +you delete a {dataframe-transform}, its destination index remains. |
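| | + |
| | +For example, the following sketch deletes the {dataframe-transform} created in |
| | +this tutorial and then, as an optional cleanup step, removes its destination |
| | +index. It assumes that the {dataframe-transform} has already stopped, because |
| | +a running {dataframe-transform} must be stopped before it can be deleted: |
| | + |
| | +[source,js] |
| | +-------------------------------------------------- |
| | +DELETE _data_frame/transforms/ecommerce-customer-transform |
| | + |
| | +DELETE ecommerce-customers |
| | +-------------------------------------------------- |
| | +// CONSOLE |
| | +// TEST[skip:setup kibana sample data] |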