[Star-tree] Support for nested aggs & Removing experimental flag. #10132
Changes from 4 commits
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -7,74 +7,117 @@ nav_order: 54 | |||
|
|
||||
| # Star-tree index | ||||
|
|
||||
| This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/). | ||||
| {: .warning} | ||||
| A _star-tree index_ is a multi-field index that improves the performance of aggregations by precomputing metric values for combinations of dimension fields. | ||||
|
|
||||
| A star-tree index is a multi-field index that improves the performance of aggregations. | ||||
| Once you enable star-tree indexes, OpenSearch automatically builds and uses star-tree indexes to optimize aggregations if the filter fields match the defined dimensions and the aggregation fields match the defined metrics in the star-tree configuration. No changes to your query syntax or request parameters are required. | ||||
|
||||
|
|
||||
| OpenSearch will automatically use a star-tree index to optimize aggregations if the queried fields are part of dimension fields and the aggregations are on star-tree metric fields. No changes are required in the query syntax or the request parameters. | ||||
|
|
||||
| ## When to use a star-tree index | ||||
|
|
||||
| A star-tree index can be used to perform faster aggregations. Consider the following criteria and features when deciding to use a star-tree index: | ||||
| Use a star-tree index when you want to speed up aggregations: | ||||
|
|
||||
| - Star-tree indexes natively support multi-field aggregations. | ||||
| - Star-tree indexes are created in real time as part of the indexing process, so the data in a star-tree will always be up to date. | ||||
| - A star-tree index consolidates data, increasing index paging efficiency and using less IO for search queries. | ||||
| - Star-tree indexes are created in real time as part of the indexing process, so the data in a star-tree is always current. | ||||
| - A star-tree index consolidates data to improve paging efficiency and reduce disk I/O during search queries. | ||||
|
||||
|
|
||||
| ## Limitations | ||||
|
|
||||
| Star-tree indexes have the following limitations: | ||||
| ## Star-tree index structure | ||||
|
|
||||
| - A star-tree index should only be enabled on indexes whose data is not updated or deleted because updates and deletions are not accounted for in a star-tree index. To enforce this policy and use star-tree indexes, set the `index.append_only.enabled` setting to `true`. | ||||
| - A star-tree index can be used for aggregation queries only if the queried fields are a subset of the star-tree's dimensions and the aggregated fields are a subset of the star-tree's metrics. | ||||
| - After a star-tree index is enabled, it cannot be disabled. In order to disable a star-tree index, the data in the index must be reindexed without the star-tree mapping. Furthermore, changing a star-tree configuration will also require a reindex operation. | ||||
| - [Multi-values/array values]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/#arrays) are not supported. | ||||
| - Only [limited queries and aggregations](#supported-queries-and-aggregations) are supported. Support for more features will be added in future versions. | ||||
| - The cardinality of the dimensions should not be very high (as with `_id` fields). Higher cardinality leads to increased storage usage and query latency. | ||||
| A star-tree index organizes and aggregates data across combinations of dimension fields and precomputes metric values for those combinations during ingestion. This structure enables OpenSearch to process aggregation queries quickly without scanning every document. | ||||
|
||||
|
|
||||
| ## Star-tree index structure | ||||
| The following is an example star-tree configuration: | ||||
|
|
||||
| The following image illustrates a standard star-tree index structure. | ||||
| ```json | ||||
| "ordered_dimensions": [ | ||||
| { | ||||
| "name": "status" | ||||
| }, | ||||
| { | ||||
| "name": "port" | ||||
| } | ||||
| ], | ||||
| "metrics": [ | ||||
| { | ||||
| "name": "size", | ||||
| "stats": [ | ||||
| "sum" | ||||
| ] | ||||
| }, | ||||
| { | ||||
| "name": "latency", | ||||
| "stats": [ | ||||
| "avg" | ||||
| ] | ||||
| } | ||||
| ] | ||||
| ``` | ||||
|
|
||||
| <img src="{{site.url}}{{site.baseurl}}/images/star-tree-index.png" alt="A star-tree index containing two dimensions and two metrics" width="700"> | ||||
| This configuration defines the following: | ||||
|
|
||||
| Sorted and aggregated star-tree documents are backed by `doc_values` in an index. The columnar data found in `doc_values` is stored using the following properties: | ||||
| * Two dimension fields: `status` and `port`. The `ordered_dimensions` field specifies how data is sorted (first by `status`, then by `port`). | ||||
|
||||
| * Two metric fields: `size` and `latency` with their corresponding aggregations (`sum` and `avg`). For each unique dimension combination, metric values (`sum(size)` and `avg(latency)`) are pre-aggregated and stored in the star-tree structure. | ||||
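The pre-aggregation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how OpenSearch implements it: the sample documents, the `preaggregate` helper, and the output field names (`sum_size`, `avg_latency`) are hypothetical. Only the dimension and metric names come from the example configuration.

```python
from collections import defaultdict

# Hypothetical sample documents; only the field names come from the example configuration.
docs = [
    {"status": 200, "port": 5600, "size": 100, "latency": 10},
    {"status": 200, "port": 5600, "size": 200, "latency": 30},
    {"status": 200, "port": 8443, "size": 50, "latency": 20},
    {"status": 404, "port": 8443, "size": 25, "latency": 40},
]

# Mirrors "ordered_dimensions" in the example: sort by status first, then by port.
ORDERED_DIMENSIONS = ["status", "port"]


def preaggregate(documents):
    """Group documents by dimension combination and precompute sum(size) and avg(latency)."""
    groups = defaultdict(lambda: {"size_sum": 0, "latency_sum": 0, "count": 0})
    for doc in documents:
        key = tuple(doc[dim] for dim in ORDERED_DIMENSIONS)
        group = groups[key]
        group["size_sum"] += doc["size"]
        group["latency_sum"] += doc["latency"]
        group["count"] += 1

    rows = []
    # Sorting the keys reproduces the dimension ordering (status, then port).
    for key in sorted(groups):
        group = groups[key]
        rows.append({
            "status": key[0],
            "port": key[1],
            "sum_size": group["size_sum"],
            "avg_latency": group["latency_sum"] / group["count"],
        })
    return rows


star_tree_docs = preaggregate(docs)
for row in star_tree_docs:
    print(row)
```

Each resulting row stands in for one aggregated star-tree document: one row per unique `status`/`port` combination, carrying the metrics that a query would otherwise have to compute from the raw documents.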
|
||||
|
|
||||
| - The values are sorted based on the fields set in the `ordered_dimension` setting. In the preceding image, the dimensions are determined by the `status` setting and then by the `port` for each status. | ||||
| - For each unique dimension/value combination, the aggregated values for all the metrics, such as `avg(size)` and `count(requests)`, are precomputed during ingestion. | ||||
| Based on this configuration, OpenSearch creates a star-tree index structure. Each node in the tree corresponds to a value (or wildcard `*`) for a dimension. At query time, OpenSearch traverses the tree based on which dimension values are provided in the query. | ||||
|
|
||||
| ### Leaf nodes | ||||
|
|
||||
| Each node in a star-tree index points to a range of star-tree documents. Nodes can be further split into child nodes based on the [max_leaf_docs configuration]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/#star-tree-index-configuration-options). The number of documents that a leaf node points to is less than or equal to the value set in `max_leaf_docs`. This ensures that the maximum number of documents that need to traverse nodes to derive an aggregated value is at most the number of `max_leaf_docs`, which provides predictable latency. | ||||
| Leaf nodes contain the precomputed metric aggregations for specific combinations of dimensions. These are stored as doc values and referenced by star-tree nodes. | ||||
|
|
||||
| The `max_leaf_docs` setting controls how many documents each leaf node can reference, which helps keep query latency predictable by limiting how many documents are scanned for any given node. | ||||
|
|
||||
| ### Star nodes | ||||
|
|
||||
| A star node contains the aggregated data of all the other nodes for a particular dimension, acting as a "catch-all" node. When a star node is found in a dimension, that dimension is skipped during aggregation. This groups together all values of that dimension and allows a query to skip non-competitive nodes when fetching the aggregated value of a particular field. | ||||
| A _star node_ (marked as `*` in the following diagram) aggregates all values for a particular dimension. It allows OpenSearch to process partial-dimension queries more efficiently by skipping over non-filtered dimensions. | ||||
|
|
||||
| The star-tree index structure diagram contains the following three examples demonstrating how a query behaves when retrieving aggregations from nodes in the star-tree: | ||||
| For example, if a query filters on `port` but not `status`, OpenSearch can use a star node that aggregates data for all status values. | ||||
|
|
||||
| - **Blue**: In a `terms` query that searches for the average request size aggregation, the `port` equals `8443` and the status equals `200`. Because the query contains values in both the `status` and `port` dimensions, the query traverses status node `200` and returns the aggregations from child node `8443`. | ||||
| - **Green**: In a `term` query that searches for the number of aggregation requests, the `status` equals `200`. Because the query only contains a value from the `status` dimension, the query traverses the `200` node's child star node, which contains the aggregated value of all the `port` child nodes. | ||||
| - **Red**: In a `term` query that searches for the average request size aggregation, the port equals `5600`. Because the query does not contain a value from the `status` dimension, the query traverses a star node and returns the aggregated result from the `5600` child node. | ||||
| ### How queries use the star-tree | ||||
|
|
||||
| Support for the `Terms` query will be added in a future version. For more information, see [GitHub issue #15257](https://github.com/opensearch-project/OpenSearch/issues/15257). | ||||
| {: .note} | ||||
| The following diagram shows a star-tree index created for this example and three example query paths. | ||||
|
|
||||
| <img src="{{site.url}}{{site.baseurl}}/images/star-tree-index.png" alt="A star-tree index containing two dimensions and two metrics"> | ||||
|
|
||||
| The colored arrows show three query examples: | ||||
|
|
||||
| * **Blue arrow**: Multi-term query with metric aggregation | ||||
| The query filters on both `status = 200` and `port = 5600` and calculates the sum of request sizes. | ||||
|
|
||||
| * OpenSearch follows the path: `Root → 200 → 5600` | ||||
|
||||
| * It retrieves the metric from Doc ID 1, where `Sum(size) = 988` | ||||
|
|
||||
| * **Green arrow**: Single-term query with metric aggregation | ||||
| The query filters on `status = 200` only and computes the average request latency. | ||||
|
|
||||
| * OpenSearch follows the path: `Root → 200 → *` | ||||
|
||||
| * It retrieves the metric from Doc ID 5, where `Avg(latency) = 70` | ||||
|
|
||||
| * **Red arrow**: Single-term query with metric aggregation | ||||
| The query filters on `port = 8443` only and calculates the sum of request sizes. | ||||
|
|
||||
| * OpenSearch follows the path: `Root → * → 8443` | ||||
|
||||
| * It retrieves the metric from Doc ID 7, where `Sum(size) = 1111` | ||||
|
|
||||
| These examples show how OpenSearch selects the shortest path in the star-tree and uses pre-aggregated values to process queries efficiently. | ||||
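The three query paths can be mimicked with a toy lookup structure. The sketch below is a simplified model under stated assumptions, not the OpenSearch data structure: the sample documents, `build_tree`, and `sum_size` are hypothetical, and the tree is a plain nested dictionary in which the key `*` plays the role of a star node.

```python
STAR = "*"  # Stands in for a star node that aggregates every value of a dimension.

# Hypothetical sample documents; values are illustrative and unrelated to the diagram.
docs = [
    {"status": 200, "port": 5600, "size": 100},
    {"status": 200, "port": 5600, "size": 200},
    {"status": 200, "port": 8443, "size": 50},
    {"status": 404, "port": 8443, "size": 25},
]


def build_tree(documents):
    """Precompute sum(size) for every status/port path, including star paths."""
    tree = {}
    for doc in documents:
        # Each document contributes to its exact path and to the matching star paths.
        for status_key in (doc["status"], STAR):
            children = tree.setdefault(status_key, {})
            for port_key in (doc["port"], STAR):
                children[port_key] = children.get(port_key, 0) + doc["size"]
    return tree


def sum_size(tree, status=None, port=None):
    """Follow Root -> status -> port, using the star node for any unfiltered dimension."""
    status_key = STAR if status is None else status
    port_key = STAR if port is None else port
    return tree.get(status_key, {}).get(port_key, 0)


tree = build_tree(docs)
print(sum_size(tree, status=200, port=5600))  # Path: Root -> 200 -> 5600
print(sum_size(tree, status=200))             # Path: Root -> 200 -> *
print(sum_size(tree, port=8443))              # Path: Root -> * -> 8443
```

In this model, every lookup touches exactly one node per dimension regardless of how many documents were ingested, which is the property the star nodes are meant to provide.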
|
|
||||
| ## Limitations | ||||
|
|
||||
| Star-tree indexes have the following limitations: | ||||
|
|
||||
| - Enable a star-tree index only for indexes whose data is not updated or deleted because star-tree indexes do not support updates or deletions. To enforce this policy, set `index.append_only.enabled` to `true` for the index on which you want to enable a star-tree index. | ||||
| - Use a star-tree index for aggregation queries only if the queried fields are a subset of the star-tree's dimensions and the aggregated fields are a subset of the star-tree's metrics. | ||||
| - Once enabled, a star-tree index cannot be disabled. To remove a star-tree index, you must reindex the data without the star-tree mapping. Changing a star-tree configuration also requires reindexing the data. | ||||
|
||||
| - [Array values]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/#arrays) are not supported. | ||||
|
||||
| - Only [specific queries and aggregations](#supported-queries-and-aggregations) are supported. | ||||
| - Avoid using high-cardinality fields like `_id` as dimensions, as they can significantly increase storage usage and query latency. | ||||
|
|
||||
| ## Enabling a star-tree index | ||||
|
|
||||
| To use a star-tree index, modify the following settings: | ||||
|
|
||||
| - Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/). | ||||
| - Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [Configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings). | ||||
| - Set the `indices.composite_index.star_tree.enabled` setting to `true`. For more information, see [Dynamic settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#dynamic-settings). | ||||
|
||||
| - Set the `indices.composite_index.star_tree.enabled` setting to `true`. For more information, see [Dynamic settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#dynamic-settings). |
@kolchfa-aws We should not remove this section. We need to still give the user an option to disable index search via star-tree.
I'll incorporate it into the "enabling" section.
Done
Can we kindly keep this as a separate section? Today, with the feature flag removed and the cluster setting defaulting to true, it's enabled by default, so it's not coming out clearly that this can be done. Also, we have added an index-level setting, so adding the info in a separate section will be useful.
Done.
nit: _star-tree
Also for index_
It's putting the words in italic font: _italic font_