Avoid refreshing search-idle shards that don't yield results after query rewrite #95541

Closed
Tracked by #95776
martijnvg opened this issue Apr 25, 2023 · 4 comments · Fixed by #96161
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team

Comments

@martijnvg
Member

Many search requests have the following structure:

...
"query": {
      "bool": {
        "must": [],
        "filter": [
          {
            "match_phrase": {
              "data_stream.dataset": "kubernetes.container"
            }
          },
          {
            "range": {
              "@timestamp": {
                "format": "strict_date_optional_time",
                "gte": "...",
                "lte": "...."
              }
            }
          }
        ],
        "should": [],
        "must_not": []
      }
    }
...

The index pattern (metrics-*) matches all metrics data streams, but the match_phrase query on the data_stream.dataset field, which is a constant keyword field, matches only one specific data stream.
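
To illustrate why only one data stream can match: a constant keyword field holds a single value for a whole index, so a filter on it can be decided per index without looking at any documents. The following is a simplified Python model of that decision, not Elasticsearch's actual rewrite code. The function name, the index names, and the use of plain strings for the rewritten query are all assumptions made for illustration.

```python
def rewrite_constant_keyword_filter(index_value: str, queried_value: str) -> str:
    """Toy model: decide a filter on a constant keyword field for one index.

    Every document in the index shares index_value, so the filter either
    matches all documents or none of them.
    """
    return "match_all" if index_value == queried_value else "match_none"


# Hypothetical backing indices and their data_stream.dataset values.
indices = {
    "metrics-kubernetes.container-default": "kubernetes.container",
    "metrics-kubernetes.pod-default": "kubernetes.pod",
    "metrics-system.cpu-default": "system.cpu",
}

rewritten = {
    name: rewrite_constant_keyword_filter(value, "kubernetes.container")
    for name, value in indices.items()
}
```

In this model only the first index ends up with a query that can return hits; the other two are known to be empty matches from index metadata alone.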

Before query rewriting, in either the can_match or the query phase, shards that are search-idle get refreshed. This increases the query time significantly. Many o11y use cases rely on the default refresh behaviour, which is to schedule a refresh every second while a shard is search-active and to schedule no refreshes while a shard is search-idle, in order to favour indexing performance.
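
The default behaviour described here can be modelled in a few lines. This is a sketch under stated assumptions, not Elasticsearch code: it assumes the idle threshold is the index.search.idle.after setting with its documented default of 30 seconds, and that an explicitly configured refresh interval disables the search-idle optimisation. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumed default of the index.search.idle.after setting, in seconds.
SEARCH_IDLE_AFTER_SECONDS = 30.0


@dataclass
class ShardState:
    last_search_at: float                    # time of the last search, in seconds
    explicit_refresh_interval: bool = False  # True if index.refresh_interval was set


def is_search_idle(shard: ShardState, now: float) -> bool:
    """A shard is search-idle once it has seen no search for the idle threshold."""
    return now - shard.last_search_at >= SEARCH_IDLE_AFTER_SECONDS


def schedules_periodic_refresh(shard: ShardState, now: float) -> bool:
    """Periodic refreshes stop only for idle shards that use the default interval."""
    return shard.explicit_refresh_interval or not is_search_idle(shard, now)
```

A shard that was last searched 45 seconds ago and uses the default interval gets no scheduled refresh in this model, which is why the first search that hits it has to pay for one.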

The refresh that occurs before the query rewrite should not happen on shards that don't match the required filter clause on the data_stream.dataset constant keyword field. That is the goal of this issue.
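
The intended ordering can be sketched as follows: rewrite first using index-level metadata, and refresh only when the rewritten query can still match. This is a toy Python model of the idea, not the change that was eventually merged; ToyShard and can_match_without_needless_refresh are hypothetical names.

```python
class ToyShard:
    """Minimal stand-in for a shard with a constant data_stream.dataset value."""

    def __init__(self, dataset: str, search_idle: bool = True):
        self.dataset = dataset
        self.search_idle = search_idle
        self.refresh_count = 0

    def refresh(self) -> None:
        self.refresh_count += 1
        self.search_idle = False  # the shard is search-active again


def can_match_without_needless_refresh(shard: ToyShard, dataset_filter: str) -> bool:
    # Step 1: decide the constant keyword filter from metadata only.
    if shard.dataset != dataset_filter:
        return False  # rewritten to match nothing, so skip the refresh
    # Step 2: only shards that may return hits pay for the refresh.
    if shard.search_idle:
        shard.refresh()
    return True


shards = [ToyShard("kubernetes.container"), ToyShard("kubernetes.pod"), ToyShard("system.cpu")]
matching = [s for s in shards if can_match_without_needless_refresh(s, "kubernetes.container")]
```

After the loop, only the kubernetes.container shard has been refreshed; the other two remain search-idle.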

@martijnvg martijnvg added >enhancement :Search/Search Search-related issues that do not fall into other categories :StorageEngine/TSDB You know, for Metrics labels Apr 25, 2023
@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Apr 25, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@felixbarny
Member

For non-TSDB data streams (such as logs and traces), can we exclude refreshing shards that don't contain data for the selected time range?

If we can't, how big is the overhead of refreshing a lot of non write backing indices? I suppose the refresh itself will be quick but it may add a lot of tasks to the thread pool that's responsible for doing refreshes. Is that correct?

@martijnvg
Member Author

> For non-TSDB data streams (such as logs and traces), can we exclude refreshing shards that don't contain data for the selected time range?

For non-tsdb data streams we don't keep track of what time range a backing index represents (as the index.time_series.start_time and index.time_series.end_time index settings do with tsdb). So we can't do this now. But I think we can also apply this mechanism to other data sources that currently don't use tsdb.
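
For tsdb backing indices, a time-range check based on those two settings could look roughly like this. It is a sketch only: it assumes start_time is inclusive and end_time is exclusive, and that the query's gte and lte bounds are both inclusive. The function name index_may_match is hypothetical.

```python
from datetime import datetime, timezone


def index_may_match(start_time: datetime, end_time: datetime,
                    gte: datetime, lte: datetime) -> bool:
    """Return True if the index range [start_time, end_time) overlaps [gte, lte]."""
    return start_time <= lte and gte < end_time


def utc(hour: int, day: int = 25) -> datetime:
    """Helper for building example timestamps on a fixed date."""
    return datetime(2023, 4, day, hour, tzinfo=timezone.utc)


# Backing index assumed to accept timestamps from 00:00 (inclusive) to 04:00 (exclusive).
index_start, index_end = utc(0), utc(4)

overlapping = index_may_match(index_start, index_end, gte=utc(3), lte=utc(5))
after_index = index_may_match(index_start, index_end, gte=utc(4), lte=utc(5))
```

A backing index for which this returns False cannot contain matching documents, so it would not need a refresh for that query.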

> If we can't, how big is the overhead of refreshing a lot of non write backing indices? I suppose the refresh itself will be quick but it may add a lot of tasks to the thread pool that's responsible for doing refreshes. Is that correct?

The refresh on shards that don't receive writes is very light. In fact, I think it is just a no-op. Checks are in place before the refresh executes, so that no refresh is performed if it isn't needed. So I think from a refresh perspective it doesn't matter that much.
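
The kind of guard described here can be pictured with a toy engine that only reopens its reader when there is something new to expose. This is an illustrative model, not Elasticsearch's or Lucene's implementation; ToyEngine and its fields are hypothetical.

```python
class ToyEngine:
    """Toy model of an engine that skips refreshes when nothing has changed."""

    def __init__(self) -> None:
        self.pending_ops = 0     # writes not yet visible to search
        self.reader_reopens = 0  # how many times real refresh work was done

    def index(self, doc: dict) -> None:
        self.pending_ops += 1

    def maybe_refresh(self) -> bool:
        """Refresh only if there are unrefreshed writes; otherwise do nothing."""
        if self.pending_ops == 0:
            return False  # no-op: the current reader is already up to date
        self.reader_reopens += 1
        self.pending_ops = 0
        return True


engine = ToyEngine()
first = engine.maybe_refresh()   # nothing indexed yet, so no work is done
engine.index({"@timestamp": "2023-04-25T00:00:00Z"})
second = engine.maybe_refresh()  # one pending write, so the reader is reopened
third = engine.maybe_refresh()   # nothing new since the last refresh
```

Under this model, refreshing many non-write backing indices costs one cheap check per shard rather than a reader reopen.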
