Expose range values of other fields to coordinating nodes #81457

Closed
dnhatn opened this issue Dec 7, 2021 · 4 comments
Labels
>enhancement, :Search/Search, Team:Search

Comments

@dnhatn
Member

dnhatn commented Dec 7, 2021

Today, the range values of the @timestamp field are exposed to coordinating nodes (via the cluster state) so that they can skip shards that won't match a search query. This is important for searchable snapshots and frozen indices. There's a need to expose the range values of other fields as well, for when users filter on them (also in combination with @timestamp). Another optimization is to track the range values of actively indexing shards (see #78776 (comment)) so we can efficiently skip those shards in the can_match phase and avoid triggering a refresh on them.
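
For readers unfamiliar with the optimization, the following is a minimal sketch of the skip decision a coordinating node could make from per-shard range values. It is not Elasticsearch's implementation: `FieldRange`, `can_skip`, `shards_to_query`, and the shard names are hypothetical, and timestamps are simplified to plain integers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FieldRange:
    """Closed [min_value, max_value] interval of one field in one shard (hypothetical type)."""
    min_value: int
    max_value: int


def can_skip(shard_range: Optional[FieldRange], gte: int, lte: int) -> bool:
    """Return True only when the shard provably holds no value in [gte, lte]."""
    if shard_range is None:
        # Range unknown (e.g. a shard that is still being written to):
        # the shard cannot be ruled out and must be asked.
        return False
    return shard_range.max_value < gte or shard_range.min_value > lte


def shards_to_query(ranges: Dict[str, Optional[FieldRange]], gte: int, lte: int) -> List[str]:
    """Keep every shard whose known range overlaps the filter, plus all unknowns."""
    return [shard for shard, r in ranges.items() if not can_skip(r, gte, lte)]


# Assumed example data: two read-only shards with known ranges, one write shard without.
ranges = {
    "frozen-2021-10": FieldRange(0, 99),
    "frozen-2021-11": FieldRange(100, 199),
    "hot-write": None,
}
print(shards_to_query(ranges, 150, 180))  # ['frozen-2021-11', 'hot-write']
```

The sketch errs on the side of querying: a shard is skipped only when its recorded range cannot overlap the filter, so a missing range never causes missed results.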

@dnhatn dnhatn added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Dec 7, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Dec 7, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna
Member

javanna commented Apr 20, 2022

I am a bit nervous about allowing range values of arbitrary fields to be stored in the cluster state, especially as the main reason would be to shortcut queries that don't filter on @timestamp. Even if the coordinating node is not able to determine whether a shard can be skipped, the can_match phase on the shard will be lightweight, hence hitting the frozen tier should not necessarily be seen as a problem.

@a03nikki
Contributor

a03nikki commented Jul 28, 2022

Could we at least add the event.ingested ECS date field too?

Especially considering that our security documentation recommends it and the prebuilt detection rules often use it.

From Configure advanced rule settings (optional):

k. Timestamp override (optional): Select a source event timestamp field. When selected, the rule’s query uses the selected field, instead of the default @timestamp field, to search for alerts. This can help reduce missing alerts due to network or server outages. Specifically, if your ingest pipeline adds a timestamp when events are sent to Elasticsearch, this avoids missing alerts due to ingestion delays.

TIP: These Filebeat modules have an event.ingested timestamp field that can be used instead of the default @timestamp field: Microsoft and Google Workspace.

From Troubleshoot ingestion pipeline delay:

You can reduce the number of missed alerts due to ingestion pipeline delay by specifying the Timestamp override field value to event.ingested in advanced settings during rule creation or editing. The detection engine uses the value from the event.ingested field as the timestamp when executing the rule.

@javanna
Member

javanna commented Oct 13, 2022

We discussed this with the team. We said that while the optimization works well for @timestamp, it relies on how the data is indexed in the shards, so we would not expect it to work well for an arbitrary numeric field. Also, as mentioned above, we would not want to allow arbitrary custom index metadata to be added to the cluster state. Moreover, the optimization is currently only enabled for read-only indices, and expanding it to write indices would be complex (how would you update the metadata when a shard receives a new document that changes the range?).
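
To illustrate the point about index layout, here is a small sketch under stated assumptions: documents are split into shards in time order, and a second numeric field is uncorrelated with time. The data, the `field_range` and `skippable` helpers, and the field formula are all made up for the example; nothing here comes from Elasticsearch itself.

```python
from typing import List, Tuple


def field_range(values: List[int]) -> Tuple[int, int]:
    """Return the (min, max) of one field's values within a shard."""
    return min(values), max(values)


def skippable(ranges: List[Tuple[int, int]], gte: int, lte: int) -> int:
    """Count shards whose (min, max) cannot overlap the filter [gte, lte]."""
    return sum(1 for lo, hi in ranges if hi < gte or lo > lte)


# 300 synthetic documents: (timestamp, value). The value is deliberately
# uncorrelated with the timestamp and always falls in 0..99.
docs = [(ts, (ts * 37) % 100) for ts in range(300)]

# Time-based layout: each shard holds 100 consecutive timestamps.
shards = [docs[i:i + 100] for i in range(0, 300, 100)]

ts_ranges = [field_range([d[0] for d in s]) for s in shards]
val_ranges = [field_range([d[1] for d in s]) for s in shards]

print(ts_ranges)                       # [(0, 99), (100, 199), (200, 299)]
print(val_ranges)                      # [(0, 99), (0, 99), (0, 99)]
print(skippable(ts_ranges, 120, 130))  # 2
print(skippable(val_ranges, 40, 50))   # 0
```

With this layout the timestamp ranges are disjoint, so a narrow time filter rules out two of the three shards, whereas the other field spans its full range in every shard and its per-shard ranges rule out nothing.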

More importantly, there is common agreement that skipping shards on the coordinating node should not always be required. It's a nice optimization when querying by timestamp, but the can_match phase that happens on the shards is supposed to be fast and efficient. Hitting a shard on the frozen tier is not necessarily a bug, and when it causes problems we should investigate more deeply to see what the real issue is. That said, we don't see value in adding this feature, as its cost would outweigh the benefits. We will reconsider if we collect evidence that shows why hitting shards on the frozen tier should be avoided at all costs.

@javanna javanna closed this as not planned Oct 13, 2022
4 participants