Alternate approach to speed up ignored_source access#133839
Alternate approach to speed up ignored_source access#133839jordan-powers merged 6 commits intoelastic:mainfrom
Conversation
|
Here are the results of running the INSIST_🐔 benchmark comparing against the current main (with the feature flag enabled, splitting up Summary:
Looks like the new approach is comparable for limit_500 chicken_3, and chicken_3_with_where, slightly worse for chicken_1, and much worse for chicken_2 and chicken_4 |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
| } | ||
|
|
||
| return permissions.getFieldPermissions().filter(wrappedReader); | ||
| var indexVersionCreated = searchExecutionContextProvider.apply(shardId).indexVersionCreated(); |
There was a problem hiding this comment.
Passing down index version here is a little intrusive. However I don't think it is too bad. In other places of the code base passing down index version is a common pattern.
The alternative we talked about, which would store coalesced ignored source in a different field, would also be an intrusive change.
So I'm okay with this approach.
…d-fieldinfo-explosion
Follow-up to #133839 to remove the feature flag and enable the feature in production.
This patch fixes assertions in the serverless tests invalidated by the changes in elastic#133839.
This patch fixes assertions in the serverless tests invalidated by the changes in elastic#133839.
In #132142 and #132428 we split up ignored_source entries into distinct lucene fields, then added an optimized field visitor to speed up retrieving unmapped values for INSIST_🐔.
However, since this approach creates a unique lucene field for every ignored_source entry, we can very quickly have a lot of lucene fields if there are a lot of unique unmapped fields per document. This can cause significant slowdowns in indexing throughput and merge time.
This PR addresses those limitations by reverting back to keeping all ignored_source entries under the same lucene field. However, we still keep some of the speedups from that prior work by continuing to coalesce multiple ignored_source entries for the same field into a single entry, allowing the field visitor to exit early.
Unfortunately, we do lose some time compared to the original optimizations because now the field visitor cannot look at the fieldInfo to decide whether or not to visit a field, and it instead needs to actually visit and materialize each ignored_source entry before it can decide whether or not to keep it.