Boolean must --> filter rewrite for queries with constant scores #18541
Flaky test: #14509
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #18541 +/- ##
============================================
- Coverage 72.83% 72.82% -0.02%
- Complexity 68450 68460 +10
============================================
Files 5563 5563
Lines 314144 314164 +20
Branches 45544 45554 +10
============================================
- Hits 228810 228776 -34
- Misses 66807 66851 +44
- Partials 18527 18537 +10
@peteralfonsi I like this rewrite strategy, especially for range queries on numeric fields.
We're only rewriting it for term queries on numeric fields, which as I understand it should return a constant score, right?
Correct! Under the hood, the transformation is term query -> "exact" query (via
Ah, I missed that. I think this change makes sense. I will take another look.
I think we need to revisit the query rewrite refactoring independently of this change, as it has been getting a bit complicated over time.
…search-project#18541)
* Add must -> filter boolean rewrite
  Signed-off-by: Peter Alfonsi <[email protected]>
* fix comment wording
  Signed-off-by: Peter Alfonsi <[email protected]>
* changelog
  Signed-off-by: Peter Alfonsi <[email protected]>
* changelog fix
  Signed-off-by: Peter Alfonsi <[email protected]>
* fix WrapperQueryBuilderTests
  Signed-off-by: Peter Alfonsi <[email protected]>
* fix PercolatorQuerySearchIT
  Signed-off-by: Peter Alfonsi <[email protected]>
* rerun gradle check
  Signed-off-by: Peter Alfonsi <[email protected]>
* rerun gradle check
  Signed-off-by: Peter Alfonsi <[email protected]>
---------
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Co-authored-by: Peter Alfonsi <[email protected]>
…rewriting infrastructure
This commit migrates two existing query optimizations from BoolQueryBuilder to the new
query rewriting infrastructure:
1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)
2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)
Changes:
- Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
- Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
- Register both rewriters in QueryRewriterRegistry
- Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
- Disable legacy implementations in BoolQueryBuilder
- Comment out BoolQueryBuilder tests that relied on the old implementations
The new rewriters maintain full backward compatibility while providing:
- Better separation of concerns
- Recursive rewriting for nested boolean queries
- Proper error handling and logging
- Consistent priority based execution order
Signed-off-by: Atri Sharma <[email protected]>
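The priority-based ordering described in this commit message can be illustrated with a small sketch. This is not the OpenSearch `QueryRewriterRegistry` (which is Java); the `Rewriter` and `RewriterRegistry` classes below are hypothetical stand-ins, and the priority of 100 for boolean flattening is an assumption chosen only so that it sorts before 150. The priorities 150 and 175 come from the commit message.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rewriter:
    """Hypothetical rewriter: a name, a priority, and a query -> query function."""
    name: str
    priority: int  # lower values run first
    apply: Callable[[Dict], Dict]


class RewriterRegistry:
    """Hypothetical registry that applies rewriters in ascending priority order."""

    def __init__(self) -> None:
        self._rewriters: List[Rewriter] = []

    def register(self, rewriter: Rewriter) -> None:
        self._rewriters.append(rewriter)

    def ordered(self) -> List[Rewriter]:
        return sorted(self._rewriters, key=lambda r: r.priority)

    def rewrite(self, query: Dict) -> Dict:
        for rewriter in self.ordered():
            query = rewriter.apply(query)
        return query


def tracing(name: str) -> Callable[[Dict], Dict]:
    """Build a no-op rewriter that only records that it ran."""
    def apply(query: Dict) -> Dict:
        return {**query, "trace": query.get("trace", []) + [name]}
    return apply


registry = RewriterRegistry()
# Registration order is deliberately shuffled; priority decides execution order.
registry.register(Rewriter("must_not_to_should", 175, tracing("must_not_to_should")))
registry.register(Rewriter("must_to_filter", 150, tracing("must_to_filter")))
# 100 is an assumed value; the source only says flattening runs before 150.
registry.register(Rewriter("boolean_flattening", 100, tracing("boolean_flattening")))

result = registry.rewrite({"bool": {}})
```

Sorting at rewrite time keeps registration order irrelevant, which is one plausible reason to prefer explicit priorities over call-site ordering.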
* Add query rewriting infrastructure to reduce query complexity
  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries
  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results
  Signed-off-by: Atri Sharma <[email protected]>
* Fix forbidden api issues
  Signed-off-by: Atri Sharma <[email protected]>
* Update writers and get tests to pass
  Signed-off-by: Atri Sharma <[email protected]>
* Update per CI
  Signed-off-by: Atri Sharma <[email protected]>
* Fix term merging threshold and update comments
  Signed-off-by: Atri Sharma <[email protected]>
* Expose setting and update per comments
  Signed-off-by: Atri Sharma <[email protected]>
* Update CHANGELOG
  Signed-off-by: Atri Sharma <[email protected]>
* Fix tests and ensure scoring MATCH ALL query is preserved
  Signed-off-by: Atri Sharma <[email protected]>
* Migrate must to filter and must not to should optimizations to query rewriting infrastructure
  This commit migrates two existing query optimizations from BoolQueryBuilder to the new query rewriting infrastructure:
  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match) from must to filter clauses to avoid unnecessary scoring calculations (from PR #18541)
  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for better performance on single valued numeric fields (from PRs #17655 and #18498)
  Changes:
  - Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
  - Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
  - Register both rewriters in QueryRewriterRegistry
  - Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
  - Disable legacy implementations in BoolQueryBuilder
  - Comment out BoolQueryBuilder tests that relied on the old implementations
  The new rewriters maintain full backward compatibility while providing:
  - Better separation of concerns
  - Recursive rewriting for nested boolean queries
  - Proper error handling and logging
  - Consistent priority based execution order
  Signed-off-by: Atri Sharma <[email protected]>
* Handle fields with missing fields
  Signed-off-by: Atri Sharma <[email protected]>
---------
Signed-off-by: Atri Sharma <[email protected]>
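To make the "positive complement" idea behind MustNotToShouldRewriter concrete, here is a minimal sketch of complementing a numeric range. It is an illustration only, not the OpenSearch implementation: the helper names (`must_not_to_should`, `complement_bounds`, `in_range`, `matches_should`) are hypothetical, and the sketch assumes every document has exactly one value for the field, which is the single-valued condition the commit message mentions.

```python
import operator

# Each bound operator and the operator that expresses its negation.
FLIP = {"gte": "lt", "gt": "lte", "lte": "gt", "lt": "gte"}
OPS = {"gte": operator.ge, "gt": operator.gt, "lte": operator.le, "lt": operator.lt}


def in_range(value, bounds):
    """True if value satisfies every bound, e.g. {"gte": 3, "lte": 6}."""
    return all(OPS[op](value, bound) for op, bound in bounds.items())


def complement_bounds(bounds):
    """Negating a conjunction of bounds gives a disjunction of flipped bounds."""
    return [{FLIP[op]: bound} for op, bound in bounds.items()]


def must_not_to_should(field, bounds):
    """Build a positive bool query matching what must_not(range) would match.

    Only valid under the assumption that each document has exactly one value
    for `field`; documents with zero or several values behave differently.
    """
    return {
        "bool": {
            "should": [{"range": {field: b}} for b in complement_bounds(bounds)],
            "minimum_should_match": 1,
        }
    }


def matches_should(value, field, rewritten):
    """Evaluate the rewritten query against a single field value."""
    return any(in_range(value, c["range"][field]) for c in rewritten["bool"]["should"])


bounds = {"gte": 3, "lte": 6}
rewritten = must_not_to_should("status", bounds)
```

For single-valued documents, excluding `[3, 6]` and matching `< 3 OR > 6` select the same values, which is why the positive form can stand in for the negative one.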
Description
This PR automatically rewrites boolean `must` clauses to `filter` clauses when the clause would always return the same score. This happens for range/geo-bounding-box queries and match/term/terms queries on numeric fields.
This can cause a significant speedup in some cases, because Lucene may be able to use `ImpactsDISI` + `MaxScoreCache` to skip large numbers of docs with non-competitive scores. It can only do this if there is exactly 1 must clause which leads iteration and uses scoring (so, not a numeric query). In cases where this rewrite doesn't enable `ImpactsDISI`, there should be no perf impact. The time saved on actual scoring is negligible.
Note that this change can never take a query which originally used `ImpactsDISI` and rewrite it so that it no longer does. This is because only numeric clauses are moved, and numeric queries aren't eligible for `ImpactsDISI` in the first place, so they couldn't have been causing speedups from being in a `must` clause rather than a `filter` clause.
This PR was blocked for a while by apache/lucene#14542, but now that Lucene 10.2 is in OpenSearch it should be safe to move these clauses around without hurting performance.
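As a rough illustration of the rewrite described above, the sketch below moves constant-score clauses from `must` to `filter` on a query-DSL dictionary. It is not the Java code in this PR: the `mapping` dictionary, the type sets, and the function names are hypothetical, and it assumes `must` is a list and that the first key of a term/terms/match body is the field name.

```python
from copy import deepcopy

# Assumed set of numeric field types, for illustration only.
NUMERIC_TYPES = {"byte", "short", "integer", "long", "float", "double"}
# Query types treated as constant-score regardless of field type.
ALWAYS_CONSTANT = {"range", "geo_bounding_box"}
# Query types treated as constant-score only on numeric fields.
NUMERIC_ONLY = {"term", "terms", "match"}


def is_constant_score(clause, mapping):
    """Decide whether a single-key DSL clause would score every hit identically."""
    (query_type, body), = clause.items()
    if query_type in ALWAYS_CONSTANT:
        return True
    if query_type in NUMERIC_ONLY:
        field = next(iter(body))  # assumes the first key is the field name
        return mapping.get(field) in NUMERIC_TYPES
    return False


def rewrite_must_to_filter(query, mapping):
    """Return a copy of `query` with constant-score must clauses moved to filter."""
    if "bool" not in query:
        return query
    rewritten = deepcopy(query)
    bool_body = rewritten["bool"]
    must = bool_body.get("must", [])
    moved = [c for c in must if is_constant_score(c, mapping)]
    kept = [c for c in must if not is_constant_score(c, mapping)]
    if moved:
        bool_body["filter"] = bool_body.get("filter", []) + moved
        if kept:
            bool_body["must"] = kept
        else:
            del bool_body["must"]
    return rewritten


mapping = {"request": "text", "status": "integer"}  # hypothetical mapping
original = {
    "bool": {
        "must": [
            {"match": {"request": "images"}},
            {"match": {"status": 200}},
        ]
    }
}
rewritten = rewrite_must_to_filter(original, mapping)
```

After the rewrite, the text `match` on `request` is the only remaining `must` clause, which is the single scoring clause the description says is needed for `ImpactsDISI` to apply.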
Here are some benchmark results from `http_logs`. After rewriting, the numeric `must` clause becomes `filter`, so we expect the contender's p50 for the `must` case to equal both p50s from the `filter` case. Before each clause I listed whether it was `must` or `filter` in the original query.
- `must` "request" matches "images" & `must` "status" matches "200"
- `must` "request" matches "images" & `filter` "status" matches "200"
- `must` "request" matches "images" & `must` "status" matches "404"
- `must` "request" matches "images" & `filter` "status" matches "404"
- `must` "request" matches "images" & `must` "status" matches "500"
- `must` "request" matches "images" & `filter` "status" matches "500"
- `must` "request" matches "images" & `must` "timestamp" from 6/10-6/13
- `must` "request" matches "images" & `filter` "timestamp" from 6/10-6/13
- `must` "request" matches "images" & `must` "timestamp" from 6/10 00:00 - 6/10 00:01
- `must` "request" matches "images" & `filter` "timestamp" from 6/10 00:00 - 6/10 00:01
- `must` "request" matches "images" & `must` "timestamp" from 6/1-6/13
- `must` "request" matches "images" & `filter` "timestamp" from 6/1-6/13
Related Issues
Part of #17586
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.