Apply boolean must_not rewrite to numeric match, term, and terms queries #18498
Signed-off-by: Peter Alfonsi <[email protected]>
❌ Gradle check result for 9a036fa: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
Attention: Patch coverage is …

| | main | #18498 | +/- |
|---|---:|---:|---:|
| Coverage | 72.79% | 72.73% | -0.07% |
| Complexity | 68525 | 68505 | -20 |
| Files | 5574 | 5567 | -7 |
| Lines | 314807 | 314565 | -242 |
| Branches | 45675 | 45645 | -30 |
| Hits | 229178 | 228811 | -367 |
| Misses | 67046 | 67190 | +144 |
| Partials | 18583 | 18564 | -19 |
This PR is stalled because it has been open for 30 days with no activity.
Sorry for the delay in getting back to this, @peteralfonsi. It looks pretty good!
I just had some comments on code style, but the logic looks good to me.
Review threads (outdated, resolved) on:
- server/src/main/java/org/opensearch/index/query/ComplementAwareQueryBuilder.java
- server/src/main/java/org/opensearch/index/query/BoolQueryBuilder.java
- server/src/main/java/org/opensearch/index/query/RangeQueryBuilder.java
❕ Gradle check result for d227d86: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
The flaky test was `ClientYamlTestSuiteIT` `search.aggregation/20_terms`. I don't think it can be related to these changes, since the terms aggregation shouldn't have anything to do with the terms query builder, and it passes locally. Still, I'll rerun the Gradle check to be safe.
❕ Gradle check result for 910cc9f: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Flaky test: #14407 (a different flaky test from before; that one should be OK).
…ies (opensearch-project#18498)
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Co-authored-by: Peter Alfonsi <[email protected]>
…rewriting infrastructure
This commit migrates two existing query optimizations from BoolQueryBuilder to the new
query rewriting infrastructure:
1. **MustToFilterRewriter**: Moves non-scoring queries (range, geo, numeric term/terms/match)
from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)
2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
better performance on single-valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)
Changes:
- Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
- Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
- Register both rewriters in QueryRewriterRegistry
- Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
- Disable legacy implementations in BoolQueryBuilder
- Comment out BoolQueryBuilder tests that relied on the old implementations
The new rewriters maintain full backward compatibility while providing:
- Better separation of concerns
- Recursive rewriting for nested boolean queries
- Proper error handling and logging
- Consistent priority-based execution order
Signed-off-by: Atri Sharma <[email protected]>
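The priorities listed in this commit (150 for must-to-filter, 175 for must-not-to-should, each "runs after" the lower one) suggest that rewriters are applied in ascending priority order. Below is a heavily simplified Java sketch of that idea. `RewriterPipelineSketch`, its toy `Bool` model, the clause type labels, and the no-op stand-in for the must_not rewrite are all hypothetical. They do not mirror the actual `QueryRewriterRegistry` API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model (not OpenSearch classes) of priority-ordered query rewriters. */
final class RewriterPipelineSketch {

    /** Leaf clause identified only by a type label such as "range" or "match_text". */
    record Clause(String type) {}

    /** Toy bool query with mutable must and filter lists. */
    static final class Bool {
        final List<Clause> must = new ArrayList<>();
        final List<Clause> filter = new ArrayList<>();
    }

    interface Rewriter {
        String name();
        int priority(); // lower values run first
        void rewrite(Bool query);
    }

    /** Moves clauses of (assumed) non-scoring types from must to filter. */
    static final class MustToFilter implements Rewriter {
        // Hypothetical labels loosely based on the commit message's list.
        private static final Set<String> NON_SCORING = Set.of("range", "geo", "numeric_term");

        public String name() { return "must_to_filter"; }
        public int priority() { return 150; }

        public void rewrite(Bool query) {
            List<Clause> stillMust = new ArrayList<>();
            for (Clause c : query.must) {
                if (NON_SCORING.contains(c.type())) {
                    query.filter.add(c); // no scoring needed: run it as a filter
                } else {
                    stillMust.add(c);
                }
            }
            query.must.clear();
            query.must.addAll(stillMust);
        }
    }

    /** Stand-in rewriter that does nothing; it only demonstrates ordering. */
    static Rewriter noOp(String name, int priority) {
        return new Rewriter() {
            public String name() { return name; }
            public int priority() { return priority; }
            public void rewrite(Bool query) { }
        };
    }

    /** Applies rewriters in ascending priority order and returns their names in execution order. */
    static List<String> applyAll(List<Rewriter> rewriters, Bool query) {
        List<Rewriter> ordered = new ArrayList<>(rewriters);
        ordered.sort(Comparator.comparingInt(Rewriter::priority));
        List<String> executed = new ArrayList<>();
        for (Rewriter r : ordered) {
            r.rewrite(query);
            executed.add(r.name());
        }
        return executed;
    }

    public static void main(String[] args) {
        Bool query = new Bool();
        query.must.add(new Clause("range"));
        query.must.add(new Clause("match_text"));
        // Registered out of order on purpose; priority decides execution order.
        List<String> order = applyAll(List.of(noOp("must_not_to_should", 175), new MustToFilter()), query);
        System.out.println(order);
    }
}
```

Sorting once up front keeps the execution order independent of registration order. That ordering is what lets the later must_not rewrite rely on the must-to-filter step having already run.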
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity
  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on the same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries
  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results
* Fix forbidden api issues
* Update writers and get tests to pass
* Update per CI
* Fix term merging threshold and update comments
* Expose setting and update per comments
* Update CHANGELOG
* Fix tests and ensure scoring MATCH ALL query is preserved
* Migrate must to filter and must not to should optimizations to query rewriting infrastructure
  This commit migrates two existing query optimizations from BoolQueryBuilder to the new query rewriting infrastructure:
  1. **MustToFilterRewriter**: Moves non-scoring queries (range, geo, numeric term/terms/match) from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)
  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for better performance on single-valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)
  Changes:
  - Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
  - Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
  - Register both rewriters in QueryRewriterRegistry
  - Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
  - Disable legacy implementations in BoolQueryBuilder
  - Comment out BoolQueryBuilder tests that relied on the old implementations
  The new rewriters maintain full backward compatibility while providing:
  - Better separation of concerns
  - Recursive rewriting for nested boolean queries
  - Proper error handling and logging
  - Consistent priority-based execution order
* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>
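Boolean flattening is listed first among the optimizations in this commit. One clearly safe case is a `bool` inside a filter clause whose own clauses are all filters. A conjunction nested inside a conjunction, with no scoring involved, matches the same documents, so the inner clauses can be inlined. The Java sketch below models only that case. `FlattenSketch` and its types are hypothetical, and the real rewriter may handle more cases, or different ones.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not OpenSearch code) of flattening pure-filter bools nested in filter clauses. */
final class FlattenSketch {

    sealed interface Query permits Leaf, Bool {}

    /** Leaf query, described by a label such as "status:200". */
    record Leaf(String label) implements Query {}

    /** Toy bool with scoring (must) and non-scoring (filter) clauses only. */
    record Bool(List<Query> must, List<Query> filter) implements Query {}

    /** Flattens bottom-up; only bools with no must clauses are inlined into the parent's filter. */
    static Query flatten(Query query) {
        if (!(query instanceof Bool bool)) {
            return query; // leaves are unchanged
        }
        List<Query> must = new ArrayList<>();
        for (Query clause : bool.must()) {
            must.add(flatten(clause)); // recurse, but never inline scoring clauses
        }
        List<Query> filter = new ArrayList<>();
        for (Query clause : bool.filter()) {
            Query flat = flatten(clause);
            if (flat instanceof Bool inner && inner.must().isEmpty()) {
                filter.addAll(inner.filter()); // pure-filter bool: inline its clauses
            } else {
                filter.add(flat);
            }
        }
        return new Bool(must, filter);
    }

    public static void main(String[] args) {
        Query nested = new Bool(List.of(), List.of(
                new Bool(List.of(), List.of(new Leaf("status:200"), new Leaf("size>1000")))));
        System.out.println(flatten(nested)); // the inner bool's two filters move up one level
    }
}
```

A nested bool that has `must` clauses is left in place. Inlining it would change which clauses contribute to scoring.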
Description
Follow-up to #17655, where we rewrote `range` queries in boolean `must_not` clauses as `should` clauses containing the complement of the original query. This PR extends the rewrite to `match`, `term`, and `terms` queries on numeric fields.
The speedups here seem larger than for range queries. I also imagine `must_not` on numeric terms is more common than on ranges (for example, excluding all documents with HTTP status 200), so hopefully this PR will be more useful than the first one.
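The Java sketch below makes the rewrite concrete. It computes the complement ranges for a numeric `term` query (one excluded value) or `terms` query (several excluded values). It assumes a single-valued `long` field that is present on every document. Under that assumption, excluding values is the same as matching the gaps between them. A document with no value, or with several values, can make the two forms disagree. For example, a document with both 200 and 404 is excluded by `must_not` 200, yet it still falls in the range above 200. That is why the rewrite targets single-valued numeric fields. `NumericComplementSketch`, `complement`, and `matchesAny` are hypothetical names, not OpenSearch APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical sketch, not OpenSearch code. Assumes a single-valued long field that
 * every document has; only then is must_not(value in excluded) equivalent to
 * should(value in any complement range).
 */
final class NumericComplementSketch {

    /** Inclusive numeric range [lo, hi]. */
    record Range(long lo, long hi) {
        boolean contains(long v) {
            return lo <= v && v <= hi;
        }
    }

    /** Ranges covering every long except the excluded values (term = one value, terms = several). */
    static List<Range> complement(long... excluded) {
        TreeSet<Long> sorted = new TreeSet<>();
        for (long v : excluded) {
            sorted.add(v); // sorts and removes duplicates
        }
        List<Range> ranges = new ArrayList<>();
        long lo = Long.MIN_VALUE; // start of the current gap
        for (long v : sorted) {
            if (v > lo) {
                ranges.add(new Range(lo, v - 1)); // gap below this excluded value
            }
            if (v == Long.MAX_VALUE) {
                return ranges; // nothing lies above MAX_VALUE; also avoids overflow in v + 1
            }
            lo = v + 1;
        }
        ranges.add(new Range(lo, Long.MAX_VALUE)); // gap above the largest excluded value
        return ranges;
    }

    /** True if v falls in any range, i.e. the rewritten should clause would match. */
    static boolean matchesAny(List<Range> ranges, long v) {
        for (Range r : ranges) {
            if (r.contains(v)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // must_not status == 200  ->  should: [Long.MIN_VALUE, 199] or [201, Long.MAX_VALUE]
        System.out.println(complement(200));
        // must_not status in {404, 500}  ->  three ranges around the two excluded values
        System.out.println(complement(404, 500));
    }
}
```

Excluding k distinct values yields at most k + 1 ranges. A `terms` exclusion with many values therefore turns into a `should` with many range clauses.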
Some benchmark numbers from http_logs are below. These were run on 1-node clusters using tar installs on c5.2xlarge EC2 instances. "Originally written as" indicates how the query was sent to OpenSearch. It was either sent with a `must_not` clause (a `match`, `term`, or `terms` query), or already rewritten as `should` clauses of ranges. Ideally, once the changes are applied, these p50s should be the same, because the `must_not` clauses are internally rewritten to be `should` clauses of ranges. Note that 200 is the most common value, while 404 and 500 are rarer.

Benchmark tables (p50 latency by original query form: `must_not` of `match`, `must_not` of `term`, `must_not` of `terms`, `should` of ranges): values not recoverable.

Related Issues
Part of #17586
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.