Skip to content

Conversation

@peteralfonsi
Copy link
Contributor

Description

This PR automatically rewrites boolean must clauses to filter clauses when that clause would always return the same score. This happens for range/geo-bounding-box queries and match/term/terms queries on numeric fields.

This can cause a significant speedup in some cases, because Lucene may be able to use ImpactsDISI + MaxScoreCache to skip large amounts of docs with non-competitive scores. It can only do this if there is exactly 1 must clause which leads iteration and uses scoring (so, not a numeric query). In cases where this rewrite doesn't enable ImpactsDISI, there should be no perf impact. The time saved on actual scoring is negligible.

Note that it's never possible for this change to take a query which originally used ImpactsDISI and change it so that it doesn't use it after being rewritten. This is because only numeric clauses are moved, and numeric queries aren't eligible for ImpactsDISI in the first place, so they couldn't have been causing speedups from being in a must clause rather than a filter clause.

This PR was blocked for a while by apache/lucene#14542 but now that Lucene 10.2 is in OpenSearch it should be safe to move these clauses around without hurting performance.

Here are some benchmark results from http_logs. After rewriting, the numeric must clause becomes filter so we expect the contender's p50 for the must case to equal both p50s from the filter case. Before each clause I listed whether it was must or filter in the original query.

Original query p50 before changes (ms) p50 after changes (ms)
must "request" matches "images" & must "status" matches "200" 2495 29.5
must "request' matches "images" & filter "status" matches "200" 28.0 28.7
must "request" matches "images" & must "status" matches "404" 75.1 24.3
must "request" matches "images" & filter "status" matches "404" 22.1 22.5
must "request" matches "images" & must "status" matches "500" 9.9 9.6
must "request" matches "images" & filter "status" matches "500" 9.7 9.3
must "request" matches "images" & must "timestamp" from 6/10-6/13 1342 32.5
must "request" matches "images" & filter "timestamp" from 6/10-6/13 31.3 30.0
must "request" matches "images" & must "timestamp" from 6/10 00:00 - 6/10 00:01 11.4 7.5
must "request" matches "images" & filter "timestamp" from 6/10 00:00 - 6/10 00:01 7.0 9.3
must "request" matches "images" & must "timestamp" from 6/1-6/13 1428 23.2
must "request" matches "images" & filter "timestamp" from 6/1-6/13 24.6 24.7

Related Issues

Part of #17586

Check List

  • Functionality includes testing.
  • [N/A] API changes companion pull request created, if applicable.
  • [N/A] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Peter Alfonsi added 2 commits June 11, 2025 12:37
Signed-off-by: Peter Alfonsi <[email protected]>
Peter Alfonsi and others added 3 commits June 17, 2025 10:38
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for 216c461: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@peteralfonsi peteralfonsi marked this pull request as draft June 17, 2025 18:28
Peter Alfonsi added 2 commits June 18, 2025 11:47
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
@peteralfonsi
Copy link
Contributor Author

Flaky test: #14509

Signed-off-by: Peter Alfonsi <[email protected]>
@github-actions
Copy link
Contributor

✅ Gradle check result for 049c59f: SUCCESS

@codecov
Copy link

codecov bot commented Jun 18, 2025

Codecov Report

Attention: Patch coverage is 83.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 72.82%. Comparing base (5ced78c) to head (8e9d383).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...a/org/opensearch/index/query/BoolQueryBuilder.java 83.33% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18541      +/-   ##
============================================
- Coverage     72.83%   72.82%   -0.02%     
- Complexity    68450    68460      +10     
============================================
  Files          5563     5563              
  Lines        314144   314164      +20     
  Branches      45544    45554      +10     
============================================
- Hits         228810   228776      -34     
- Misses        66807    66851      +44     
- Partials      18527    18537      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rishabhmaurya
Copy link
Contributor

rishabhmaurya commented Jun 26, 2025

@peteralfonsi I like this rewrite strategy especially for range query on numeric fields.
If I understand the intention right, we are also including term query in must for rewrites? Aren't scoring relevant there?

@peteralfonsi
Copy link
Contributor Author

We're only rewriting it for term queries on numeric field, which as I understand it should return a constant score, right?

@msfroh
Copy link
Contributor

msfroh commented Jun 26, 2025

We're only rewriting it for term queries on numeric field, which as I understand it should return a constant score, right?

Correct!

Under the hood, the transformation is term query -> "exact" query (via <Type>Point.newExactQuery or SortedNumericDocValuesField.newSlowExactQuery) -> range query (with upper == lower and includeLower == includeUpper == true). In other words, term queries on numeric fields are range queries anyway.

@rishabhmaurya
Copy link
Contributor

rishabhmaurya commented Jun 26, 2025

ah, I missed that. I think this change makes sense. I will take another look

@github-actions
Copy link
Contributor

✅ Gradle check result for 8e9d383: SUCCESS

@rishabhmaurya
Copy link
Contributor

I think we need to revisit the query rewrite refactoring independent of this change as it is getting a bit complicated over the time.

@rishabhmaurya rishabhmaurya merged commit 0e2ec6f into opensearch-project:main Jun 27, 2025
31 checks passed
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Aug 5, 2025
…search-project#18541)

* Add must -> filter boolean rewrite

Signed-off-by: Peter Alfonsi <[email protected]>

* fix comment wording

Signed-off-by: Peter Alfonsi <[email protected]>

* changelog

Signed-off-by: Peter Alfonsi <[email protected]>

* changelog fix

Signed-off-by: Peter Alfonsi <[email protected]>

* fix WrapperQueryBuilderTests

Signed-off-by: Peter Alfonsi <[email protected]>

* fix PercolatorQuerySearchIT

Signed-off-by: Peter Alfonsi <[email protected]>

* rerun gradle check

Signed-off-by: Peter Alfonsi <[email protected]>

* rerun gradle check

Signed-off-by: Peter Alfonsi <[email protected]>

---------

Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Co-authored-by: Peter Alfonsi <[email protected]>
atris added a commit to atris/OpenSearch that referenced this pull request Aug 18, 2025
…rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>
rishabhmaurya pushed a commit that referenced this pull request Aug 27, 2025
* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR #18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs #17655 and #18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
atris added a commit to atris/OpenSearch that referenced this pull request Aug 28, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
pranikum pushed a commit to pranikum/OpenSearch that referenced this pull request Sep 4, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
kh3ra pushed a commit to kh3ra/OpenSearch that referenced this pull request Sep 5, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
Signed-off-by: Ankit Jain <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
Signed-off-by: Ankit Jain <[email protected]>
asimmahmood1 pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 23, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants