@peteralfonsi
Contributor

Description

Follow-up to #17655, which rewrote range queries in boolean must_not clauses as should clauses containing the complement of the original query. This PR extends the rewrite to match, term, and terms queries on numeric fields.
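A minimal sketch of the complement idea for long-valued fields follows. It assumes the excluded values come from a term or terms clause, and that the field is single-valued with a value in every document. `ComplementRanges`, its `LongRange` record, and `complement` are hypothetical names for illustration, not the PR's `ComplementHelperUtils` code. Floating-point fields would need `Math.nextUp`/`Math.nextDown` rather than +1/-1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch: complement of a set of excluded long values as inclusive ranges. */
public final class ComplementRanges {

    /** Inclusive range [from, to]; a stand-in for one range query clause. */
    public record LongRange(long from, long to) {}

    public static List<LongRange> complement(Iterable<Long> excluded) {
        // Sort and de-duplicate so the gaps between excluded values can be walked in order.
        TreeSet<Long> sorted = new TreeSet<>();
        excluded.forEach(sorted::add);

        List<LongRange> ranges = new ArrayList<>();
        long nextFrom = Long.MIN_VALUE; // smallest value not yet covered or excluded
        boolean exhausted = false;      // set once Long.MAX_VALUE itself is excluded
        for (long v : sorted) {
            if (v > nextFrom) {
                // The gap [nextFrom, v - 1] contains no excluded value.
                ranges.add(new LongRange(nextFrom, v - 1));
            }
            if (v == Long.MAX_VALUE) {
                exhausted = true; // v + 1 would overflow, and nothing lies above v
                break;
            }
            nextFrom = v + 1;
        }
        if (!exhausted) {
            ranges.add(new LongRange(nextFrom, Long.MAX_VALUE));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Excluding HTTP statuses 200 and 500 leaves three ranges.
        System.out.println(complement(List.of(500L, 200L)));
    }
}
```

Excluding 200 and 500 yields [Long.MIN_VALUE, 199], [201, 499] and [501, Long.MAX_VALUE]. A rewritten bool query would hold one range clause per entry as should clauses, with at least one required to match. Adjacent excluded values (for example 200 and 201) produce no empty range between them.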

The speedups here appear larger than those for range queries, and I expect must_not on numeric terms to be more common than on ranges (for example, excluding all documents with HTTP status 200). So hopefully this PR will be more useful than the first one.

Some benchmark numbers from http_logs are below. These were run on 1-node clusters using tar installs on c5.2xlarge EC2 instances. "Originally written as" indicates how the query was sent to OpenSearch: as a must_not clause containing a match, term, or terms query, or already rewritten as should clauses of ranges. Ideally, once the changes are applied, the must_not forms should reach the same p50 as the should-of-ranges form, because the must_not clauses are internally rewritten into should clauses of ranges. Note that 200 is the most common status value, while 404 and 500 are rarer.
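The expectation that both forms cost about the same rests on them matching the same documents. The toy model below uses hypothetical names and plain Java rather than OpenSearch classes. It compares `must_not` of a term with `should` of the two complement ranges on a status field, assuming each document has at most one value (the single-valued restriction mentioned later in this thread). It also shows why documents with no value need separate handling: `must_not` keeps them, while no range can match them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy comparison of must_not(term) and should(complement ranges); not OpenSearch code. */
public final class RewriteEquivalence {

    // must_not { term: { status: excluded } } matches every document whose value is not
    // the excluded one, including documents that have no value at all (null here).
    static boolean mustNotTerm(Long status, long excluded) {
        return status == null || status != excluded;
    }

    // should [ range(status < excluded), range(status > excluded) ] with at least one
    // should clause required: a document without a value falls in neither range.
    static boolean shouldOfRanges(Long status, long excluded) {
        return status != null && (status < excluded || status > excluded);
    }

    /** Returns the ids (array indexes) of the documents that the query matches. */
    static List<Integer> matches(Long[] docs, Predicate<Long> query) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            if (query.test(docs[i])) {
                ids.add(i);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Long[] complete = {200L, 404L, 500L, 200L};
        Long[] withMissing = {200L, 404L, null};

        // Every document has a value: both shapes match documents 1 and 2, so this prints true.
        System.out.println(matches(complete, s -> mustNotTerm(s, 200))
                .equals(matches(complete, s -> shouldOfRanges(s, 200))));

        // Document 2 has no value: must_not keeps it but the ranges do not, so this prints false.
        System.out.println(matches(withMissing, s -> mustNotTerm(s, 200))
                .equals(matches(withMissing, s -> shouldOfRanges(s, 200))));
    }
}
```

The same argument breaks for multi-valued fields: a document holding both 200 and 404 is excluded by the `must_not` but still matches the `> 200` range.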

| Excluded status value(s) | Originally written as | p50 before changes (ms) | p50 after changes (ms) | Speedup |
|---|---|---|---|---|
| 200 | must_not of match | 1021 | 20.72 | 49x |
| 200 | must_not of term | 1011 | 18.83 | 54x |
| 200 | should of ranges | 20.71 | 18.10 | - |
| 404 | must_not of match | 515.6 | 12.03 | 43x |
| 404 | must_not of term | 513.0 | 12.01 | 43x |
| 404 | should of ranges | 13.14 | 11.74 | - |
| 500 | must_not of match | 481.6 | 9.49 | 44x |
| 500 | must_not of term | 487.3 | 9.53 | 51x |
| 500 | should of ranges | 10.26 | 9.05 | - |
| 200, 500 | must_not of terms | 958.6 | 17.76 | 54x |
| 200, 500 | should of ranges | 19.87 | 17.84 | - |
| 404, 500 | must_not of terms | 508.1 | 11.75 | 43x |
| 404, 500 | should of ranges | 12.88 | 11.28 | - |

Related Issues

Part of #17586

Check List

  • Functionality includes testing.
  • [N/A] API changes companion pull request created, if applicable.
  • [N/A] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Peter Alfonsi added 7 commits June 6, 2025 10:46
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
@github-actions
Contributor

❌ Gradle check result for 9a036fa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@codecov

codecov bot commented Jun 19, 2025

Codecov Report

Attention: Patch coverage is 91.42857% with 6 lines in your changes missing coverage. Please review.

Project coverage is 72.73%. Comparing base (8f69dcf) to head (910cc9f).
Report is 27 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../org/opensearch/index/query/TermsQueryBuilder.java | 78.57% | 0 Missing and 3 partials ⚠️ |
| ...a/org/opensearch/index/query/BoolQueryBuilder.java | 81.81% | 0 Missing and 2 partials ⚠️ |
| .../opensearch/index/query/ComplementHelperUtils.java | 97.22% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18498      +/-   ##
============================================
- Coverage     72.79%   72.73%   -0.07%     
+ Complexity    68525    68505      -20     
============================================
  Files          5574     5567       -7     
  Lines        314807   314565     -242     
  Branches      45675    45645      -30     
============================================
- Hits         229178   228811     -367     
- Misses        67046    67190     +144     
+ Partials      18583    18564      -19     


@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Jul 21, 2025
Contributor

@msfroh msfroh left a comment

Sorry for the delay in getting back to this, @peteralfonsi. It looks pretty good!

I just had some comments on code style, but the logic looks good to me.

Peter Alfonsi added 2 commits July 22, 2025 12:26
@github-actions
Contributor

❕ Gradle check result for d227d86: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@peteralfonsi
Copy link
Contributor Author

The flaky test was ClientYamlTestSuiteIT search.aggregation/20_terms. I don't think it's related to these changes, since the terms aggregation has nothing to do with the terms query builder, and it passes locally. I'll rerun the Gradle check to be safe.

Signed-off-by: Peter Alfonsi <[email protected]>
@github-actions
Contributor

❕ Gradle check result for 910cc9f: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@peteralfonsi
Copy link
Contributor Author

Flaky test: #14407 (not the same flaky test as before; that one should be OK)

@msfroh msfroh merged commit dfed864 into opensearch-project:main Jul 22, 2025
31 checks passed
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Aug 5, 2025
…ies (opensearch-project#18498)

---------

Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Co-authored-by: Peter Alfonsi <[email protected]>
atris added a commit to atris/OpenSearch that referenced this pull request Aug 18, 2025
…rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non-scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single-valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
  - Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
  - Add MustNotToShouldRewriter with priority 175 (runs after must-to-filter)
  - Register both rewriters in QueryRewriterRegistry
  - Add comprehensive test suites (15 tests for must-to-filter, 14 for must-not-to-should)
  - Disable legacy implementations in BoolQueryBuilder
  - Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
  - Better separation of concerns
  - Recursive rewriting for nested boolean queries
  - Proper error handling and logging
  - Consistent priority-based execution order

Signed-off-by: Atri Sharma <[email protected]>
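One way to picture the priority-ordered rewriters described in this commit: each rewriter exposes a priority and the registry applies them in ascending order, so a rewriter at 150 runs before one at 175. The `Rewriter` interface, the toy `Bool`/`Clause` records, `sortedByPriority`, and `applyAll` are hypothetical illustrations, not the OpenSearch `QueryRewriterRegistry` API. Only the priorities and the must-to-filter behavior come from the commit message.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical priority-ordered rewriter chain over a toy bool model; not the OpenSearch API. */
public final class RewriterChain {

    /** A leaf clause; scoring is false for clauses such as numeric ranges or terms. */
    record Clause(String description, boolean scoring) {}

    /** Toy boolean query holding the four clause lists. */
    record Bool(List<Clause> must, List<Clause> filter, List<Clause> should, List<Clause> mustNot) {}

    interface Rewriter {
        int priority(); // lower values run first

        Bool rewrite(Bool query);
    }

    /** Moves non-scoring must clauses into filter so no score is computed for them. */
    static final class MustToFilter implements Rewriter {
        @Override
        public int priority() {
            return 150;
        }

        @Override
        public Bool rewrite(Bool q) {
            List<Clause> must = new ArrayList<>();
            List<Clause> filter = new ArrayList<>(q.filter());
            for (Clause c : q.must()) {
                if (c.scoring()) {
                    must.add(c);
                } else {
                    filter.add(c);
                }
            }
            return new Bool(must, filter, q.should(), q.mustNot());
        }
    }

    /** Returns a copy of the rewriters sorted by ascending priority. */
    static List<Rewriter> sortedByPriority(List<Rewriter> rewriters) {
        List<Rewriter> ordered = new ArrayList<>(rewriters);
        ordered.sort(Comparator.comparingInt(Rewriter::priority));
        return ordered;
    }

    /** Applies each rewriter once, in priority order, feeding each the previous result. */
    static Bool applyAll(Bool query, List<Rewriter> rewriters) {
        Bool current = query;
        for (Rewriter r : sortedByPriority(rewriters)) {
            current = r.rewrite(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Bool query = new Bool(
                List.of(new Clause("match message:error", true), new Clause("range status", false)),
                List.of(), List.of(), List.of());
        Bool rewritten = applyAll(query, List.of(new MustToFilter()));
        System.out.println(rewritten.must());   // only the scoring match clause remains
        System.out.println(rewritten.filter()); // the range clause has moved here
    }
}
```

Returning a new `Bool` from each rewriter keeps the steps independent, so adding a rewriter at another priority only changes where it sits in the sorted list.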
rishabhmaurya pushed a commit that referenced this pull request Aug 27, 2025
* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on the same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>
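A minimal sketch of the terms-merging step over a toy model. It assumes a non-scoring context, such as the should clauses of a bool used as a filter, where a disjunction of term queries on one field matches the same documents as a single terms query on that field. `TermsMerging`, `Term`, `Terms`, and `mergeShouldTerms` are hypothetical names. The sketch merges unconditionally, whereas the commit history mentions a merging threshold that is not modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical terms-merging sketch over a toy model; not the OpenSearch implementation. */
public final class TermsMerging {

    record Term(String field, String value) {}

    record Terms(String field, List<String> values) {}

    /** Groups OR-ed term clauses by field, keeping first-seen field and value order. */
    static List<Terms> mergeShouldTerms(List<Term> shouldClauses) {
        Map<String, List<String>> byField = new LinkedHashMap<>();
        for (Term t : shouldClauses) {
            byField.computeIfAbsent(t.field(), f -> new ArrayList<>()).add(t.value());
        }
        List<Terms> merged = new ArrayList<>();
        byField.forEach((field, values) -> merged.add(new Terms(field, values)));
        return merged;
    }

    public static void main(String[] args) {
        List<Term> should = List.of(
                new Term("status", "404"), new Term("status", "500"), new Term("method", "GET"));
        // Three term clauses become two terms clauses: status [404, 500] and method [GET].
        System.out.println(mergeShouldTerms(should));
    }
}
```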

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non-scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR #18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single-valued numeric fields (from PRs #17655 and #18498)

  Changes:
  - Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
  - Add MustNotToShouldRewriter with priority 175 (runs after must-to-filter)
  - Register both rewriters in QueryRewriterRegistry
  - Add comprehensive test suites (15 tests for must-to-filter, 14 for must-not-to-should)
  - Disable legacy implementations in BoolQueryBuilder
  - Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
  - Better separation of concerns
  - Recursive rewriting for nested boolean queries
  - Proper error handling and logging
  - Consistent priority-based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
atris added a commit to atris/OpenSearch that referenced this pull request Aug 28, 2025
…arch-project#19060)

Signed-off-by: Atri Sharma <[email protected]>
pranikum pushed a commit to pranikum/OpenSearch that referenced this pull request Sep 4, 2025
…arch-project#19060)

Signed-off-by: Atri Sharma <[email protected]>
kh3ra pushed a commit to kh3ra/OpenSearch that referenced this pull request Sep 5, 2025
…arch-project#19060)

Signed-off-by: Atri Sharma <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…arch-project#19060)

Signed-off-by: Atri Sharma <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…arch-project#19060)

Signed-off-by: Atri Sharma <[email protected]>
Signed-off-by: Ankit Jain <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…arch-project#19060)

Signed-off-by: Atri Sharma <[email protected]>
Signed-off-by: Ankit Jain <[email protected]>
asimmahmood1 pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 23, 2025
…arch-project#19060)

Signed-off-by: Atri Sharma <[email protected]>
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
…ies (opensearch-project#18498)

---------

Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Co-authored-by: Peter Alfonsi <[email protected]>
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
…arch-project#19060)

Signed-off-by: Atri Sharma <[email protected]>

Labels

stalled Issues that have stalled

2 participants