Skip to content

Conversation

@atris
Copy link
Contributor

@atris atris commented Aug 13, 2025

Implements three query optimizations that work together:

  • Boolean flattening: removes unnecessary nested boolean queries
  • Terms merging: combines multiple term queries on same field in filter/should contexts
  • Match-all removal: eliminates redundant match_all queries

Key features:

  • 60-70% reduction in query nodes for typical filtered queries
  • Feature flag: search.query_rewriting.enabled (default: true)
  • Preserves exact query semantics and results

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for f58ba21: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Atri Sharma <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for 478e22a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 847ef04: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Atri Sharma <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for 2cfefbe: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

✅ Gradle check result for d161640: SUCCESS

@rishabhmaurya rishabhmaurya merged commit 8531924 into opensearch-project:main Aug 27, 2025
31 checks passed
atris added a commit to atris/OpenSearch that referenced this pull request Aug 28, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
pranikum pushed a commit to pranikum/OpenSearch that referenced this pull request Sep 4, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
kh3ra pushed a commit to kh3ra/OpenSearch that referenced this pull request Sep 5, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
atris added a commit to atris/OpenSearch that referenced this pull request Sep 11, 2025
   Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060
   When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR:

   not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1))
   Use FILTER (not MUST) to preserve non-scoring semantics of must_not
   Leave other flattenings unchanged; only trigger on pure must_not nested bool
   Add unit test: testDoubleNegationConvertedToPositiveMustShould

Signed-off-by: Atri Sharma <[email protected]>
atris added a commit to atris/OpenSearch that referenced this pull request Sep 11, 2025
 opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR:

   not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1))
   Use FILTER (not MUST) to preserve non-scoring semantics of must_not
   Leave other flattenings unchanged; only trigger on pure must_not nested bool
   Add unit test: testDoubleNegationConvertedToPositiveMustShould

Signed-off-by: Atri Sharma <[email protected]>
andrross pushed a commit that referenced this pull request Sep 17, 2025
)

* Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve #19266 by fixing double-negation under MUST_NOT introduced by #19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR:

   not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1))
   Use FILTER (not MUST) to preserve non-scoring semantics of must_not
   Leave other flattenings unchanged; only trigger on pure must_not nested bool
   Add unit test: testDoubleNegationConvertedToPositiveMustShould

Signed-off-by: Atri Sharma <[email protected]>

* spotless output

Signed-off-by: Atri Sharma <[email protected]>

* Remove must_not rewrite

Signed-off-by: Atri Sharma <[email protected]>

* Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants

This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling.
BooleanFlatteningRewriterTests:
	Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match.
	Add idempotence check (second rewrite is a no-op in structure/serialization).
MatchAllRemovalRewriterTests:
	Remove match_all from must when a non-scoring context (filter) is present.
	Remove deeply nested match_all under filter-in-filter.
MustToFilterRewriterTests:
	Move boosted numeric term/range queries to filter; ensure text queries remain scoring.
	With null QueryShardContext, move ranges but do not move numeric term/terms.
MustNotToShouldRewriterTests:
	Add idempotence check.
	Ensure no rewrite for multi-valued numeric fields (complements must_not disabled).
TermsMergingRewriterTests:
	Should-clause merging above threshold.
	Per-field merging: only coalesce within the same field; mixed fields remain separate.
QueryRewriterRegistryTests:
	Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected.
	Registry idempotence: applying rewrite twice yields identical structure.

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…nsearch-project#19273)

* Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR:

   not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1))
   Use FILTER (not MUST) to preserve non-scoring semantics of must_not
   Leave other flattenings unchanged; only trigger on pure must_not nested bool
   Add unit test: testDoubleNegationConvertedToPositiveMustShould

Signed-off-by: Atri Sharma <[email protected]>

* spotless output

Signed-off-by: Atri Sharma <[email protected]>

* Remove must_not rewrite

Signed-off-by: Atri Sharma <[email protected]>

* Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants

This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling.
BooleanFlatteningRewriterTests:
	Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match.
	Add idempotence check (second rewrite is a no-op in structure/serialization).
MatchAllRemovalRewriterTests:
	Remove match_all from must when a non-scoring context (filter) is present.
	Remove deeply nested match_all under filter-in-filter.
MustToFilterRewriterTests:
	Move boosted numeric term/range queries to filter; ensure text queries remain scoring.
	With null QueryShardContext, move ranges but do not move numeric term/terms.
MustNotToShouldRewriterTests:
	Add idempotence check.
	Ensure no rewrite for multi-valued numeric fields (complements must_not disabled).
TermsMergingRewriterTests:
	Should-clause merging above threshold.
	Per-field merging: only coalesce within the same field; mixed fields remain separate.
QueryRewriterRegistryTests:
	Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected.
	Registry idempotence: applying rewrite twice yields identical structure.

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
Signed-off-by: Ankit Jain <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…nsearch-project#19273)

* Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR:

   not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1))
   Use FILTER (not MUST) to preserve non-scoring semantics of must_not
   Leave other flattenings unchanged; only trigger on pure must_not nested bool
   Add unit test: testDoubleNegationConvertedToPositiveMustShould

Signed-off-by: Atri Sharma <[email protected]>

* spotless output

Signed-off-by: Atri Sharma <[email protected]>

* Remove must_not rewrite

Signed-off-by: Atri Sharma <[email protected]>

* Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants

This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling.
BooleanFlatteningRewriterTests:
	Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match.
	Add idempotence check (second rewrite is a no-op in structure/serialization).
MatchAllRemovalRewriterTests:
	Remove match_all from must when a non-scoring context (filter) is present.
	Remove deeply nested match_all under filter-in-filter.
MustToFilterRewriterTests:
	Move boosted numeric term/range queries to filter; ensure text queries remain scoring.
	With null QueryShardContext, move ranges but do not move numeric term/terms.
MustNotToShouldRewriterTests:
	Add idempotence check.
	Ensure no rewrite for multi-valued numeric fields (complements must_not disabled).
TermsMergingRewriterTests:
	Should-clause merging above threshold.
	Per-field merging: only coalesce within the same field; mixed fields remain separate.
QueryRewriterRegistryTests:
	Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected.
	Registry idempotence: applying rewrite twice yields identical structure.

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
Signed-off-by: Ankit Jain <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
Signed-off-by: Ankit Jain <[email protected]>
jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…nsearch-project#19273)

* Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR:

   not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1))
   Use FILTER (not MUST) to preserve non-scoring semantics of must_not
   Leave other flattenings unchanged; only trigger on pure must_not nested bool
   Add unit test: testDoubleNegationConvertedToPositiveMustShould

Signed-off-by: Atri Sharma <[email protected]>

* spotless output

Signed-off-by: Atri Sharma <[email protected]>

* Remove must_not rewrite

Signed-off-by: Atri Sharma <[email protected]>

* Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants

This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling.
BooleanFlatteningRewriterTests:
	Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match.
	Add idempotence check (second rewrite is a no-op in structure/serialization).
MatchAllRemovalRewriterTests:
	Remove match_all from must when a non-scoring context (filter) is present.
	Remove deeply nested match_all under filter-in-filter.
MustToFilterRewriterTests:
	Move boosted numeric term/range queries to filter; ensure text queries remain scoring.
	With null QueryShardContext, move ranges but do not move numeric term/terms.
MustNotToShouldRewriterTests:
	Add idempotence check.
	Ensure no rewrite for multi-valued numeric fields (complements must_not disabled).
TermsMergingRewriterTests:
	Should-clause merging above threshold.
	Per-field merging: only coalesce within the same field; mixed fields remain separate.
QueryRewriterRegistryTests:
	Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected.
	Registry idempotence: applying rewrite twice yields identical structure.

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
Signed-off-by: Ankit Jain <[email protected]>
asimmahmood1 pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 23, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
asimmahmood1 pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 23, 2025
…nsearch-project#19273)

* Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR:

   not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1))
   Use FILTER (not MUST) to preserve non-scoring semantics of must_not
   Leave other flattenings unchanged; only trigger on pure must_not nested bool
   Add unit test: testDoubleNegationConvertedToPositiveMustShould

Signed-off-by: Atri Sharma <[email protected]>

* spotless output

Signed-off-by: Atri Sharma <[email protected]>

* Remove must_not rewrite

Signed-off-by: Atri Sharma <[email protected]>

* Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants

This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling.
BooleanFlatteningRewriterTests:
	Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match.
	Add idempotence check (second rewrite is a no-op in structure/serialization).
MatchAllRemovalRewriterTests:
	Remove match_all from must when a non-scoring context (filter) is present.
	Remove deeply nested match_all under filter-in-filter.
MustToFilterRewriterTests:
	Move boosted numeric term/range queries to filter; ensure text queries remain scoring.
	With null QueryShardContext, move ranges but do not move numeric term/terms.
MustNotToShouldRewriterTests:
	Add idempotence check.
	Ensure no rewrite for multi-valued numeric fields (complements must_not disabled).
TermsMergingRewriterTests:
	Should-clause merging above threshold.
	Per-field merging: only coalesce within the same field; mixed fields remain separate.
QueryRewriterRegistryTests:
	Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected.
	Registry idempotence: applying rewrite twice yields identical structure.

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
pranikum pushed a commit to pranikum/OpenSearch that referenced this pull request Sep 23, 2025
…nsearch-project#19273)

* Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR:

   not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1))
   Use FILTER (not MUST) to preserve non-scoring semantics of must_not
   Leave other flattenings unchanged; only trigger on pure must_not nested bool
   Add unit test: testDoubleNegationConvertedToPositiveMustShould

Signed-off-by: Atri Sharma <[email protected]>

* spotless output

Signed-off-by: Atri Sharma <[email protected]>

* Remove must_not rewrite

Signed-off-by: Atri Sharma <[email protected]>

* Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants

This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling.
BooleanFlatteningRewriterTests:
	Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match.
	Add idempotence check (second rewrite is a no-op in structure/serialization).
MatchAllRemovalRewriterTests:
	Remove match_all from must when a non-scoring context (filter) is present.
	Remove deeply nested match_all under filter-in-filter.
MustToFilterRewriterTests:
	Move boosted numeric term/range queries to filter; ensure text queries remain scoring.
	With null QueryShardContext, move ranges but do not move numeric term/terms.
MustNotToShouldRewriterTests:
	Add idempotence check.
	Ensure no rewrite for multi-valued numeric fields (complements must_not disabled).
TermsMergingRewriterTests:
	Should-clause merging above threshold.
	Per-field merging: only coalesce within the same field; mixed fields remain separate.
QueryRewriterRegistryTests:
	Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected.
	Registry idempotence: applying rewrite twice yields identical structure.

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <[email protected]>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <[email protected]>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <[email protected]>

* Update per CI

Signed-off-by: Atri Sharma <[email protected]>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <[email protected]>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <[email protected]>

* Update CHANGELOG

Signed-off-by: Atri Sharma <[email protected]>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <[email protected]>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
   Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
   Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
   Register both rewriters in QueryRewriterRegistry
   Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
   Disable legacy implementations in BoolQueryBuilder
   Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
   Better separation of concerns
   Recursive rewriting for nested boolean queries
   Proper error handling and logging
   Consistent priority based execution order

Signed-off-by: Atri Sharma <[email protected]>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
…nsearch-project#19273)

* Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR:

   not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1))
   Use FILTER (not MUST) to preserve non-scoring semantics of must_not
   Leave other flattenings unchanged; only trigger on pure must_not nested bool
   Add unit test: testDoubleNegationConvertedToPositiveMustShould

Signed-off-by: Atri Sharma <[email protected]>

* spotless output

Signed-off-by: Atri Sharma <[email protected]>

* Remove must_not rewrite

Signed-off-by: Atri Sharma <[email protected]>

* Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants

This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling.
BooleanFlatteningRewriterTests:
	Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match.
	Add idempotence check (second rewrite is a no-op in structure/serialization).
MatchAllRemovalRewriterTests:
	Remove match_all from must when a non-scoring context (filter) is present.
	Remove deeply nested match_all under filter-in-filter.
MustToFilterRewriterTests:
	Move boosted numeric term/range queries to filter; ensure text queries remain scoring.
	With null QueryShardContext, move ranges but do not move numeric term/terms.
MustNotToShouldRewriterTests:
	Add idempotence check.
	Ensure no rewrite for multi-valued numeric fields (complements must_not disabled).
TermsMergingRewriterTests:
	Should-clause merging above threshold.
	Per-field merging: only coalesce within the same field; mixed fields remain separate.
QueryRewriterRegistryTests:
	Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected.
	Registry idempotence: applying rewrite twice yields identical structure.

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants