-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add query rewriting infrastructure to reduce query complexity #19060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Implements three query optimizations that work together: - Boolean flattening: removes unnecessary nested boolean queries - Terms merging: combines multiple term queries on same field in filter/should contexts - Match-all removal: eliminates redundant match_all queries Key features: - 60-70% reduction in query nodes for typical filtered queries - Feature flag: search.query_rewriting.enabled (default: true) - Preserves exact query semantics and results Signed-off-by: Atri Sharma <[email protected]>
|
❌ Gradle check result for f58ba21: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Atri Sharma <[email protected]>
|
❌ Gradle check result for 478e22a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Atri Sharma <[email protected]>
|
❌ Gradle check result for 847ef04: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Atri Sharma <[email protected]>
|
❌ Gradle check result for 2cfefbe: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Atri Sharma <[email protected]>
…arch-project#19060) * Add query rewriting infrastructure to reduce query complexity Implements three query optimizations that work together: - Boolean flattening: removes unnecessary nested boolean queries - Terms merging: combines multiple term queries on same field in filter/should contexts - Match-all removal: eliminates redundant match_all queries Key features: - 60-70% reduction in query nodes for typical filtered queries - Feature flag: search.query_rewriting.enabled (default: true) - Preserves exact query semantics and results Signed-off-by: Atri Sharma <[email protected]> * Fix forbidden api issues Signed-off-by: Atri Sharma <[email protected]> * Update writers and get tests to pass Signed-off-by: Atri Sharma <[email protected]> * Update per CI Signed-off-by: Atri Sharma <[email protected]> * Fix term merging threshold and update comments Signed-off-by: Atri Sharma <[email protected]> * Expose setting and update per comments Signed-off-by: Atri Sharma <[email protected]> * Update CHANGELOG Signed-off-by: Atri Sharma <[email protected]> * Fix tests and ensure scoring MATCH ALL query is preserved Signed-off-by: Atri Sharma <[email protected]> * Migrate must to filter and must not to should optimizations to query rewriting infrastructure This commit migrates two existing query optimizations from BoolQueryBuilder to the new query rewriting infrastructure: 1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match) from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541) 2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498) Changes: Add MustToFilterRewriter with priority 150 (runs after boolean flattening) Add MustNotToShouldRewriter with priority 175 (runs after must to filter) Register both rewriters in QueryRewriterRegistry Add comprehensive test suites (15 tests for must to filter, 14 for must not to should) Disable legacy implementations in BoolQueryBuilder Comment out BoolQueryBuilder tests that relied on the old implementations The new rewriters maintain full backward compatibility while providing: Better separation of concerns Recursive rewriting for nested boolean queries Proper error handling and logging Consistent priority based execution order Signed-off-by: Atri Sharma <[email protected]> * Handle fields with missing fields Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
…arch-project#19060) * Add query rewriting infrastructure to reduce query complexity Implements three query optimizations that work together: - Boolean flattening: removes unnecessary nested boolean queries - Terms merging: combines multiple term queries on same field in filter/should contexts - Match-all removal: eliminates redundant match_all queries Key features: - 60-70% reduction in query nodes for typical filtered queries - Feature flag: search.query_rewriting.enabled (default: true) - Preserves exact query semantics and results Signed-off-by: Atri Sharma <[email protected]> * Fix forbidden api issues Signed-off-by: Atri Sharma <[email protected]> * Update writers and get tests to pass Signed-off-by: Atri Sharma <[email protected]> * Update per CI Signed-off-by: Atri Sharma <[email protected]> * Fix term merging threshold and update comments Signed-off-by: Atri Sharma <[email protected]> * Expose setting and update per comments Signed-off-by: Atri Sharma <[email protected]> * Update CHANGELOG Signed-off-by: Atri Sharma <[email protected]> * Fix tests and ensure scoring MATCH ALL query is preserved Signed-off-by: Atri Sharma <[email protected]> * Migrate must to filter and must not to should optimizations to query rewriting infrastructure This commit migrates two existing query optimizations from BoolQueryBuilder to the new query rewriting infrastructure: 1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match) from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541) 2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498) Changes: Add MustToFilterRewriter with priority 150 (runs after boolean flattening) Add MustNotToShouldRewriter with priority 175 (runs after must to filter) Register both rewriters in QueryRewriterRegistry Add comprehensive test suites (15 tests for must to filter, 14 for must not to should) Disable legacy implementations in BoolQueryBuilder Comment out BoolQueryBuilder tests that relied on the old implementations The new rewriters maintain full backward compatibility while providing: Better separation of concerns Recursive rewriting for nested boolean queries Proper error handling and logging Consistent priority based execution order Signed-off-by: Atri Sharma <[email protected]> * Handle fields with missing fields Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
…arch-project#19060) * Add query rewriting infrastructure to reduce query complexity Implements three query optimizations that work together: - Boolean flattening: removes unnecessary nested boolean queries - Terms merging: combines multiple term queries on same field in filter/should contexts - Match-all removal: eliminates redundant match_all queries Key features: - 60-70% reduction in query nodes for typical filtered queries - Feature flag: search.query_rewriting.enabled (default: true) - Preserves exact query semantics and results Signed-off-by: Atri Sharma <[email protected]> * Fix forbidden api issues Signed-off-by: Atri Sharma <[email protected]> * Update writers and get tests to pass Signed-off-by: Atri Sharma <[email protected]> * Update per CI Signed-off-by: Atri Sharma <[email protected]> * Fix term merging threshold and update comments Signed-off-by: Atri Sharma <[email protected]> * Expose setting and update per comments Signed-off-by: Atri Sharma <[email protected]> * Update CHANGELOG Signed-off-by: Atri Sharma <[email protected]> * Fix tests and ensure scoring MATCH ALL query is preserved Signed-off-by: Atri Sharma <[email protected]> * Migrate must to filter and must not to should optimizations to query rewriting infrastructure This commit migrates two existing query optimizations from BoolQueryBuilder to the new query rewriting infrastructure: 1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match) from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541) 2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498) Changes: Add MustToFilterRewriter with priority 150 (runs after boolean flattening) Add MustNotToShouldRewriter with priority 175 (runs after must to filter) Register both rewriters in QueryRewriterRegistry Add comprehensive test suites (15 tests for must to filter, 14 for must not to should) Disable legacy implementations in BoolQueryBuilder Comment out BoolQueryBuilder tests that relied on the old implementations The new rewriters maintain full backward compatibility while providing: Better separation of concerns Recursive rewriting for nested boolean queries Proper error handling and logging Consistent priority based execution order Signed-off-by: Atri Sharma <[email protected]> * Handle fields with missing fields Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR: not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1)) Use FILTER (not MUST) to preserve non-scoring semantics of must_not Leave other flattenings unchanged; only trigger on pure must_not nested bool Add unit test: testDoubleNegationConvertedToPositiveMustShould Signed-off-by: Atri Sharma <[email protected]>
opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR: not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1)) Use FILTER (not MUST) to preserve non-scoring semantics of must_not Leave other flattenings unchanged; only trigger on pure must_not nested bool Add unit test: testDoubleNegationConvertedToPositiveMustShould Signed-off-by: Atri Sharma <[email protected]>
) * Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve #19266 by fixing double-negation under MUST_NOT introduced by #19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR: not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1)) Use FILTER (not MUST) to preserve non-scoring semantics of must_not Leave other flattenings unchanged; only trigger on pure must_not nested bool Add unit test: testDoubleNegationConvertedToPositiveMustShould Signed-off-by: Atri Sharma <[email protected]> * spotless output Signed-off-by: Atri Sharma <[email protected]> * Remove must_not rewrite Signed-off-by: Atri Sharma <[email protected]> * Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling. BooleanFlatteningRewriterTests: Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match. Add idempotence check (second rewrite is a no-op in structure/serialization). MatchAllRemovalRewriterTests: Remove match_all from must when a non-scoring context (filter) is present. Remove deeply nested match_all under filter-in-filter. MustToFilterRewriterTests: Move boosted numeric term/range queries to filter; ensure text queries remain scoring. With null QueryShardContext, move ranges but do not move numeric term/terms. MustNotToShouldRewriterTests: Add idempotence check. Ensure no rewrite for multi-valued numeric fields (complements must_not disabled). TermsMergingRewriterTests: Should-clause merging above threshold. Per-field merging: only coalesce within the same field; mixed fields remain separate. QueryRewriterRegistryTests: Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected. Registry idempotence: applying rewrite twice yields identical structure. Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
…arch-project#19060) * Add query rewriting infrastructure to reduce query complexity Implements three query optimizations that work together: - Boolean flattening: removes unnecessary nested boolean queries - Terms merging: combines multiple term queries on same field in filter/should contexts - Match-all removal: eliminates redundant match_all queries Key features: - 60-70% reduction in query nodes for typical filtered queries - Feature flag: search.query_rewriting.enabled (default: true) - Preserves exact query semantics and results Signed-off-by: Atri Sharma <[email protected]> * Fix forbidden api issues Signed-off-by: Atri Sharma <[email protected]> * Update writers and get tests to pass Signed-off-by: Atri Sharma <[email protected]> * Update per CI Signed-off-by: Atri Sharma <[email protected]> * Fix term merging threshold and update comments Signed-off-by: Atri Sharma <[email protected]> * Expose setting and update per comments Signed-off-by: Atri Sharma <[email protected]> * Update CHANGELOG Signed-off-by: Atri Sharma <[email protected]> * Fix tests and ensure scoring MATCH ALL query is preserved Signed-off-by: Atri Sharma <[email protected]> * Migrate must to filter and must not to should optimizations to query rewriting infrastructure This commit migrates two existing query optimizations from BoolQueryBuilder to the new query rewriting infrastructure: 1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match) from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541) 2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498) Changes: Add MustToFilterRewriter with priority 150 (runs after boolean flattening) Add MustNotToShouldRewriter with priority 175 (runs after must to filter) Register both rewriters in QueryRewriterRegistry Add comprehensive test suites (15 tests for must to filter, 14 for must not to should) Disable legacy implementations in BoolQueryBuilder Comment out BoolQueryBuilder tests that relied on the old implementations The new rewriters maintain full backward compatibility while providing: Better separation of concerns Recursive rewriting for nested boolean queries Proper error handling and logging Consistent priority based execution order Signed-off-by: Atri Sharma <[email protected]> * Handle fields with missing fields Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
…nsearch-project#19273) * Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR: not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1)) Use FILTER (not MUST) to preserve non-scoring semantics of must_not Leave other flattenings unchanged; only trigger on pure must_not nested bool Add unit test: testDoubleNegationConvertedToPositiveMustShould Signed-off-by: Atri Sharma <[email protected]> * spotless output Signed-off-by: Atri Sharma <[email protected]> * Remove must_not rewrite Signed-off-by: Atri Sharma <[email protected]> * Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling. BooleanFlatteningRewriterTests: Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match. Add idempotence check (second rewrite is a no-op in structure/serialization). MatchAllRemovalRewriterTests: Remove match_all from must when a non-scoring context (filter) is present. Remove deeply nested match_all under filter-in-filter. MustToFilterRewriterTests: Move boosted numeric term/range queries to filter; ensure text queries remain scoring. With null QueryShardContext, move ranges but do not move numeric term/terms. MustNotToShouldRewriterTests: Add idempotence check. Ensure no rewrite for multi-valued numeric fields (complements must_not disabled). TermsMergingRewriterTests: Should-clause merging above threshold. Per-field merging: only coalesce within the same field; mixed fields remain separate. QueryRewriterRegistryTests: Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected. Registry idempotence: applying rewrite twice yields identical structure. Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
…arch-project#19060) * Add query rewriting infrastructure to reduce query complexity Implements three query optimizations that work together: - Boolean flattening: removes unnecessary nested boolean queries - Terms merging: combines multiple term queries on same field in filter/should contexts - Match-all removal: eliminates redundant match_all queries Key features: - 60-70% reduction in query nodes for typical filtered queries - Feature flag: search.query_rewriting.enabled (default: true) - Preserves exact query semantics and results Signed-off-by: Atri Sharma <[email protected]> * Fix forbidden api issues Signed-off-by: Atri Sharma <[email protected]> * Update writers and get tests to pass Signed-off-by: Atri Sharma <[email protected]> * Update per CI Signed-off-by: Atri Sharma <[email protected]> * Fix term merging threshold and update comments Signed-off-by: Atri Sharma <[email protected]> * Expose setting and update per comments Signed-off-by: Atri Sharma <[email protected]> * Update CHANGELOG Signed-off-by: Atri Sharma <[email protected]> * Fix tests and ensure scoring MATCH ALL query is preserved Signed-off-by: Atri Sharma <[email protected]> * Migrate must to filter and must not to should optimizations to query rewriting infrastructure This commit migrates two existing query optimizations from BoolQueryBuilder to the new query rewriting infrastructure: 1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match) from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541) 2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498) Changes: Add MustToFilterRewriter with priority 150 (runs after boolean flattening) Add MustNotToShouldRewriter with priority 175 (runs after must to filter) Register both rewriters in QueryRewriterRegistry Add comprehensive test suites (15 tests for must to filter, 14 for must not to should) Disable legacy implementations in BoolQueryBuilder Comment out BoolQueryBuilder tests that relied on the old implementations The new rewriters maintain full backward compatibility while providing: Better separation of concerns Recursive rewriting for nested boolean queries Proper error handling and logging Consistent priority based execution order Signed-off-by: Atri Sharma <[email protected]> * Handle fields with missing fields Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]> Signed-off-by: Ankit Jain <[email protected]>
…nsearch-project#19273) * Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR: not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1)) Use FILTER (not MUST) to preserve non-scoring semantics of must_not Leave other flattenings unchanged; only trigger on pure must_not nested bool Add unit test: testDoubleNegationConvertedToPositiveMustShould Signed-off-by: Atri Sharma <[email protected]> * spotless output Signed-off-by: Atri Sharma <[email protected]> * Remove must_not rewrite Signed-off-by: Atri Sharma <[email protected]> * Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling. BooleanFlatteningRewriterTests: Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match. Add idempotence check (second rewrite is a no-op in structure/serialization). MatchAllRemovalRewriterTests: Remove match_all from must when a non-scoring context (filter) is present. Remove deeply nested match_all under filter-in-filter. MustToFilterRewriterTests: Move boosted numeric term/range queries to filter; ensure text queries remain scoring. With null QueryShardContext, move ranges but do not move numeric term/terms. MustNotToShouldRewriterTests: Add idempotence check. Ensure no rewrite for multi-valued numeric fields (complements must_not disabled). TermsMergingRewriterTests: Should-clause merging above threshold. Per-field merging: only coalesce within the same field; mixed fields remain separate. QueryRewriterRegistryTests: Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected. Registry idempotence: applying rewrite twice yields identical structure. Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]> Signed-off-by: Ankit Jain <[email protected]>
…arch-project#19060) * Add query rewriting infrastructure to reduce query complexity Implements three query optimizations that work together: - Boolean flattening: removes unnecessary nested boolean queries - Terms merging: combines multiple term queries on same field in filter/should contexts - Match-all removal: eliminates redundant match_all queries Key features: - 60-70% reduction in query nodes for typical filtered queries - Feature flag: search.query_rewriting.enabled (default: true) - Preserves exact query semantics and results Signed-off-by: Atri Sharma <[email protected]> * Fix forbidden api issues Signed-off-by: Atri Sharma <[email protected]> * Update writers and get tests to pass Signed-off-by: Atri Sharma <[email protected]> * Update per CI Signed-off-by: Atri Sharma <[email protected]> * Fix term merging threshold and update comments Signed-off-by: Atri Sharma <[email protected]> * Expose setting and update per comments Signed-off-by: Atri Sharma <[email protected]> * Update CHANGELOG Signed-off-by: Atri Sharma <[email protected]> * Fix tests and ensure scoring MATCH ALL query is preserved Signed-off-by: Atri Sharma <[email protected]> * Migrate must to filter and must not to should optimizations to query rewriting infrastructure This commit migrates two existing query optimizations from BoolQueryBuilder to the new query rewriting infrastructure: 1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match) from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541) 2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498) Changes: Add MustToFilterRewriter with priority 150 (runs after boolean flattening) Add MustNotToShouldRewriter with priority 175 (runs after must to filter) Register both rewriters in QueryRewriterRegistry Add comprehensive test suites (15 tests for must to filter, 14 for must not to should) Disable legacy implementations in BoolQueryBuilder Comment out BoolQueryBuilder tests that relied on the old implementations The new rewriters maintain full backward compatibility while providing: Better separation of concerns Recursive rewriting for nested boolean queries Proper error handling and logging Consistent priority based execution order Signed-off-by: Atri Sharma <[email protected]> * Handle fields with missing fields Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]> Signed-off-by: Ankit Jain <[email protected]>
…nsearch-project#19273) * Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR: not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1)) Use FILTER (not MUST) to preserve non-scoring semantics of must_not Leave other flattenings unchanged; only trigger on pure must_not nested bool Add unit test: testDoubleNegationConvertedToPositiveMustShould Signed-off-by: Atri Sharma <[email protected]> * spotless output Signed-off-by: Atri Sharma <[email protected]> * Remove must_not rewrite Signed-off-by: Atri Sharma <[email protected]> * Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling. BooleanFlatteningRewriterTests: Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match. Add idempotence check (second rewrite is a no-op in structure/serialization). MatchAllRemovalRewriterTests: Remove match_all from must when a non-scoring context (filter) is present. Remove deeply nested match_all under filter-in-filter. MustToFilterRewriterTests: Move boosted numeric term/range queries to filter; ensure text queries remain scoring. With null QueryShardContext, move ranges but do not move numeric term/terms. MustNotToShouldRewriterTests: Add idempotence check. Ensure no rewrite for multi-valued numeric fields (complements must_not disabled). TermsMergingRewriterTests: Should-clause merging above threshold. Per-field merging: only coalesce within the same field; mixed fields remain separate. QueryRewriterRegistryTests: Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected. Registry idempotence: applying rewrite twice yields identical structure. Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]> Signed-off-by: Ankit Jain <[email protected]>
…arch-project#19060) * Add query rewriting infrastructure to reduce query complexity Implements three query optimizations that work together: - Boolean flattening: removes unnecessary nested boolean queries - Terms merging: combines multiple term queries on same field in filter/should contexts - Match-all removal: eliminates redundant match_all queries Key features: - 60-70% reduction in query nodes for typical filtered queries - Feature flag: search.query_rewriting.enabled (default: true) - Preserves exact query semantics and results Signed-off-by: Atri Sharma <[email protected]> * Fix forbidden api issues Signed-off-by: Atri Sharma <[email protected]> * Update writers and get tests to pass Signed-off-by: Atri Sharma <[email protected]> * Update per CI Signed-off-by: Atri Sharma <[email protected]> * Fix term merging threshold and update comments Signed-off-by: Atri Sharma <[email protected]> * Expose setting and update per comments Signed-off-by: Atri Sharma <[email protected]> * Update CHANGELOG Signed-off-by: Atri Sharma <[email protected]> * Fix tests and ensure scoring MATCH ALL query is preserved Signed-off-by: Atri Sharma <[email protected]> * Migrate must to filter and must not to should optimizations to query rewriting infrastructure This commit migrates two existing query optimizations from BoolQueryBuilder to the new query rewriting infrastructure: 1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match) from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541) 2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498) Changes: Add MustToFilterRewriter with priority 150 (runs after boolean flattening) Add MustNotToShouldRewriter with priority 175 (runs after must to filter) Register both rewriters in QueryRewriterRegistry Add comprehensive test suites (15 tests for must to filter, 14 for must not to should) Disable legacy implementations in BoolQueryBuilder Comment out BoolQueryBuilder tests that relied on the old implementations The new rewriters maintain full backward compatibility while providing: Better separation of concerns Recursive rewriting for nested boolean queries Proper error handling and logging Consistent priority based execution order Signed-off-by: Atri Sharma <[email protected]> * Handle fields with missing fields Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
…nsearch-project#19273) * Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR: not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1)) Use FILTER (not MUST) to preserve non-scoring semantics of must_not Leave other flattenings unchanged; only trigger on pure must_not nested bool Add unit test: testDoubleNegationConvertedToPositiveMustShould Signed-off-by: Atri Sharma <[email protected]> * spotless output Signed-off-by: Atri Sharma <[email protected]> * Remove must_not rewrite Signed-off-by: Atri Sharma <[email protected]> * Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling. BooleanFlatteningRewriterTests: Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match. Add idempotence check (second rewrite is a no-op in structure/serialization). MatchAllRemovalRewriterTests: Remove match_all from must when a non-scoring context (filter) is present. Remove deeply nested match_all under filter-in-filter. MustToFilterRewriterTests: Move boosted numeric term/range queries to filter; ensure text queries remain scoring. With null QueryShardContext, move ranges but do not move numeric term/terms. MustNotToShouldRewriterTests: Add idempotence check. Ensure no rewrite for multi-valued numeric fields (complements must_not disabled). TermsMergingRewriterTests: Should-clause merging above threshold. Per-field merging: only coalesce within the same field; mixed fields remain separate. QueryRewriterRegistryTests: Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected. Registry idempotence: applying rewrite twice yields identical structure. Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
…nsearch-project#19273) * Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR: not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1)) Use FILTER (not MUST) to preserve non-scoring semantics of must_not Leave other flattenings unchanged; only trigger on pure must_not nested bool Add unit test: testDoubleNegationConvertedToPositiveMustShould Signed-off-by: Atri Sharma <[email protected]> * spotless output Signed-off-by: Atri Sharma <[email protected]> * Remove must_not rewrite Signed-off-by: Atri Sharma <[email protected]> * Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling. BooleanFlatteningRewriterTests: Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match. Add idempotence check (second rewrite is a no-op in structure/serialization). MatchAllRemovalRewriterTests: Remove match_all from must when a non-scoring context (filter) is present. Remove deeply nested match_all under filter-in-filter. MustToFilterRewriterTests: Move boosted numeric term/range queries to filter; ensure text queries remain scoring. With null QueryShardContext, move ranges but do not move numeric term/terms. MustNotToShouldRewriterTests: Add idempotence check. Ensure no rewrite for multi-valued numeric fields (complements must_not disabled). TermsMergingRewriterTests: Should-clause merging above threshold. Per-field merging: only coalesce within the same field; mixed fields remain separate. QueryRewriterRegistryTests: Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected. Registry idempotence: applying rewrite twice yields identical structure. Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
…arch-project#19060) * Add query rewriting infrastructure to reduce query complexity Implements three query optimizations that work together: - Boolean flattening: removes unnecessary nested boolean queries - Terms merging: combines multiple term queries on same field in filter/should contexts - Match-all removal: eliminates redundant match_all queries Key features: - 60-70% reduction in query nodes for typical filtered queries - Feature flag: search.query_rewriting.enabled (default: true) - Preserves exact query semantics and results Signed-off-by: Atri Sharma <[email protected]> * Fix forbidden api issues Signed-off-by: Atri Sharma <[email protected]> * Update writers and get tests to pass Signed-off-by: Atri Sharma <[email protected]> * Update per CI Signed-off-by: Atri Sharma <[email protected]> * Fix term merging threshold and update comments Signed-off-by: Atri Sharma <[email protected]> * Expose setting and update per comments Signed-off-by: Atri Sharma <[email protected]> * Update CHANGELOG Signed-off-by: Atri Sharma <[email protected]> * Fix tests and ensure scoring MATCH ALL query is preserved Signed-off-by: Atri Sharma <[email protected]> * Migrate must to filter and must not to should optimizations to query rewriting infrastructure This commit migrates two existing query optimizations from BoolQueryBuilder to the new query rewriting infrastructure: 1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match) from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541) 2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498) Changes: Add MustToFilterRewriter with priority 150 (runs after boolean flattening) Add MustNotToShouldRewriter with priority 175 (runs after must to filter) Register both rewriters in QueryRewriterRegistry Add comprehensive test suites (15 tests for must to filter, 14 for must not to should) Disable legacy implementations in BoolQueryBuilder Comment out BoolQueryBuilder tests that relied on the old implementations The new rewriters maintain full backward compatibility while providing: Better separation of concerns Recursive rewriting for nested boolean queries Proper error handling and logging Consistent priority based execution order Signed-off-by: Atri Sharma <[email protected]> * Handle fields with missing fields Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
…nsearch-project#19273) * Fix regression: preserve not(not X) in BooleanFlatteningRewriter Resolve opensearch-project#19266 by fixing double-negation under MUST_NOT introduced by opensearch-project#19060 When parent is must_not and nested bool has only must_not clauses, rewrite to a positive OR: not(bool(must_not:[X1..Xn])) => filter(bool(should:[X1..Xn], minimum_should_match=1)) Use FILTER (not MUST) to preserve non-scoring semantics of must_not Leave other flattenings unchanged; only trigger on pure must_not nested bool Add unit test: testDoubleNegationConvertedToPositiveMustShould Signed-off-by: Atri Sharma <[email protected]> * spotless output Signed-off-by: Atri Sharma <[email protected]> * Remove must_not rewrite Signed-off-by: Atri Sharma <[email protected]> * Query rewriting tests: expand coverage; enforce idempotence and non-flattening invariants This change broadens the test suite for query rewriting to codify semantics, guard against regressions, and address maintainer feedback around non‑idempotent must_not handling. BooleanFlatteningRewriterTests: Add explicit “don’t flatten” cases when nested bool has non-default boost, has queryName, or nested should has minimum_should_match. Add idempotence check (second rewrite is a no-op in structure/serialization). MatchAllRemovalRewriterTests: Remove match_all from must when a non-scoring context (filter) is present. Remove deeply nested match_all under filter-in-filter. MustToFilterRewriterTests: Move boosted numeric term/range queries to filter; ensure text queries remain scoring. With null QueryShardContext, move ranges but do not move numeric term/terms. MustNotToShouldRewriterTests: Add idempotence check. Ensure no rewrite for multi-valued numeric fields (complements must_not disabled). TermsMergingRewriterTests: Should-clause merging above threshold. Per-field merging: only coalesce within the same field; mixed fields remain separate. QueryRewriterRegistryTests: Dynamic setting update for terms merging threshold (e.g., 16 → 32) changes behavior as expected. Registry idempotence: applying rewrite twice yields identical structure. Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
Implements three query optimizations that work together:
Key features:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.