Skip to content

Conversation

@RyanL1997
Copy link
Collaborator

@RyanL1997 RyanL1997 commented Nov 21, 2025

Description

Implemented a filter accumulation mechanism that combines multiple filter conditions into a single Filter RelNode before Calcite's optimization phase begins, preventing the deep Filter chains that trigger combinatorial explosion.

Implementation Summary

The fix introduces automatic detection and accumulation of filtering operations:

  1. Automatic Detection: When a query is analyzed, the system counts all filtering operations ( e.g. regex and where) in the AST. If 2 or more filtering operations are detected, filter accumulation mode is automatically enabled.

  2. Filter Accumulation: Instead of creating individual Filter RelNodes for each regex or where operation, all filter conditions are collected in a list during the analysis phase.

  3. Single Combined Filter: All accumulated conditions are combined with AND operations into a single Filter RelNode, preventing the deep chains that caused memory exhaustion.

  4. Schema-Aware Flushing: Accumulated filters are flushed before any schema-changing operations (like fields) to ensure field references remain valid.

  5. No change needed for command implementations: The fix is completely automatic - no query rewriting or user action required. Queries produce identical results to the original implementation.

How It Works: Before vs After

Before (Memory Explosion)

Query: source=t | regex f1="..." | regex f2="..." | ... | regex f10="..." | fields f1

Analysis Phase:
  Filter(regex10)
  └─ Filter(regex9)
     └─ Filter(regex8)
        └─ ... (deeply nested)
           └─ Scan(t)

Optimization Phase (FilterMergeRule):
  - Tries to merge filters
  - Generates all possible orderings: 10! = 3,628,800 combinations
  - Each combination creates intermediate RelNode objects
  - Memory exhaustion before execution even starts

After (Efficient Single Filter)

Query: source=t | regex f1="..." | regex f2="..." | ... | regex f10="..." | fields f1

Analysis Phase:
  Filter(regex1 AND regex2 AND ... AND regex10)  // Single combined filter
  └─ Scan(t)

Optimization Phase (FilterMergeRule):
  - Only one Filter node to optimize
  - No combinatorial explosion
  - Memory usage remains constant regardless of filter count

Results

  • Memory usage is now constant regardless of the number of regex operations
  • Queries with 10+ regex clauses complete successfully
  • No combinatorial explosion during Calcite optimization
  • Performance improved significantly as Calcite doesn't waste time on filter reordering
  • Solution is automatic - no query rewriting required

Related Issues

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Query engine now consolidates adjacent filters into single combined predicates, producing simpler, more efficient plans.
  • Bug Fixes

    • Eliminated unnecessary nested filters and redundant NULL checks to improve filter pushdown and plan clarity.
  • Tests

    • Added integration tests covering merged-filter scenarios and behaviour with consecutive WHERE/IS NOT NULL/range filters.

✏️ Tip: You can customize this high-level summary in your review settings.

@RyanL1997 RyanL1997 added PPL Piped processing language bugFix labels Nov 21, 2025
@RyanL1997 RyanL1997 changed the title [WIP - DO NOT REVIEW][BugFix] Fix Regex OOM when there are 10+ regex clauses [WIP - DO NOT REVIEW][BugFix] Fix Memory Exhaustion for Multiple Filtering Operations in PPL Nov 21, 2025
int count = 0;

// Count this node if it's a filtering operation
// BUT: Don't count Filter nodes that contain function calls, as they can cause
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we still have the same problem with where clauses that have function calls, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire design has changed - right now we apply Calcite CoreRules.FILTER_MERGE before VolcanoPlanner plan

@RyanL1997 RyanL1997 changed the title [WIP - DO NOT REVIEW][BugFix] Fix Memory Exhaustion for Multiple Filtering Operations in PPL [BugFix] Fix Memory Exhaustion for Multiple Filtering Operations in PPL Nov 24, 2025
@RyanL1997 RyanL1997 marked this pull request as ready for review November 24, 2025 12:02
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)

3318-3357: Consider adding Project, Parse, and Patterns to the stop list for schema-changing operations.

The stop list includes Aggregation, Eval, Window, StreamWindow, but other schema-changing operations like Project (with computed expressions), Parse (adds extracted fields), Patterns (adds pattern fields), Flatten, and Expand are not included.

While these operations have flushFiltersBeforeSchemaChange calls in their visitor methods, the counting logic may still count filters across these boundaries. This could lead to enabling accumulation across schema changes, though the flush calls should still segment them correctly at runtime.

For consistency, consider aligning the stop list with all schema-changing operations:

     if (plan instanceof Aggregation
         || plan instanceof Eval
+        || plan instanceof Parse
+        || plan instanceof Patterns
+        || plan instanceof Flatten
+        || plan instanceof Expand
         || plan instanceof Window
         || plan instanceof StreamWindow) {
       return count;
     }

766-767: Missing flush before schema-changing patterns operation.

visitPatterns adds new fields (pattern, tokens, sample_logs) which constitutes a schema change. While not in the countFilteringOperations stop list, any pending filter conditions should be flushed before this schema modification to prevent RexInputRef mismatches.

   @Override
   public RelNode visitPatterns(Patterns node, CalcitePlanContext context) {
     visitChildren(node, context);
+    flushFiltersBeforeSchemaChange(context);
     RexNode showNumberedTokenExpr = rexVisitor.analyze(node.getShowNumberedToken(), context);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d28c226 and 35c56c5.

📒 Files selected for processing (11)
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (3 hunks)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (10 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_date_string.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_time_string.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_timestamp_string.yaml (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRegexTest.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (17)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (2)

65-69: LGTM! Well-structured filter accumulation state management.

The separation of the pending conditions list from the enabled flag is clean. The ArrayList is appropriate here since operations are append-only and order-preserving for the AND combination.


157-170: LGTM! Correct handling of single vs. multiple conditions.

The optimization to avoid unnecessary AND wrapping for single conditions (line 163) is appropriate, and the list is correctly cleared after flushing (line 169).

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (6)

178-193: LGTM! Clean accumulation lifecycle management.

The try-finally pattern ensures the flag is always disabled after processing, and the flush before returning handles accumulated conditions properly.


271-276: LGTM! Correct filter accumulation for non-subquery cases.

Subquery expressions are correctly excluded from accumulation (they require correlation variable handling), while standard filters are accumulated when enabled.


325-330: LGTM! Regex conditions correctly route through accumulation.

This matches the visitFilter pattern, ensuring both regex and where clause filters benefit from accumulation.


2293-2315: LGTM! Proper per-subsearch isolation of accumulation state.

Saving and restoring filterAccumulationEnabled ensures each subsearch independently determines its accumulation needs without cross-branch state leakage. Using analyze() for each subsearch allows independent filter counting.


3328-3335: Confirm this addresses the function-call filter accumulation concern.

The check at line 3332 excludes Filter nodes containing function calls from the count, meaning they won't contribute to enabling accumulation mode. When accumulation IS enabled, visitFilter at line 272-276 will still add such conditions to accumulation if the flag is already enabled.

To fully prevent type mismatches, you may also want to check containsFunctionCall in visitFilter before accumulating:

     } else {
       // Use filter accumulation to prevent deep Filter node chains
-      if (context.isFilterAccumulationEnabled()) {
+      if (context.isFilterAccumulationEnabled() && !containsFunctionCall(node.getCondition())) {
         context.addFilterCondition(condition);
       } else {
         context.relBuilder.filter(condition);
       }
     }

This ensures filters with function calls are never accumulated, regardless of how accumulation was enabled.


1173-1186: Verify flush timing relative to visitChildren in aggregation.

The flush at line 1175 occurs before visitChildren is called (line 1206 in the private method). In other visitor methods (visitProject, visitEval, etc.), the pattern is visitChildrenflushFiltersBeforeSchemaChange. However, since countFilteringOperations stops at Aggregation nodes (lines 3340-3344) and doesn't recurse into children, accumulation mode typically won't be enabled for queries rooted at an Aggregation.

If accumulation could be enabled for queries where Aggregation is not the root (e.g., source=t | stats count() | regex /p/), verify filters accumulated before the Aggregation are flushed at the correct point.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml (1)

5-6: LGTM! Test expectation correctly updated for filter accumulation.

The change from nested filters to a single LogicalFilter with an AND conjunction correctly reflects the new filter accumulation behavior.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml (1)

5-6: LGTM! Filter predicates correctly combined.

The SEARCH and greater-than predicates are now combined into a single AND filter, demonstrating the accumulation mechanism working as intended.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_date_string.yaml (1)

4-6: LGTM! No-pushdown scenario correctly updated.

The date range filters are combined into a single LogicalFilter, and the physical plan appropriately uses EnumerableCalc with a SARG for the combined condition.

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRegexTest.java (1)

41-53: Updated chained-regex expectations correctly reflect filter accumulation

The revised expected logical plan (single LogicalFilter with AND(REGEXP_CONTAINS(...), REGEXP_CONTAINS(...))) and Spark SQL (WHERE ... AND ...) align with the new accumulation behavior while preserving semantics of the original two-step filtering. Looks good as a regression test for the consolidation logic.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_time_string.yaml (1)

4-6: Logical plan consolidation to single time-range filter looks correct

Combining the two time-bound predicates into one LogicalFilter(condition=[AND(>($0, TIME(...)), <($0, TIME(...)))]) with the LogicalProject and CalciteLogicalIndexScan directly beneath it matches the new accumulation strategy and keeps the original half-open time range semantics intact.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml (1)

4-6: Date-range filter merge aligns with accumulated-filter semantics

The new single LogicalFilter using AND(>($0, DATE(...)), <($0, DATE(...))) above LogicalProject and CalciteLogicalIndexScan matches the intended consolidation while the physical Sarg range still represents the same exclusive date interval, so the updated expectations look correct.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push.yaml (1)

5-6: Consolidated age/search and balance predicates into one filter correctly

Merging the SEARCH-based age range and >($3, 10000) into a single LogicalFilter(condition=[AND(SEARCH($8, Sarg[(30..40)]), >($3, 10000))]) matches the accumulation design and maintains the original filtering semantics for the no-pushdown plan.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml (1)

4-6: Single time-range LogicalFilter plus pushdown Sarg match intended behavior

The unified LogicalFilter combining the two TIME comparisons with AND, together with the CalciteEnumerableIndexScan PushDownContext that uses a single Sarg time range, accurately reflects the accumulated-filter and pushdown behavior described in the PR.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_timestamp_string.yaml (1)

5-6: Timestamp-range filters correctly merged into a single LogicalFilter

The updated logical plan’s LogicalFilter(condition=[AND(>($3, TIMESTAMP(...)), <($3, TIMESTAMP(...)))]) above CalciteLogicalIndexScan, together with the Sarg-based timestamp range in the physical plan, is consistent with the new accumulation mechanism and keeps the original exclusive bounds behavior.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)

268-278: Consider caching the HepProgram as a static field.

The HepProgram is stateless and immutable, so it can be safely shared across invocations. This avoids repeated object allocation on every query.

+  private static final HepProgram FILTER_MERGE_PROGRAM =
+      new HepProgramBuilder().addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule()).build();
+
   /**
    * Run Calcite FILTER_MERGE once so adjacent filters created during analysis can collapse before
    * the rest of optimization.
    */
   private RelNode mergeAdjacentFilters(RelNode relNode) {
-    HepProgram program =
-        new HepProgramBuilder().addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule()).build();
-    HepPlanner planner = new HepPlanner(program);
+    HepPlanner planner = new HepPlanner(FILTER_MERGE_PROGRAM);
     planner.setRoot(relNode);
     return planner.findBestExp();
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3dfd44b and ad43837.

📒 Files selected for processing (2)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (3)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (3)

19-27: LGTM!

The imports for Hep-based planning and FilterMergeRule are appropriate for the targeted filter merging approach.


106-108: LGTM!

Correctly positioned after analysis and before optimization to ensure adjacent filters are collapsed early, preventing the combinatorial explosion in the main optimization phase.


152-154: LGTM!

Consistent with the execution path, ensuring explain output accurately reflects the merged filter optimization.

Signed-off-by: Jialiang Liang <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAbstractTest.java (1)

113-120: mergeAdjacentFilters correctly encapsulates a dedicated HepPlanner; optional reuse of HepProgram

The helper’s use of a HepProgram with just FilterMergeRule.Config.DEFAULT.toRule() and planner.findBestExp() is a standard, safe way to merge adjacent filters, and is suitable for tests. If getRelNode is invoked very frequently in larger suites, you could optionally cache the HepProgram (or even the HepPlanner) as a static field to avoid repeatedly rebuilding it, but this is a minor optimization for test-only code.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad43837 and ae78fda.

📒 Files selected for processing (2)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAbstractTest.java (2 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTrendlineTest.java (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (3)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTrendlineTest.java (1)

79-80: Combined filter expectations correctly reflect new accumulation behavior

Collapsing the two IS NOT NULL checks into a single LogicalFilter with AND(IS NOT NULL($5), IS NOT NULL($7)) and mirroring that via a single WHERE SAL IS NOT NULL AND DEPTNO IS NOT NULL in the Spark SQL keeps the test semantically equivalent to the prior nested filters while aligning with the new adjacent-filter consolidation in the planner.

Also applies to: 92-92

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAbstractTest.java (2)

25-31: HepPlanner / FilterMergeRule imports are appropriate and correctly scoped

The added Calcite hep and FilterMergeRule imports are all used by mergeAdjacentFilters and confined to test code, which keeps the production surface unchanged.


102-111: getRelNode now returns a filter-merged plan; confirm alignment with production pipeline

Running mergeAdjacentFilters(root) here ensures all tests see a plan with merged adjacent filters, matching the intended planner behavior; this looks correct. Please double-check that the production QueryService (or equivalent entrypoint) applies the same filter-merge stage so tests stay in sync if that pipeline changes.

@RyanL1997
Copy link
Collaborator Author

RyanL1997 commented Dec 1, 2025

Transferring some of the communication with @penghuo here:

First, I agree that the previous implementation infected too much of the original calcite relnode visitor class logic, by applying

...
    flushFiltersBeforeSchemaChange(context);
...

to plethora of visit() logic, which is not convenient for future development of PPL commands with this fix.

Second, I tried the both @penghuo's suggestions

Apply Calcite CoreRules.FILTER_MERGE before VolcanoPlanner plan
Customized a RelNode visitor and apply FILTER_MERGE rule.

with 3dfd44b and ad43837 and both of them works.

According to the above, I selected the approach of applying Calcite's FilterMergeRule directly in QueryService.java (using HepPlanner) for the following reasons:

  • Tree construction happens in CalciteRelNodeVisitor, while optimization is applied as a post-processing step in QueryService
  • Leverages the existing FilterMergeRule.Config.DEFAULT.toRule() instead of custom visitor logic
  • The filter merge happens in the same place for both production execution (executeWithCalcite()) and explain queries (explainWithCalcite())

Signed-off-by: Jialiang Liang <[email protected]>
new HepProgramBuilder().addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule()).build();
HepPlanner planner = new HepPlanner(program);
planner.setRoot(relNode);
return planner.findBestExp();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure performance impact, did u verify?

Copy link
Collaborator Author

@RyanL1997 RyanL1997 Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just scripted a mini benchmark break down by directly leverage the clickbench IT queries. The following report shows the detailed performance of each planning phase - in summary, performance testing shows filter merge adds only 0.19ms average overhead (10% of planning time, <1% of total query time).

> python3 analyze_performance.py

Analyzing log file: /Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/integ-test/build/testclusters/integTest-0/logs/integTest.log
Using test log for query names: /Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/performance_results.log

================================================================================
FILTER MERGE PERFORMANCE ANALYSIS
================================================================================

📊 OVERALL STATISTICS (168 queries)
--------------------------------------------------------------------------------
Filter Merge Time:
  Mean:          186 μs  (  0.19 ms)
  Median:        103 μs  (  0.10 ms)
  Std Dev:       197 μs
  Min:            41 μs  (  0.04 ms)
  Max:          1541 μs  (  1.54 ms)

Total Planning Time:
  Mean:         1870 μs  (  1.87 ms)
  Median:       1750 μs  (  1.75 ms)

Filter Merge as % of Planning:
  Mean:       9.87%
  Median:     6.22%
  Max:       47.52%

================================================================================
📈 PERFORMANCE ASSESSMENT
--------------------------------------------------------------------------------
Average overhead: 0.19ms (9.9% of planning)
Recommendation: No optimization needed. Merge immediately.

================================================================================
📊 PERCENTILE ANALYSIS
--------------------------------------------------------------------------------
Filter Merge Time Percentiles:
  p50:    105 μs  (  0.10 ms)
  p95:    477 μs  (  0.48 ms)
  p99:   1541 μs  (  1.54 ms)

================================================================================
⏱️  PLANNING PHASE BREAKDOWN
--------------------------------------------------------------------------------
Phase Averages:
  Analyze:          1672 μs  ( 89.4%)
  Filter Merge:      186 μs  ( 10.0%)  ← THIS IS WHAT WE ADDED
  Optimize:            9 μs  (  0.5%)
  Convert:             0 μs  (  0.0%)
  TOTAL:            1870 μs  (100.0%)

================================================================================
🐢 TOP 10 SLOWEST FILTER MERGE TIMES
--------------------------------------------------------------------------------
Rank   Query                Avg Merge Time       Max Merge Time       % of Planning
--------------------------------------------------------------------------------
1      Query46                  1541 μs ( 1.54ms)       1541 μs ( 1.54ms)    47.5%
2      Query29                   543 μs ( 0.54ms)        543 μs ( 0.54ms)    25.5%
3      Query24                   529 μs ( 0.53ms)        529 μs ( 0.53ms)    24.5%
4      Query54                   513 μs ( 0.51ms)        513 μs ( 0.51ms)    18.8%
5      Query44                   477 μs ( 0.48ms)        477 μs ( 0.48ms)    16.1%
6      Query23                   445 μs ( 0.45ms)        445 μs ( 0.45ms)    22.9%
7      Query15                   390 μs ( 0.39ms)        390 μs ( 0.39ms)    19.9%
8      Query71                   388 μs ( 0.39ms)        388 μs ( 0.39ms)    20.4%
9      Query16                   377 μs ( 0.38ms)        377 μs ( 0.38ms)    17.8%
10     Query55                   351 μs ( 0.35ms)        351 μs ( 0.35ms)    18.9%

================================================================================
📈 DISTRIBUTION ANALYSIS
--------------------------------------------------------------------------------
Filter Merge Time Distribution:
  <100μs                 82 ( 48.8%) ████████████████████████
  100-500μs              78 ( 46.4%) ███████████████████████
  500-1000μs (1ms)        6 (  3.6%) █
  1-5ms                   2 (  1.2%)
  5-10ms                  0 (  0.0%)
  >10ms                   0 (  0.0%)

================================================================================
📄 Detailed CSV exported to: /Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/performance_analysis.csv
================================================================================

Signed-off-by: Jialiang Liang <[email protected]>
Signed-off-by: Jialiang Liang <[email protected]>
penghuo
penghuo previously approved these changes Dec 2, 2025
Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. 2 concerns regarding to performance impact. We should monitor the Big5 benchmark closely as next step.

  1. Introduce HepPlanner rules.
  2. After filter_merge, exists filters were added. Because missing_bucket=false, these filters are redundant.

CalciteLogicalIndexScan(table=[[OpenSearch, big5]])
physical: |
CalciteEnumerableIndexScan(table=[[OpenSearch, big5]], PushDownContext=[[PROJECT->[process.name, cloud.region, @timestamp, aws.cloudwatch.log_stream], FILTER->SEARCH($2, Sarg[['2023-01-02 00:00:00':VARCHAR..'2023-01-02 10:00:00':VARCHAR)]:VARCHAR), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1, 2},count()=COUNT()), PROJECT->[count(), process.name, cloud.region, aws.cloudwatch.log_stream], SORT->[1 DESC LAST, 2 ASC FIRST, 3 ASC FIRST], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"range":{"@timestamp":{"from":"2023-01-02T00:00:00.000Z","to":"2023-01-02T10:00:00.000Z","include_lower":true,"include_upper":false,"format":"date_time","boost":1.0}}},"_source":{"includes":["process.name","cloud.region","@timestamp","aws.cloudwatch.log_stream"],"excludes":[]},"aggregations":{"composite_buckets":{"composite":{"size":10,"sources":[{"process.name":{"terms":{"field":"process.name","missing_bucket":false,"order":"desc"}}},{"cloud.region":{"terms":{"field":"cloud.region","missing_bucket":false,"order":"asc"}}},{"aws.cloudwatch.log_stream":{"terms":{"field":"aws.cloudwatch.log_stream","missing_bucket":false,"order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, big5]], PushDownContext=[[PROJECT->[process.name, cloud.region, @timestamp, aws.cloudwatch.log_stream], FILTER->AND(SEARCH($2, Sarg[['2023-01-02 00:00:00':VARCHAR..'2023-01-02 10:00:00':VARCHAR)]:VARCHAR), IS NOT NULL($0), IS NOT NULL($1), IS NOT NULL($3)), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1, 2},count()=COUNT()), PROJECT->[count(), process.name, cloud.region, aws.cloudwatch.log_stream], SORT->[1 DESC LAST, 2 ASC FIRST, 3 ASC FIRST], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"bool":{"must":[{"range":{"@timestamp":{"from":"2023-01-02T00:00:00.000Z","to":"2023-01-02T10:00:00.000Z","include_lower":true,"include_upper":false,"format":"date_time","boost":1.0}}},{"exists":{"field":"process.name","boost":1.0}},{"exists":{"field":"cloud.region","boost":1.0}},{"exists":{"field":"aws.cloudwatch.log_stream","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":["process.name","cloud.region","@timestamp","aws.cloudwatch.log_stream"],"excludes":[]},"aggregations":{"composite_buckets":{"composite":{"size":10,"sources":[{"process.name":{"terms":{"field":"process.name","missing_bucket":false,"order":"desc"}}},{"cloud.region":{"terms":{"field":"cloud.region","missing_bucket":false,"order":"asc"}}},{"aws.cloudwatch.log_stream":{"terms":{"field":"aws.cloudwatch.log_stream","missing_bucket":false,"order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After filter_merge, three exists filters were added. Because missing_bucket=false, these filters are redundant.

{"exists":{"field":"process.name","boost":1.0}},{"exists":{"field":"cloud.region","boost":1.0}},{"exists":{"field":"aws.cloudwatch.log_stream","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}}

I did not expect any performance regression from this change, but we should monitor the Big5 benchmark closely.

Signed-off-by: Jialiang Liang <[email protected]>
@penghuo penghuo enabled auto-merge (squash) December 2, 2025 21:08
@penghuo penghuo merged commit 52fe8aa into opensearch-project:main Dec 2, 2025
38 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 2, 2025
…PL (#4841)

* [BugFix] Fix Regex OOM when there are 10+ regex clauses

Signed-off-by: Jialiang Liang <[email protected]>

* fix unit tests

Signed-off-by: Jialiang Liang <[email protected]>

* fix tests

Signed-off-by: Jialiang Liang <[email protected]>

* fix explain tests and corresponding commands

Signed-off-by: Jialiang Liang <[email protected]>

* fix explain tests for testFilterPushDownExplain

Signed-off-by: Jialiang Liang <[email protected]>

* peng - isolate the fix logic to its own visitor class

Signed-off-by: Jialiang Liang <[email protected]>

* Directly apply Calcite CoreRules.FILTER_MERGE before VolcanoPlanner plan

Co-authored-by: Peng Huo <[email protected]>
Signed-off-by: Jialiang Liang <[email protected]>

* fix the UTs

Signed-off-by: Jialiang Liang <[email protected]>

* fix the ITs after rebase

Signed-off-by: Jialiang Liang <[email protected]>

* fix clickbench IT and more ITs

Signed-off-by: Jialiang Liang <[email protected]>

* address comments from peng

Signed-off-by: Jialiang Liang <[email protected]>

* add yaml test

Signed-off-by: Jialiang Liang <[email protected]>

---------

Signed-off-by: Jialiang Liang <[email protected]>
Co-authored-by: Peng Huo <[email protected]>
(cherry picked from commit 52fe8aa)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
RyanL1997 pushed a commit that referenced this pull request Dec 2, 2025
…PL (#4841) (#4895)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Peng Huo <[email protected]>
@coderabbitai coderabbitai bot mentioned this pull request Dec 9, 2025
8 tasks
asifabashar pushed a commit to asifabashar/sql that referenced this pull request Dec 10, 2025
…PL (opensearch-project#4841)

* [BugFix] Fix Regex OOM when there are 10+ regex clauses

Signed-off-by: Jialiang Liang <[email protected]>

* fix unit tests

Signed-off-by: Jialiang Liang <[email protected]>

* fix tests

Signed-off-by: Jialiang Liang <[email protected]>

* fix explain tests and corresponding commands

Signed-off-by: Jialiang Liang <[email protected]>

* fix explain tests for testFilterPushDownExplain

Signed-off-by: Jialiang Liang <[email protected]>

* peng - isolate the fix logic to its own visitor class

Signed-off-by: Jialiang Liang <[email protected]>

* Directly apply Calcite CoreRules.FILTER_MERGE before VolcanoPlanner plan

Co-authored-by: Peng Huo <[email protected]>
Signed-off-by: Jialiang Liang <[email protected]>

* fix the UTs

Signed-off-by: Jialiang Liang <[email protected]>

* fix the ITs after rebase

Signed-off-by: Jialiang Liang <[email protected]>

* fix clickbench IT and more ITs

Signed-off-by: Jialiang Liang <[email protected]>

* address comments from peng

Signed-off-by: Jialiang Liang <[email protected]>

* add yaml test

Signed-off-by: Jialiang Liang <[email protected]>

---------

Signed-off-by: Jialiang Liang <[email protected]>
Co-authored-by: Peng Huo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants