[BugFix] Fix Memory Exhaustion for Multiple Filtering Operations in PPL #4841

RyanL1997 · 2025-11-21T04:46:35Z

Description

Implemented a filter accumulation mechanism that combines multiple filter conditions into a single Filter RelNode before Calcite's optimization phase begins, preventing the deep Filter chains that trigger combinatorial explosion.

Implementation Summary

The fix introduces automatic detection and accumulation of filtering operations:

Automatic Detection: When a query is analyzed, the system counts all filtering operations (e.g., regex and where) in the AST. If two or more filtering operations are detected, filter accumulation mode is enabled automatically.
Filter Accumulation: Instead of creating individual Filter RelNodes for each regex or where operation, all filter conditions are collected in a list during the analysis phase.
Single Combined Filter: All accumulated conditions are combined with AND operations into a single Filter RelNode, preventing the deep chains that caused memory exhaustion.
Schema-Aware Flushing: Accumulated filters are flushed before any schema-changing operations (like fields) to ensure field references remain valid.
No change needed for command implementations: The fix is completely automatic - no query rewriting or user action required. Queries produce identical results to the original implementation.
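
The accumulate-then-flush idea described above can be sketched roughly as follows. This is a simplified model, not the PR's code: conditions are plain strings rather than Calcite RexNode objects, and the class name FilterAccumulator and the emittedFilters() accessor are made up for illustration (only addFilterCondition echoes a name that appears later in this conversation).

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the accumulation state; conditions are strings, not RexNodes. */
class FilterAccumulator {
  private final List<String> pending = new ArrayList<>();
  private final List<String> emittedFilters = new ArrayList<>();

  /** Collect a condition instead of emitting a Filter node right away. */
  void addFilterCondition(String condition) {
    pending.add(condition);
  }

  /** Emit one Filter for everything collected so far, then reset the pending list. */
  void flush() {
    if (pending.isEmpty()) {
      return; // nothing accumulated, so no Filter is emitted
    }
    // A single condition needs no AND wrapper.
    String combined =
        pending.size() == 1 ? pending.get(0) : "AND(" + String.join(", ", pending) + ")";
    emittedFilters.add("Filter(" + combined + ")");
    pending.clear();
  }

  /** Hypothetical accessor used only to inspect the result in this sketch. */
  List<String> emittedFilters() {
    return List.copyOf(emittedFilters);
  }
}
```

In this model, a schema-changing command would call flush() before it runs, so each run of consecutive filters ends up in one Filter of its own.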

How It Works: Before vs After

Before (Memory Explosion)

Query: source=t | regex f1="..." | regex f2="..." | ... | regex f10="..." | fields f1

Analysis Phase:
  Filter(regex10)
  └─ Filter(regex9)
     └─ Filter(regex8)
        └─ ... (deeply nested)
           └─ Scan(t)

Optimization Phase (FilterMergeRule):
  - Tries to merge filters
  - Generates all possible orderings: 10! = 3,628,800 combinations
  - Each combination creates intermediate RelNode objects
  - Memory exhaustion before execution even starts

After (Efficient Single Filter)

Query: source=t | regex f1="..." | regex f2="..." | ... | regex f10="..." | fields f1

Analysis Phase:
  Filter(regex1 AND regex2 AND ... AND regex10)  // Single combined filter
  └─ Scan(t)

Optimization Phase (FilterMergeRule):
  - Only one Filter node to optimize
  - No combinatorial explosion
  - Memory usage remains constant regardless of filter count

Results

Memory usage is now constant regardless of the number of regex operations
Queries with 10+ regex clauses complete successfully
No combinatorial explosion during Calcite optimization
Performance improved significantly as Calcite doesn't waste time on filter reordering
Solution is automatic - no query rewriting required

Related Issues

Related to [BUG] Memory Exhaustion for Multiple Filtering Operations in PPL #4842, as a quick fix for now.

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

New Features
- Query engine now consolidates adjacent filters into single combined predicates, producing simpler, more efficient plans.
Bug Fixes
- Eliminated unnecessary nested filters and redundant NULL checks to improve filter pushdown and plan clarity.
Tests
- Added integration tests covering merged-filter scenarios and behaviour with consecutive WHERE/IS NOT NULL/range filters.


Swiddis · 2025-11-21T16:56:58Z

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

+    int count = 0;
+
+    // Count this node if it's a filtering operation
+    // BUT: Don't count Filter nodes that contain function calls, as they can cause


This means we still have the same problem with where clauses that have function calls, right?

The entire design has changed: we now apply Calcite's CoreRules.FILTER_MERGE before the VolcanoPlanner plans the query.

Signed-off-by: Jialiang Liang <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)
3318-3357: Consider adding Project, Parse, and Patterns to the stop list for schema-changing operations.

The stop list includes Aggregation, Eval, Window, StreamWindow, but other schema-changing operations like Project (with computed expressions), Parse (adds extracted fields), Patterns (adds pattern fields), Flatten, and Expand are not included.

While these operations have flushFiltersBeforeSchemaChange calls in their visitor methods, the counting logic may still count filters across these boundaries. This could lead to enabling accumulation across schema changes, though the flush calls should still segment them correctly at runtime.

For consistency, consider aligning the stop list with all schema-changing operations:
     if (plan instanceof Aggregation
         || plan instanceof Eval
+        || plan instanceof Parse
+        || plan instanceof Patterns
+        || plan instanceof Flatten
+        || plan instanceof Expand
         || plan instanceof Window
         || plan instanceof StreamWindow) {
       return count;
     }
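
To make the effect of the stop list concrete, here is a rough model of the counting walk. It is only a sketch under stated assumptions: the Plan record, the string-based command kinds, and the FILTERING set are hypothetical stand-ins for the real AST classes, the stop list includes the additions suggested above, and the function-call exclusion mentioned earlier in the review is left out.

```java
import java.util.List;
import java.util.Set;

class FilterCounting {
  /** Minimal AST node: a command kind plus its child commands (hypothetical model). */
  record Plan(String kind, List<Plan> children) {
    static Plan of(String kind, Plan... children) {
      return new Plan(kind, List.of(children));
    }
  }

  // Commands at which counting stops: the quoted stop list plus the suggested additions.
  static final Set<String> STOP =
      Set.of("Aggregation", "Eval", "Parse", "Patterns", "Flatten", "Expand", "Window", "StreamWindow");

  // Commands treated as filtering operations in this sketch.
  static final Set<String> FILTERING = Set.of("Filter", "Regex");

  static int countFilteringOperations(Plan plan) {
    int count = FILTERING.contains(plan.kind()) ? 1 : 0;
    if (STOP.contains(plan.kind())) {
      return count; // do not count filters below a schema-changing command
    }
    for (Plan child : plan.children()) {
      count += countFilteringOperations(child);
    }
    return count;
  }
}
```

For a pipeline shaped like where, eval, where, regex (with the last command as the AST root), this model counts only the two filters above the eval.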
766-767: Missing flush before schema-changing patterns operation.

visitPatterns adds new fields (pattern, tokens, sample_logs) which constitutes a schema change. While not in the countFilteringOperations stop list, any pending filter conditions should be flushed before this schema modification to prevent RexInputRef mismatches.
   @Override
   public RelNode visitPatterns(Patterns node, CalcitePlanContext context) {
     visitChildren(node, context);
+    flushFiltersBeforeSchemaChange(context);
     RexNode showNumberedTokenExpr = rexVisitor.analyze(node.getShowNumberedToken(), context);

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d28c226 and 35c56c5.

📒 Files selected for processing (11)

core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (3 hunks)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (10 hunks)
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_date_string.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_time_string.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_timestamp_string.yaml (1 hunks)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRegexTest.java (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)

GitHub Check: security-it-linux (21)
GitHub Check: security-it-linux (25)
GitHub Check: build-linux (25, doc)
GitHub Check: build-linux (21, doc)
GitHub Check: bwc-tests-full-restart (21)
GitHub Check: build-linux (25, unit)
GitHub Check: build-linux (25, integration)
GitHub Check: build-linux (21, unit)
GitHub Check: build-linux (21, integration)
GitHub Check: bwc-tests-rolling-upgrade (21)
GitHub Check: bwc-tests-rolling-upgrade (25)
GitHub Check: bwc-tests-full-restart (25)
GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
GitHub Check: build-windows-macos (macos-14, 25, unit)
GitHub Check: build-windows-macos (macos-14, 21, integration)
GitHub Check: build-windows-macos (macos-14, 25, doc)
GitHub Check: build-windows-macos (macos-14, 25, integration)
GitHub Check: build-windows-macos (macos-14, 21, doc)
GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
GitHub Check: build-windows-macos (macos-14, 21, unit)
GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
GitHub Check: security-it-windows-macos (windows-latest, 25)
GitHub Check: security-it-windows-macos (windows-latest, 21)
GitHub Check: security-it-windows-macos (macos-14, 21)
GitHub Check: security-it-windows-macos (macos-14, 25)
GitHub Check: CodeQL-Scan (java)

🔇 Additional comments (17)

core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (2)

65-69: LGTM! Well-structured filter accumulation state management.

The separation of the pending conditions list from the enabled flag is clean. The ArrayList is appropriate here since operations are append-only and order-preserving for the AND combination.

157-170: LGTM! Correct handling of single vs. multiple conditions.

The optimization to avoid unnecessary AND wrapping for single conditions (line 163) is appropriate, and the list is correctly cleared after flushing (line 169).
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (6)
178-193: LGTM! Clean accumulation lifecycle management.

The try-finally pattern ensures the flag is always disabled after processing, and the flush before returning handles accumulated conditions properly.

271-276: LGTM! Correct filter accumulation for non-subquery cases.

Subquery expressions are correctly excluded from accumulation (they require correlation variable handling), while standard filters are accumulated when enabled.

325-330: LGTM! Regex conditions correctly route through accumulation.

This matches the visitFilter pattern, ensuring both regex and where clause filters benefit from accumulation.

2293-2315: LGTM! Proper per-subsearch isolation of accumulation state.

Saving and restoring filterAccumulationEnabled ensures each subsearch independently determines its accumulation needs without cross-branch state leakage. Using analyze() for each subsearch allows independent filter counting.
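
A minimal sketch of the save-and-restore pattern described here, assuming a context object that holds a single boolean flag. AccumulationScope and analyzeSubsearch are hypothetical names used for illustration; the real code works on CalcitePlanContext.

```java
import java.util.function.Supplier;

class AccumulationScope {
  private boolean filterAccumulationEnabled;

  boolean isFilterAccumulationEnabled() {
    return filterAccumulationEnabled;
  }

  void setFilterAccumulationEnabled(boolean enabled) {
    this.filterAccumulationEnabled = enabled;
  }

  /** Runs a subsearch analysis with its own flag value, then restores the caller's value. */
  <T> T analyzeSubsearch(boolean enableForSubsearch, Supplier<T> analysis) {
    boolean saved = filterAccumulationEnabled;
    filterAccumulationEnabled = enableForSubsearch;
    try {
      return analysis.get();
    } finally {
      filterAccumulationEnabled = saved; // restored even if the analysis throws
    }
  }
}
```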

3328-3335: Confirm this addresses the function-call filter accumulation concern.

The check at line 3332 excludes Filter nodes containing function calls from the count, meaning they won't contribute to enabling accumulation mode. When accumulation IS enabled, visitFilter at line 272-276 will still add such conditions to accumulation if the flag is already enabled.

To fully prevent type mismatches, you may also want to check containsFunctionCall in visitFilter before accumulating:
     } else {
       // Use filter accumulation to prevent deep Filter node chains
-      if (context.isFilterAccumulationEnabled()) {
+      if (context.isFilterAccumulationEnabled() && !containsFunctionCall(node.getCondition())) {
         context.addFilterCondition(condition);
       } else {
         context.relBuilder.filter(condition);
       }
     }
This ensures filters with function calls are never accumulated, regardless of how accumulation was enabled.

1173-1186: Verify flush timing relative to visitChildren in aggregation.

The flush at line 1175 occurs before visitChildren is called (line 1206 in the private method). In other visitor methods (visitProject, visitEval, etc.), the pattern is visitChildren → flushFiltersBeforeSchemaChange. However, since countFilteringOperations stops at Aggregation nodes (lines 3340-3344) and doesn't recurse into children, accumulation mode typically won't be enabled for queries rooted at an Aggregation.

If accumulation could be enabled for queries where Aggregation is not the root (e.g., source=t | stats count() | regex /p/), verify filters accumulated before the Aggregation are flushed at the correct point.
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml (1)

5-6: LGTM! Test expectation correctly updated for filter accumulation.

The change from nested filters to a single LogicalFilter with an AND conjunction correctly reflects the new filter accumulation behavior.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml (1)

5-6: LGTM! Filter predicates correctly combined.

The SEARCH and greater-than predicates are now combined into a single AND filter, demonstrating the accumulation mechanism working as intended.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_date_string.yaml (1)

4-6: LGTM! No-pushdown scenario correctly updated.

The date range filters are combined into a single LogicalFilter, and the physical plan appropriately uses EnumerableCalc with a SARG for the combined condition.

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRegexTest.java (1)

41-53: Updated chained-regex expectations correctly reflect filter accumulation

The revised expected logical plan (single LogicalFilter with AND(REGEXP_CONTAINS(...), REGEXP_CONTAINS(...))) and Spark SQL (WHERE ... AND ...) align with the new accumulation behavior while preserving semantics of the original two-step filtering. Looks good as a regression test for the consolidation logic.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_time_string.yaml (1)

4-6: Logical plan consolidation to single time-range filter looks correct

Combining the two time-bound predicates into one LogicalFilter(condition=[AND(>($0, TIME(...)), <($0, TIME(...)))]) with the LogicalProject and CalciteLogicalIndexScan directly beneath it matches the new accumulation strategy and keeps the original half-open time range semantics intact.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml (1)

4-6: Date-range filter merge aligns with accumulated-filter semantics

The new single LogicalFilter using AND(>($0, DATE(...)), <($0, DATE(...))) above LogicalProject and CalciteLogicalIndexScan matches the intended consolidation while the physical Sarg range still represents the same exclusive date interval, so the updated expectations look correct.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push.yaml (1)

5-6: Consolidated age/search and balance predicates into one filter correctly

Merging the SEARCH-based age range and >($3, 10000) into a single LogicalFilter(condition=[AND(SEARCH($8, Sarg[(30..40)]), >($3, 10000))]) matches the accumulation design and maintains the original filtering semantics for the no-pushdown plan.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml (1)

4-6: Single time-range LogicalFilter plus pushdown Sarg match intended behavior

The unified LogicalFilter combining the two TIME comparisons with AND, together with the CalciteEnumerableIndexScan PushDownContext that uses a single Sarg time range, accurately reflects the accumulated-filter and pushdown behavior described in the PR.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_timestamp_string.yaml (1)

5-6: Timestamp-range filters correctly merged into a single LogicalFilter

The updated logical plan’s LogicalFilter(condition=[AND(>($3, TIMESTAMP(...)), <($3, TIMESTAMP(...)))]) above CalciteLogicalIndexScan, together with the Sarg-based timestamp range in the physical plan, is consistent with the new accumulation mechanism and keeps the original exclusive bounds behavior.

Signed-off-by: Jialiang Liang <[email protected]>

Co-authored-by: Peng Huo <[email protected]> Signed-off-by: Jialiang Liang <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)

268-278: Consider caching the HepProgram as a static field.

The HepProgram is stateless and immutable, so it can be safely shared across invocations. This avoids repeated object allocation on every query.

+  private static final HepProgram FILTER_MERGE_PROGRAM =
+      new HepProgramBuilder().addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule()).build();
+
   /**
    * Run Calcite FILTER_MERGE once so adjacent filters created during analysis can collapse before
    * the rest of optimization.
    */
   private RelNode mergeAdjacentFilters(RelNode relNode) {
-    HepProgram program =
-        new HepProgramBuilder().addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule()).build();
-    HepPlanner planner = new HepPlanner(program);
+    HepPlanner planner = new HepPlanner(FILTER_MERGE_PROGRAM);
     planner.setRoot(relNode);
     return planner.findBestExp();
   }
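
For readers unfamiliar with what merging adjacent filters does to a plan, the following toy model may help. It is not Calcite code and does not claim to mirror FilterMergeRule's internals: the Node record and the helper names are hypothetical, and conditions are plain strings. It only shows the shape of the transformation, where directly stacked Filter nodes collapse into one while filters separated by another operator stay apart.

```java
import java.util.ArrayList;
import java.util.List;

class AdjacentFilterMerge {
  /** Minimal plan node; conditions is non-empty only for Filter nodes (hypothetical model). */
  record Node(String kind, List<String> conditions, Node input) {}

  static Node scan(String table) {
    return new Node("Scan(" + table + ")", List.of(), null);
  }

  static Node filter(String condition, Node input) {
    return new Node("Filter", List.of(condition), input);
  }

  static Node project(Node input) {
    return new Node("Project", List.of(), input);
  }

  /** Collapses every run of directly stacked Filter nodes into a single Filter. */
  static Node mergeAdjacentFilters(Node node) {
    if (node == null) {
      return null;
    }
    Node input = mergeAdjacentFilters(node.input());
    if (node.kind().equals("Filter") && input != null && input.kind().equals("Filter")) {
      // Keep the lower (earlier) conditions first, then append this filter's condition.
      List<String> all = new ArrayList<>(input.conditions());
      all.addAll(node.conditions());
      return new Node("Filter", all, input.input());
    }
    return new Node(node.kind(), node.conditions(), input);
  }
}
```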


📥 Commits

Reviewing files that changed from the base of the PR and between 3dfd44b and ad43837.

📒 Files selected for processing (2)

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2 hunks)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java


🔇 Additional comments (3)

core/src/main/java/org/opensearch/sql/executor/QueryService.java (3)

19-27: LGTM!

The imports for Hep-based planning and FilterMergeRule are appropriate for the targeted filter merging approach.

106-108: LGTM!

Correctly positioned after analysis and before optimization to ensure adjacent filters are collapsed early, preventing the combinatorial explosion in the main optimization phase.

152-154: LGTM!

Consistent with the execution path, ensuring explain output accurately reflects the merged filter optimization.

Signed-off-by: Jialiang Liang <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAbstractTest.java (1)

113-120: mergeAdjacentFilters correctly encapsulates a dedicated HepPlanner; optional reuse of HepProgram

The helper’s use of a HepProgram with just FilterMergeRule.Config.DEFAULT.toRule() and planner.findBestExp() is a standard, safe way to merge adjacent filters, and is suitable for tests. If getRelNode is invoked very frequently in larger suites, you could optionally cache the HepProgram (or even the HepPlanner) as a static field to avoid repeatedly rebuilding it, but this is a minor optimization for test-only code.


📥 Commits

Reviewing files that changed from the base of the PR and between ad43837 and ae78fda.

📒 Files selected for processing (2)

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAbstractTest.java (2 hunks)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTrendlineTest.java (2 hunks)


🔇 Additional comments (3)

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTrendlineTest.java (1)

79-80: Combined filter expectations correctly reflect new accumulation behavior

Collapsing the two IS NOT NULL checks into a single LogicalFilter with AND(IS NOT NULL($5), IS NOT NULL($7)) and mirroring that via a single WHERE SAL IS NOT NULL AND DEPTNO IS NOT NULL in the Spark SQL keeps the test semantically equivalent to the prior nested filters while aligning with the new adjacent-filter consolidation in the planner.

Also applies to: 92-92

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAbstractTest.java (2)

25-31: HepPlanner / FilterMergeRule imports are appropriate and correctly scoped

The added Calcite hep and FilterMergeRule imports are all used by mergeAdjacentFilters and confined to test code, which keeps the production surface unchanged.

102-111: getRelNode now returns a filter-merged plan; confirm alignment with production pipeline

Running mergeAdjacentFilters(root) here ensures all tests see a plan with merged adjacent filters, matching the intended planner behavior; this looks correct. Please double-check that the production QueryService (or equivalent entrypoint) applies the same filter-merge stage so tests stay in sync if that pipeline changes.

RyanL1997 · 2025-12-01T22:50:47Z

Transferring some of the communication with @penghuo here:

First, I agree that the previous implementation intruded too far into the original CalciteRelNodeVisitor logic by adding

...
    flushFiltersBeforeSchemaChange(context);
...

to a plethora of visit() methods, which makes future development of PPL commands inconvenient.

Second, I tried both of @penghuo's suggestions:

Apply Calcite CoreRules.FILTER_MERGE before the VolcanoPlanner plans the query.
Customize a RelNode visitor that applies the FILTER_MERGE rule.

I implemented them in 3dfd44b and ad43837, and both of them work.

Based on the above, I chose the approach of applying Calcite's FilterMergeRule directly in QueryService.java (using HepPlanner) for the following reasons:

Tree construction happens in CalciteRelNodeVisitor, while optimization is applied as a post-processing step in QueryService
Leverages the existing FilterMergeRule.Config.DEFAULT.toRule() instead of custom visitor logic
The filter merge happens in the same place for both production execution (executeWithCalcite()) and explain queries (explainWithCalcite())

Signed-off-by: Jialiang Liang <[email protected]>

core/src/main/java/org/opensearch/sql/executor/QueryService.java

penghuo · 2025-12-01T23:47:30Z

core/src/main/java/org/opensearch/sql/executor/QueryService.java

+        new HepProgramBuilder().addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule()).build();
+    HepPlanner planner = new HepPlanner(program);
+    planner.setRoot(relNode);
+    return planner.findBestExp();


not sure performance impact, did u verify?

I scripted a mini benchmark breakdown that directly leverages the ClickBench IT queries. The following report shows the detailed performance of each planning phase. In summary, performance testing shows that filter merge adds only 0.19 ms of average overhead (10% of planning time, <1% of total query time).

> python3 analyze_performance.py
Analyzing log file: /Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/integ-test/build/testclusters/integTest-0/logs/integTest.log
Using test log for query names: /Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/performance_results.log

================================================================================
FILTER MERGE PERFORMANCE ANALYSIS
================================================================================

📊 OVERALL STATISTICS (168 queries)
--------------------------------------------------------------------------------
Filter Merge Time:
  Mean:     186 μs ( 0.19 ms)
  Median:   103 μs ( 0.10 ms)
  Std Dev:  197 μs
  Min:       41 μs ( 0.04 ms)
  Max:     1541 μs ( 1.54 ms)

Total Planning Time:
  Mean:    1870 μs ( 1.87 ms)
  Median:  1750 μs ( 1.75 ms)

Filter Merge as % of Planning:
  Mean:     9.87%
  Median:   6.22%
  Max:     47.52%

================================================================================
📈 PERFORMANCE ASSESSMENT
--------------------------------------------------------------------------------
Average overhead: 0.19ms (9.9% of planning)
Recommendation: No optimization needed. Merge immediately.

================================================================================
📊 PERCENTILE ANALYSIS
--------------------------------------------------------------------------------
Filter Merge Time Percentiles:
  p50:   105 μs ( 0.10 ms)
  p95:   477 μs ( 0.48 ms)
  p99:  1541 μs ( 1.54 ms)

================================================================================
⏱️ PLANNING PHASE BREAKDOWN
--------------------------------------------------------------------------------
Phase Averages:
  Analyze:       1672 μs ( 89.4%)
  Filter Merge:   186 μs ( 10.0%)  ← THIS IS WHAT WE ADDED
  Optimize:         9 μs (  0.5%)
  Convert:          0 μs (  0.0%)
  TOTAL:         1870 μs (100.0%)

================================================================================
🐢 TOP 10 SLOWEST FILTER MERGE TIMES
--------------------------------------------------------------------------------
Rank  Query    Avg Merge Time       Max Merge Time       % of Planning
--------------------------------------------------------------------------------
1     Query46  1541 μs ( 1.54ms)    1541 μs ( 1.54ms)    47.5%
2     Query29   543 μs ( 0.54ms)     543 μs ( 0.54ms)    25.5%
3     Query24   529 μs ( 0.53ms)     529 μs ( 0.53ms)    24.5%
4     Query54   513 μs ( 0.51ms)     513 μs ( 0.51ms)    18.8%
5     Query44   477 μs ( 0.48ms)     477 μs ( 0.48ms)    16.1%
6     Query23   445 μs ( 0.45ms)     445 μs ( 0.45ms)    22.9%
7     Query15   390 μs ( 0.39ms)     390 μs ( 0.39ms)    19.9%
8     Query71   388 μs ( 0.39ms)     388 μs ( 0.39ms)    20.4%
9     Query16   377 μs ( 0.38ms)     377 μs ( 0.38ms)    17.8%
10    Query55   351 μs ( 0.35ms)     351 μs ( 0.35ms)    18.9%

================================================================================
📈 DISTRIBUTION ANALYSIS
--------------------------------------------------------------------------------
Filter Merge Time Distribution:
  <100μs             82 ( 48.8%) ████████████████████████
  100-500μs          78 ( 46.4%) ███████████████████████
  500-1000μs (1ms)    6 (  3.6%) █
  1-5ms               2 (  1.2%)
  5-10ms              0 (  0.0%)
  >10ms               0 (  0.0%)

================================================================================
📄 Detailed CSV exported to:
/Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/performance_analysis.csv
================================================================================
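
The analyze_performance.py script itself is not part of this thread, so as a rough illustration of how summary figures like these are typically derived, here is a small sketch in Java. The class and method names are hypothetical; the only inputs taken from the report are the ten "slowest" merge times and the 186 μs / 1870 μs phase averages.

```java
import java.util.Arrays;

class MergeTimingStats {
  /** Arithmetic mean of per-query timings in microseconds. */
  static double mean(long[] micros) {
    return Arrays.stream(micros).average().orElse(Double.NaN);
  }

  /** Median of per-query timings; averages the two middle values for even-sized input. */
  static double median(long[] micros) {
    if (micros.length == 0) {
      return Double.NaN;
    }
    long[] sorted = micros.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  /** Share of one planning phase in the total planning time, as a percentage. */
  static double shareOfPlanningPercent(double phaseMicros, double totalMicros) {
    return 100.0 * phaseMicros / totalMicros;
  }
}
```

Applied to the phase averages above, shareOfPlanningPercent(186, 1870) comes out at roughly 9.9%, which matches the "9.9% of planning" figure in the assessment section.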

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

Signed-off-by: Jialiang Liang <[email protected]>

penghuo

Code looks good. I have two concerns regarding performance impact; as a next step, we should monitor the Big5 benchmark closely.

The change introduces HepPlanner rules.
After filter_merge, exists filters were added. Because missing_bucket=false, these filters are redundant.

penghuo · 2025-12-02T17:49:40Z

integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml

+                CalciteLogicalIndexScan(table=[[OpenSearch, big5]])
  physical: |
-    CalciteEnumerableIndexScan(table=[[OpenSearch, big5]], PushDownContext=[[PROJECT->[process.name, cloud.region, @timestamp, aws.cloudwatch.log_stream], FILTER->SEARCH($2, Sarg[['2023-01-02 00:00:00':VARCHAR..'2023-01-02 10:00:00':VARCHAR)]:VARCHAR), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1, 2},count()=COUNT()), PROJECT->[count(), process.name, cloud.region, aws.cloudwatch.log_stream], SORT->[1 DESC LAST, 2 ASC FIRST, 3 ASC FIRST], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"range":{"@timestamp":{"from":"2023-01-02T00:00:00.000Z","to":"2023-01-02T10:00:00.000Z","include_lower":true,"include_upper":false,"format":"date_time","boost":1.0}}},"_source":{"includes":["process.name","cloud.region","@timestamp","aws.cloudwatch.log_stream"],"excludes":[]},"aggregations":{"composite_buckets":{"composite":{"size":10,"sources":[{"process.name":{"terms":{"field":"process.name","missing_bucket":false,"order":"desc"}}},{"cloud.region":{"terms":{"field":"cloud.region","missing_bucket":false,"order":"asc"}}},{"aws.cloudwatch.log_stream":{"terms":{"field":"aws.cloudwatch.log_stream","missing_bucket":false,"order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
+    CalciteEnumerableIndexScan(table=[[OpenSearch, big5]], PushDownContext=[[PROJECT->[process.name, cloud.region, @timestamp, aws.cloudwatch.log_stream], FILTER->AND(SEARCH($2, Sarg[['2023-01-02 00:00:00':VARCHAR..'2023-01-02 10:00:00':VARCHAR)]:VARCHAR), IS NOT NULL($0), IS NOT NULL($1), IS NOT NULL($3)), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1, 2},count()=COUNT()), PROJECT->[count(), process.name, cloud.region, aws.cloudwatch.log_stream], SORT->[1 DESC LAST, 2 ASC FIRST, 3 ASC FIRST], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"bool":{"must":[{"range":{"@timestamp":{"from":"2023-01-02T00:00:00.000Z","to":"2023-01-02T10:00:00.000Z","include_lower":true,"include_upper":false,"format":"date_time","boost":1.0}}},{"exists":{"field":"process.name","boost":1.0}},{"exists":{"field":"cloud.region","boost":1.0}},{"exists":{"field":"aws.cloudwatch.log_stream","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":["process.name","cloud.region","@timestamp","aws.cloudwatch.log_stream"],"excludes":[]},"aggregations":{"composite_buckets":{"composite":{"size":10,"sources":[{"process.name":{"terms":{"field":"process.name","missing_bucket":false,"order":"desc"}}},{"cloud.region":{"terms":{"field":"cloud.region","missing_bucket":false,"order":"asc"}}},{"aws.cloudwatch.log_stream":{"terms":{"field":"aws.cloudwatch.log_stream","missing_bucket":false,"order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])


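After FILTER_MERGE is applied, the pushed-down query gains three exists filters, one per group-by field (process.name, cloud.region, aws.cloudwatch.log_stream). Because every composite source already sets missing_bucket=false, documents without a value for these fields are excluded from the buckets anyway, so the filters are redundant.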

{"exists":{"field":"process.name","boost":1.0}},{"exists":{"field":"cloud.region","boost":1.0}},{"exists":{"field":"aws.cloudwatch.log_stream","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}}

I do not expect any performance regression from this change, but we should monitor the Big5 benchmark closely.
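The redundancy argument can be checked with a toy model. The sketch below is not OpenSearch code: `exists_filter` and `composite_buckets` are hypothetical helpers that only imitate the documented behaviour of an `exists` query and of a composite aggregation with `missing_bucket`, and the sample documents are made up for illustration.

```python
from collections import Counter

def exists_filter(docs, fields):
    """Keep only documents that have a non-null value for every listed field."""
    return [d for d in docs if all(d.get(f) is not None for f in fields)]

def composite_buckets(docs, sources, missing_bucket=False):
    """Toy composite aggregation: count documents per tuple of source values."""
    counts = Counter()
    for d in docs:
        key = tuple(d.get(s) for s in sources)
        if not missing_bucket and any(v is None for v in key):
            continue  # a document lacking any source value is skipped
        counts[key] += 1
    return dict(counts)

SOURCES = ["process.name", "cloud.region", "aws.cloudwatch.log_stream"]

# Made-up documents: two complete, one with a null field, one with a missing field.
DOCS = [
    {"process.name": "kernel", "cloud.region": "us-east-1", "aws.cloudwatch.log_stream": "s1"},
    {"process.name": "kernel", "cloud.region": "us-east-1", "aws.cloudwatch.log_stream": "s1"},
    {"process.name": "sshd", "cloud.region": None, "aws.cloudwatch.log_stream": "s2"},
    {"cloud.region": "eu-west-1", "aws.cloudwatch.log_stream": "s3"},
]

without_exists = composite_buckets(DOCS, SOURCES)
with_exists = composite_buckets(exists_filter(DOCS, SOURCES), SOURCES)
print(without_exists == with_exists)
```

In this model the bucket counts are the same with and without the exists filters, which is what "redundant" means here. The equivalence only holds while missing_bucket stays false; it says nothing about execution cost, which is why the benchmark is still worth watching.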

Signed-off-by: Jialiang Liang <[email protected]>

…PL (#4841)

* [BugFix] Fix Regex OOM when there are 10+ regex clauses
  Signed-off-by: Jialiang Liang <[email protected]>
* fix unit tests
  Signed-off-by: Jialiang Liang <[email protected]>
* fix tests
  Signed-off-by: Jialiang Liang <[email protected]>
* fix explain tests and corresponding commands
  Signed-off-by: Jialiang Liang <[email protected]>
* fix explain tests for testFilterPushDownExplain
  Signed-off-by: Jialiang Liang <[email protected]>
* peng - isolate the fix logic to its own visitor class
  Signed-off-by: Jialiang Liang <[email protected]>
* Directly apply Calcite CoreRules.FILTER_MERGE before VolcanoPlanner plan
  Co-authored-by: Peng Huo <[email protected]>
  Signed-off-by: Jialiang Liang <[email protected]>
* fix the UTs
  Signed-off-by: Jialiang Liang <[email protected]>
* fix the ITs after rebase
  Signed-off-by: Jialiang Liang <[email protected]>
* fix clickbench IT and more ITs
  Signed-off-by: Jialiang Liang <[email protected]>
* address comments from peng
  Signed-off-by: Jialiang Liang <[email protected]>
* add yaml test
  Signed-off-by: Jialiang Liang <[email protected]>

---------

Signed-off-by: Jialiang Liang <[email protected]>
Co-authored-by: Peng Huo <[email protected]>
(cherry picked from commit 52fe8aa)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…PL (#4841) (#4895)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Peng Huo <[email protected]>

…PL (opensearch-project#4841)

* [BugFix] Fix Regex OOM when there are 10+ regex clauses
  Signed-off-by: Jialiang Liang <[email protected]>
* fix unit tests
  Signed-off-by: Jialiang Liang <[email protected]>
* fix tests
  Signed-off-by: Jialiang Liang <[email protected]>
* fix explain tests and corresponding commands
  Signed-off-by: Jialiang Liang <[email protected]>
* fix explain tests for testFilterPushDownExplain
  Signed-off-by: Jialiang Liang <[email protected]>
* peng - isolate the fix logic to its own visitor class
  Signed-off-by: Jialiang Liang <[email protected]>
* Directly apply Calcite CoreRules.FILTER_MERGE before VolcanoPlanner plan
  Co-authored-by: Peng Huo <[email protected]>
  Signed-off-by: Jialiang Liang <[email protected]>
* fix the UTs
  Signed-off-by: Jialiang Liang <[email protected]>
* fix the ITs after rebase
  Signed-off-by: Jialiang Liang <[email protected]>
* fix clickbench IT and more ITs
  Signed-off-by: Jialiang Liang <[email protected]>
* address comments from peng
  Signed-off-by: Jialiang Liang <[email protected]>
* add yaml test
  Signed-off-by: Jialiang Liang <[email protected]>

---------

Signed-off-by: Jialiang Liang <[email protected]>
Co-authored-by: Peng Huo <[email protected]>

RyanL1997 force-pushed the regex-oom-fix branch from 4d8edf8 to bdf8023 November 21, 2025 05:05

RyanL1997 added the PPL (Piped processing language) and bugFix labels Nov 21, 2025

RyanL1997 changed the title ~~[WIP - DO NOT REVIEW][BugFix] Fix Regex OOM when there are 10+ regex clauses~~ [WIP - DO NOT REVIEW][BugFix] Fix Memory Exhaustion for Multiple Filtering Operations in PPL Nov 21, 2025

Swiddis reviewed Nov 21, 2025

RyanL1997 changed the title ~~[WIP - DO NOT REVIEW][BugFix] Fix Memory Exhaustion for Multiple Filtering Operations in PPL~~ [BugFix] Fix Memory Exhaustion for Multiple Filtering Operations in PPL Nov 24, 2025

RyanL1997 marked this pull request as ready for review November 24, 2025 12:02

RyanL1997 added 2 commits December 1, 2025 11:11

[BugFix] Fix Regex OOM when there are 10+ regex clauses

f5a33d1

Signed-off-by: Jialiang Liang <[email protected]>

fix unit tests

8521325

Signed-off-by: Jialiang Liang <[email protected]>

coderabbitai bot reviewed Dec 1, 2025

RyanL1997 and others added 2 commits December 1, 2025 12:02

peng - isolate the fix logic to its own visitor class

3dfd44b

Signed-off-by: Jialiang Liang <[email protected]>

Directly apply Calcite CoreRules.FILTER_MERGE before VolcanoPlanner plan

ad43837

Co-authored-by: Peng Huo <[email protected]> Signed-off-by: Jialiang Liang <[email protected]>

coderabbitai bot reviewed Dec 1, 2025

fix the UTs

ae78fda

Signed-off-by: Jialiang Liang <[email protected]>

coderabbitai bot reviewed Dec 1, 2025

fix the ITs after rebase

5e9ce99

Signed-off-by: Jialiang Liang <[email protected]>

penghuo reviewed Dec 1, 2025

RyanL1997 added 2 commits December 1, 2025 17:16

fix clickbench IT and more ITs

3dc994b

Signed-off-by: Jialiang Liang <[email protected]>

address comments from peng

8568c2f

Signed-off-by: Jialiang Liang <[email protected]>

penghuo previously approved these changes Dec 2, 2025

add yaml test

4804ada

Signed-off-by: Jialiang Liang <[email protected]>

RyanL1997 dismissed penghuo’s stale review via 4804ada December 2, 2025 20:16

penghuo approved these changes Dec 2, 2025

penghuo enabled auto-merge (squash) December 2, 2025 21:08

Swiddis approved these changes Dec 2, 2025

penghuo merged commit 52fe8aa into opensearch-project:main Dec 2, 2025
38 checks passed

RyanL1997 added the backport 2.19-dev label Dec 2, 2025

opensearch-trigger-bot bot mentioned this pull request Dec 2, 2025

[Backport 2.19-dev] [BugFix] Fix Memory Exhaustion for Multiple Filtering Operations in PPL #4895

Merged

coderabbitai bot mentioned this pull request Dec 9, 2025

Remove all AccessController refs #4924

Merged

8 tasks

This was referenced Dec 17, 2025

phrase_prefix query includes IDF scores for expanded terms that don't exist in the document opensearch-project/OpenSearch#20272

Open

[BUG] [Draft] Correct bin command implementation to pass validation #4973

Open

[RFC] Support PPL format command #4975

Closed

TackAdam mentioned this pull request Dec 18, 2025

Discover:Traces test data + memory fix opensearch-project/OpenSearch-Dashboards#11072

Merged

7 tasks

This was referenced Dec 22, 2025

Use Calcite's validation system for type checking & coercion #4892

Open

[RFC] PPL convert Command #5001

Open
