-
Notifications
You must be signed in to change notification settings - Fork 180
[BugFix] Fix Memory Exhaustion for Multiple Filtering Operations in PPL #4841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4d8edf8 to
bdf8023
Compare
| int count = 0; | ||
|
|
||
| // Count this node if it's a filtering operation | ||
| // BUT: Don't count Filter nodes that contain function calls, as they can cause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means we still have the same problem with where clauses that have function calls, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entire design has changed - right now we apply Calcite CoreRules.FILTER_MERGE before VolcanoPlanner plan
Signed-off-by: Jialiang Liang <[email protected]>
Signed-off-by: Jialiang Liang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)
3318-3357: Consider adding Project, Parse, and Patterns to the stop list for schema-changing operations.The stop list includes
Aggregation, Eval, Window, StreamWindow, but other schema-changing operations likeProject(with computed expressions),Parse(adds extracted fields),Patterns(adds pattern fields),Flatten, andExpandare not included.While these operations have
flushFiltersBeforeSchemaChangecalls in their visitor methods, the counting logic may still count filters across these boundaries. This could lead to enabling accumulation across schema changes, though the flush calls should still segment them correctly at runtime.For consistency, consider aligning the stop list with all schema-changing operations:
if (plan instanceof Aggregation || plan instanceof Eval + || plan instanceof Parse + || plan instanceof Patterns + || plan instanceof Flatten + || plan instanceof Expand || plan instanceof Window || plan instanceof StreamWindow) { return count; }
766-767: Missing flush before schema-changing patterns operation.
visitPatternsadds new fields (pattern,tokens,sample_logs) which constitutes a schema change. While not in thecountFilteringOperationsstop list, any pending filter conditions should be flushed before this schema modification to prevent RexInputRef mismatches.@Override public RelNode visitPatterns(Patterns node, CalcitePlanContext context) { visitChildren(node, context); + flushFiltersBeforeSchemaChange(context); RexNode showNumberedTokenExpr = rexVisitor.analyze(node.getShowNumberedToken(), context);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java(3 hunks)core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java(10 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_date_string.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_time_string.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_timestamp_string.yaml(1 hunks)ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRegexTest.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (17)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (2)
65-69: LGTM! Well-structured filter accumulation state management.The separation of the pending conditions list from the enabled flag is clean. The
ArrayListis appropriate here since operations are append-only and order-preserving for the AND combination.
157-170: LGTM! Correct handling of single vs. multiple conditions.The optimization to avoid unnecessary
ANDwrapping for single conditions (line 163) is appropriate, and the list is correctly cleared after flushing (line 169).core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (6)
178-193: LGTM! Clean accumulation lifecycle management.The try-finally pattern ensures the flag is always disabled after processing, and the flush before returning handles accumulated conditions properly.
271-276: LGTM! Correct filter accumulation for non-subquery cases.Subquery expressions are correctly excluded from accumulation (they require correlation variable handling), while standard filters are accumulated when enabled.
325-330: LGTM! Regex conditions correctly route through accumulation.This matches the visitFilter pattern, ensuring both regex and where clause filters benefit from accumulation.
2293-2315: LGTM! Proper per-subsearch isolation of accumulation state.Saving and restoring
filterAccumulationEnabledensures each subsearch independently determines its accumulation needs without cross-branch state leakage. Usinganalyze()for each subsearch allows independent filter counting.
3328-3335: Confirm this addresses the function-call filter accumulation concern.The check at line 3332 excludes Filter nodes containing function calls from the count, meaning they won't contribute to enabling accumulation mode. When accumulation IS enabled,
visitFilterat line 272-276 will still add such conditions to accumulation if the flag is already enabled.To fully prevent type mismatches, you may also want to check
containsFunctionCallinvisitFilterbefore accumulating:} else { // Use filter accumulation to prevent deep Filter node chains - if (context.isFilterAccumulationEnabled()) { + if (context.isFilterAccumulationEnabled() && !containsFunctionCall(node.getCondition())) { context.addFilterCondition(condition); } else { context.relBuilder.filter(condition); } }This ensures filters with function calls are never accumulated, regardless of how accumulation was enabled.
1173-1186: Verify flush timing relative to visitChildren in aggregation.The flush at line 1175 occurs before
visitChildrenis called (line 1206 in the private method). In other visitor methods (visitProject, visitEval, etc.), the pattern isvisitChildren→flushFiltersBeforeSchemaChange. However, sincecountFilteringOperationsstops at Aggregation nodes (lines 3340-3344) and doesn't recurse into children, accumulation mode typically won't be enabled for queries rooted at an Aggregation.If accumulation could be enabled for queries where Aggregation is not the root (e.g.,
source=t | stats count() | regex /p/), verify filters accumulated before the Aggregation are flushed at the correct point.integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml (1)
5-6: LGTM! Test expectation correctly updated for filter accumulation.The change from nested filters to a single
LogicalFilterwith an AND conjunction correctly reflects the new filter accumulation behavior.integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml (1)
5-6: LGTM! Filter predicates correctly combined.The SEARCH and greater-than predicates are now combined into a single AND filter, demonstrating the accumulation mechanism working as intended.
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_date_string.yaml (1)
4-6: LGTM! No-pushdown scenario correctly updated.The date range filters are combined into a single LogicalFilter, and the physical plan appropriately uses EnumerableCalc with a SARG for the combined condition.
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRegexTest.java (1)
41-53: Updated chained-regex expectations correctly reflect filter accumulationThe revised expected logical plan (single
LogicalFilterwithAND(REGEXP_CONTAINS(...), REGEXP_CONTAINS(...))) and Spark SQL (WHERE ... AND ...) align with the new accumulation behavior while preserving semantics of the original two-step filtering. Looks good as a regression test for the consolidation logic.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_time_string.yaml (1)
4-6: Logical plan consolidation to single time-range filter looks correctCombining the two time-bound predicates into one
LogicalFilter(condition=[AND(>($0, TIME(...)), <($0, TIME(...)))])with theLogicalProjectandCalciteLogicalIndexScandirectly beneath it matches the new accumulation strategy and keeps the original half-open time range semantics intact.integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml (1)
4-6: Date-range filter merge aligns with accumulated-filter semanticsThe new single
LogicalFilterusingAND(>($0, DATE(...)), <($0, DATE(...)))aboveLogicalProjectandCalciteLogicalIndexScanmatches the intended consolidation while the physical Sarg range still represents the same exclusive date interval, so the updated expectations look correct.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push.yaml (1)
5-6: Consolidated age/search and balance predicates into one filter correctlyMerging the SEARCH-based age range and
>($3, 10000)into a singleLogicalFilter(condition=[AND(SEARCH($8, Sarg[(30..40)]), >($3, 10000))])matches the accumulation design and maintains the original filtering semantics for the no-pushdown plan.integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml (1)
4-6: Single time-range LogicalFilter plus pushdown Sarg match intended behaviorThe unified
LogicalFiltercombining the two TIME comparisons with AND, together with theCalciteEnumerableIndexScanPushDownContext that uses a single Sarg time range, accurately reflects the accumulated-filter and pushdown behavior described in the PR.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_filter_push_compare_timestamp_string.yaml (1)
5-6: Timestamp-range filters correctly merged into a single LogicalFilterThe updated logical plan’s
LogicalFilter(condition=[AND(>($3, TIMESTAMP(...)), <($3, TIMESTAMP(...)))])aboveCalciteLogicalIndexScan, together with the Sarg-based timestamp range in the physical plan, is consistent with the new accumulation mechanism and keeps the original exclusive bounds behavior.
Signed-off-by: Jialiang Liang <[email protected]>
Co-authored-by: Peng Huo <[email protected]> Signed-off-by: Jialiang Liang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)
268-278: Consider caching the HepProgram as a static field.The
HepProgramis stateless and immutable, so it can be safely shared across invocations. This avoids repeated object allocation on every query.+ private static final HepProgram FILTER_MERGE_PROGRAM = + new HepProgramBuilder().addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule()).build(); + /** * Run Calcite FILTER_MERGE once so adjacent filters created during analysis can collapse before * the rest of optimization. */ private RelNode mergeAdjacentFilters(RelNode relNode) { - HepProgram program = - new HepProgramBuilder().addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule()).build(); - HepPlanner planner = new HepPlanner(program); + HepPlanner planner = new HepPlanner(FILTER_MERGE_PROGRAM); planner.setRoot(relNode); return planner.findBestExp(); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java(2 hunks)core/src/main/java/org/opensearch/sql/executor/QueryService.java(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (25, unit)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (3)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (3)
19-27: LGTM!The imports for Hep-based planning and FilterMergeRule are appropriate for the targeted filter merging approach.
106-108: LGTM!Correctly positioned after analysis and before optimization to ensure adjacent filters are collapsed early, preventing the combinatorial explosion in the main optimization phase.
152-154: LGTM!Consistent with the execution path, ensuring explain output accurately reflects the merged filter optimization.
Signed-off-by: Jialiang Liang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAbstractTest.java (1)
113-120:mergeAdjacentFilterscorrectly encapsulates a dedicated HepPlanner; optional reuse of HepProgramThe helper’s use of a HepProgram with just
FilterMergeRule.Config.DEFAULT.toRule()andplanner.findBestExp()is a standard, safe way to merge adjacent filters, and is suitable for tests. IfgetRelNodeis invoked very frequently in larger suites, you could optionally cache theHepProgram(or even theHepPlanner) as a static field to avoid repeatedly rebuilding it, but this is a minor optimization for test-only code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAbstractTest.java(2 hunks)ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTrendlineTest.java(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: build-linux (25, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, integration)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: build-linux (25, unit)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (3)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTrendlineTest.java (1)
79-80: Combined filter expectations correctly reflect new accumulation behaviorCollapsing the two IS NOT NULL checks into a single
LogicalFilterwithAND(IS NOT NULL($5), IS NOT NULL($7))and mirroring that via a singleWHERE SAL IS NOT NULL AND DEPTNO IS NOT NULLin the Spark SQL keeps the test semantically equivalent to the prior nested filters while aligning with the new adjacent-filter consolidation in the planner.Also applies to: 92-92
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAbstractTest.java (2)
25-31: HepPlanner / FilterMergeRule imports are appropriate and correctly scopedThe added Calcite hep and FilterMergeRule imports are all used by
mergeAdjacentFiltersand confined to test code, which keeps the production surface unchanged.
102-111: getRelNode now returns a filter-merged plan; confirm alignment with production pipelineRunning
mergeAdjacentFilters(root)here ensures all tests see a plan with merged adjacent filters, matching the intended planner behavior; this looks correct. Please double-check that the productionQueryService(or equivalent entrypoint) applies the same filter-merge stage so tests stay in sync if that pipeline changes.
|
Transferring some of the communication with @penghuo here: First, I agree that the previous implementation infected too much of the original calcite relnode visitor class logic, by applying ...
flushFiltersBeforeSchemaChange(context);
...to plethora of Second, I tried the both @penghuo's suggestions
with 3dfd44b and ad43837 and both of them works. According to the above, I selected the approach of applying Calcite's FilterMergeRule directly in QueryService.java (using HepPlanner) for the following reasons:
|
Signed-off-by: Jialiang Liang <[email protected]>
core/src/main/java/org/opensearch/sql/executor/QueryService.java
Outdated
Show resolved
Hide resolved
| new HepProgramBuilder().addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule()).build(); | ||
| HepPlanner planner = new HepPlanner(program); | ||
| planner.setRoot(relNode); | ||
| return planner.findBestExp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure performance impact, did u verify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just scripted a mini benchmark break down by directly leverage the clickbench IT queries. The following report shows the detailed performance of each planning phase - in summary, performance testing shows filter merge adds only 0.19ms average overhead (10% of planning time, <1% of total query time).
> python3 analyze_performance.py
Analyzing log file: /Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/integ-test/build/testclusters/integTest-0/logs/integTest.log
Using test log for query names: /Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/performance_results.log
================================================================================
FILTER MERGE PERFORMANCE ANALYSIS
================================================================================
📊 OVERALL STATISTICS (168 queries)
--------------------------------------------------------------------------------
Filter Merge Time:
Mean: 186 μs ( 0.19 ms)
Median: 103 μs ( 0.10 ms)
Std Dev: 197 μs
Min: 41 μs ( 0.04 ms)
Max: 1541 μs ( 1.54 ms)
Total Planning Time:
Mean: 1870 μs ( 1.87 ms)
Median: 1750 μs ( 1.75 ms)
Filter Merge as % of Planning:
Mean: 9.87%
Median: 6.22%
Max: 47.52%
================================================================================
📈 PERFORMANCE ASSESSMENT
--------------------------------------------------------------------------------
Average overhead: 0.19ms (9.9% of planning)
Recommendation: No optimization needed. Merge immediately.
================================================================================
📊 PERCENTILE ANALYSIS
--------------------------------------------------------------------------------
Filter Merge Time Percentiles:
p50: 105 μs ( 0.10 ms)
p95: 477 μs ( 0.48 ms)
p99: 1541 μs ( 1.54 ms)
================================================================================
⏱️ PLANNING PHASE BREAKDOWN
--------------------------------------------------------------------------------
Phase Averages:
Analyze: 1672 μs ( 89.4%)
Filter Merge: 186 μs ( 10.0%) ← THIS IS WHAT WE ADDED
Optimize: 9 μs ( 0.5%)
Convert: 0 μs ( 0.0%)
TOTAL: 1870 μs (100.0%)
================================================================================
🐢 TOP 10 SLOWEST FILTER MERGE TIMES
--------------------------------------------------------------------------------
Rank Query Avg Merge Time Max Merge Time % of Planning
--------------------------------------------------------------------------------
1 Query46 1541 μs ( 1.54ms) 1541 μs ( 1.54ms) 47.5%
2 Query29 543 μs ( 0.54ms) 543 μs ( 0.54ms) 25.5%
3 Query24 529 μs ( 0.53ms) 529 μs ( 0.53ms) 24.5%
4 Query54 513 μs ( 0.51ms) 513 μs ( 0.51ms) 18.8%
5 Query44 477 μs ( 0.48ms) 477 μs ( 0.48ms) 16.1%
6 Query23 445 μs ( 0.45ms) 445 μs ( 0.45ms) 22.9%
7 Query15 390 μs ( 0.39ms) 390 μs ( 0.39ms) 19.9%
8 Query71 388 μs ( 0.39ms) 388 μs ( 0.39ms) 20.4%
9 Query16 377 μs ( 0.38ms) 377 μs ( 0.38ms) 17.8%
10 Query55 351 μs ( 0.35ms) 351 μs ( 0.35ms) 18.9%
================================================================================
📈 DISTRIBUTION ANALYSIS
--------------------------------------------------------------------------------
Filter Merge Time Distribution:
<100μs 82 ( 48.8%) ████████████████████████
100-500μs 78 ( 46.4%) ███████████████████████
500-1000μs (1ms) 6 ( 3.6%) █
1-5ms 2 ( 1.2%)
5-10ms 0 ( 0.0%)
>10ms 0 ( 0.0%)
================================================================================
📄 Detailed CSV exported to: /Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/performance_analysis.csv
================================================================================
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Jialiang Liang <[email protected]>
Signed-off-by: Jialiang Liang <[email protected]>
penghuo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good. 2 concerns regarding to performance impact. We should monitor the Big5 benchmark closely as next step.
- Introduce HepPlanner rules.
- After filter_merge, exists filters were added. Because missing_bucket=false, these filters are redundant.
| CalciteLogicalIndexScan(table=[[OpenSearch, big5]]) | ||
| physical: | | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, big5]], PushDownContext=[[PROJECT->[process.name, cloud.region, @timestamp, aws.cloudwatch.log_stream], FILTER->SEARCH($2, Sarg[['2023-01-02 00:00:00':VARCHAR..'2023-01-02 10:00:00':VARCHAR)]:VARCHAR), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1, 2},count()=COUNT()), PROJECT->[count(), process.name, cloud.region, aws.cloudwatch.log_stream], SORT->[1 DESC LAST, 2 ASC FIRST, 3 ASC FIRST], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"range":{"@timestamp":{"from":"2023-01-02T00:00:00.000Z","to":"2023-01-02T10:00:00.000Z","include_lower":true,"include_upper":false,"format":"date_time","boost":1.0}}},"_source":{"includes":["process.name","cloud.region","@timestamp","aws.cloudwatch.log_stream"],"excludes":[]},"aggregations":{"composite_buckets":{"composite":{"size":10,"sources":[{"process.name":{"terms":{"field":"process.name","missing_bucket":false,"order":"desc"}}},{"cloud.region":{"terms":{"field":"cloud.region","missing_bucket":false,"order":"asc"}}},{"aws.cloudwatch.log_stream":{"terms":{"field":"aws.cloudwatch.log_stream","missing_bucket":false,"order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, big5]], PushDownContext=[[PROJECT->[process.name, cloud.region, @timestamp, aws.cloudwatch.log_stream], FILTER->AND(SEARCH($2, Sarg[['2023-01-02 00:00:00':VARCHAR..'2023-01-02 10:00:00':VARCHAR)]:VARCHAR), IS NOT NULL($0), IS NOT NULL($1), IS NOT NULL($3)), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1, 2},count()=COUNT()), PROJECT->[count(), process.name, cloud.region, aws.cloudwatch.log_stream], SORT->[1 DESC LAST, 2 ASC FIRST, 3 ASC FIRST], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"bool":{"must":[{"range":{"@timestamp":{"from":"2023-01-02T00:00:00.000Z","to":"2023-01-02T10:00:00.000Z","include_lower":true,"include_upper":false,"format":"date_time","boost":1.0}}},{"exists":{"field":"process.name","boost":1.0}},{"exists":{"field":"cloud.region","boost":1.0}},{"exists":{"field":"aws.cloudwatch.log_stream","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":["process.name","cloud.region","@timestamp","aws.cloudwatch.log_stream"],"excludes":[]},"aggregations":{"composite_buckets":{"composite":{"size":10,"sources":[{"process.name":{"terms":{"field":"process.name","missing_bucket":false,"order":"desc"}}},{"cloud.region":{"terms":{"field":"cloud.region","missing_bucket":false,"order":"asc"}}},{"aws.cloudwatch.log_stream":{"terms":{"field":"aws.cloudwatch.log_stream","missing_bucket":false,"order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After filter_merge, three exists filters were added. Because missing_bucket=false, these filters are redundant.
{"exists":{"field":"process.name","boost":1.0}},{"exists":{"field":"cloud.region","boost":1.0}},{"exists":{"field":"aws.cloudwatch.log_stream","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}}
I did not expect any performance regression from this change, but we should monitor the Big5 benchmark closely.
Signed-off-by: Jialiang Liang <[email protected]>
…PL (#4841) * [BugFix] Fix Regex OOM when there are 10+ regex clauses Signed-off-by: Jialiang Liang <[email protected]> * fix unit tests Signed-off-by: Jialiang Liang <[email protected]> * fix tests Signed-off-by: Jialiang Liang <[email protected]> * fix explain tests and corresponding commands Signed-off-by: Jialiang Liang <[email protected]> * fix explain tests for testFilterPushDownExplain Signed-off-by: Jialiang Liang <[email protected]> * peng - isolate the fix logic to its own visitor class Signed-off-by: Jialiang Liang <[email protected]> * Directly apply Calcite CoreRules.FILTER_MERGE before VolcanoPlanner plan Co-authored-by: Peng Huo <[email protected]> Signed-off-by: Jialiang Liang <[email protected]> * fix the UTs Signed-off-by: Jialiang Liang <[email protected]> * fix the ITs after rebase Signed-off-by: Jialiang Liang <[email protected]> * fix clickbench IT and more ITs Signed-off-by: Jialiang Liang <[email protected]> * address comments from peng Signed-off-by: Jialiang Liang <[email protected]> * add yaml test Signed-off-by: Jialiang Liang <[email protected]> --------- Signed-off-by: Jialiang Liang <[email protected]> Co-authored-by: Peng Huo <[email protected]> (cherry picked from commit 52fe8aa) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…PL (#4841) (#4895) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Peng Huo <[email protected]>
…PL (opensearch-project#4841) * [BugFix] Fix Regex OOM when there are 10+ regex clauses Signed-off-by: Jialiang Liang <[email protected]> * fix unit tests Signed-off-by: Jialiang Liang <[email protected]> * fix tests Signed-off-by: Jialiang Liang <[email protected]> * fix explain tests and corresponding commands Signed-off-by: Jialiang Liang <[email protected]> * fix explain tests for testFilterPushDownExplain Signed-off-by: Jialiang Liang <[email protected]> * peng - isolate the fix logic to its own visitor class Signed-off-by: Jialiang Liang <[email protected]> * Directly apply Calcite CoreRules.FILTER_MERGE before VolcanoPlanner plan Co-authored-by: Peng Huo <[email protected]> Signed-off-by: Jialiang Liang <[email protected]> * fix the UTs Signed-off-by: Jialiang Liang <[email protected]> * fix the ITs after rebase Signed-off-by: Jialiang Liang <[email protected]> * fix clickbench IT and more ITs Signed-off-by: Jialiang Liang <[email protected]> * address comments from peng Signed-off-by: Jialiang Liang <[email protected]> * add yaml test Signed-off-by: Jialiang Liang <[email protected]> --------- Signed-off-by: Jialiang Liang <[email protected]> Co-authored-by: Peng Huo <[email protected]>
Description
Implemented a filter accumulation mechanism that combines multiple filter conditions into a single Filter RelNode before Calcite's optimization phase begins, preventing the deep Filter chains that trigger combinatorial explosion.
Implementation Summary
The fix introduces automatic detection and accumulation of filtering operations:
Automatic Detection: When a query is analyzed, the system counts all filtering operations ( e.g.
regexandwhere) in the AST. If 2 or more filtering operations are detected, filter accumulation mode is automatically enabled.Filter Accumulation: Instead of creating individual Filter RelNodes for each
regexorwhereoperation, all filter conditions are collected in a list during the analysis phase.Single Combined Filter: All accumulated conditions are combined with AND operations into a single Filter RelNode, preventing the deep chains that caused memory exhaustion.
Schema-Aware Flushing: Accumulated filters are flushed before any schema-changing operations (like
fields) to ensure field references remain valid.No change needed for command implementations: The fix is completely automatic - no query rewriting or user action required. Queries produce identical results to the original implementation.
How It Works: Before vs After
Before (Memory Explosion)
After (Efficient Single Filter)
Results
Related Issues
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.