-
Notifications
You must be signed in to change notification settings - Fork 181
[BugFix] Fix Memory Exhaustion for Multiple Filtering Operations in PPL #4841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
f5a33d1
8521325
c85ed67
19a82a0
35c56c5
3dfd44b
ad43837
ae78fda
5e9ce99
3dc994b
8568c2f
4804ada
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,11 +16,15 @@ | |
| import lombok.extern.log4j.Log4j2; | ||
| import org.apache.calcite.jdbc.CalciteSchema; | ||
| import org.apache.calcite.plan.RelTraitDef; | ||
| import org.apache.calcite.plan.hep.HepPlanner; | ||
| import org.apache.calcite.plan.hep.HepProgram; | ||
| import org.apache.calcite.plan.hep.HepProgramBuilder; | ||
| import org.apache.calcite.rel.RelCollation; | ||
| import org.apache.calcite.rel.RelCollations; | ||
| import org.apache.calcite.rel.RelNode; | ||
| import org.apache.calcite.rel.core.Sort; | ||
| import org.apache.calcite.rel.logical.LogicalSort; | ||
| import org.apache.calcite.rel.rules.FilterMergeRule; | ||
| import org.apache.calcite.schema.SchemaPlus; | ||
| import org.apache.calcite.sql.parser.SqlParser; | ||
| import org.apache.calcite.tools.FrameworkConfig; | ||
|
|
@@ -100,6 +104,7 @@ public void executeWithCalcite( | |
| CalcitePlanContext.create( | ||
| buildFrameworkConfig(), SysLimit.fromSettings(settings), queryType); | ||
| RelNode relNode = analyze(plan, context); | ||
| relNode = mergeAdjacentFilters(relNode); | ||
| RelNode optimized = optimize(relNode, context); | ||
| RelNode calcitePlan = convertToCalcitePlan(optimized); | ||
| executionEngine.execute(calcitePlan, context, listener); | ||
|
|
@@ -145,6 +150,7 @@ public void explainWithCalcite( | |
| context.run( | ||
| () -> { | ||
| RelNode relNode = analyze(plan, context); | ||
| relNode = mergeAdjacentFilters(relNode); | ||
| RelNode optimized = optimize(relNode, context); | ||
| RelNode calcitePlan = convertToCalcitePlan(optimized); | ||
| executionEngine.explain(calcitePlan, format, context, listener); | ||
|
|
@@ -259,6 +265,18 @@ public RelNode analyze(UnresolvedPlan plan, CalcitePlanContext context) { | |
| return getRelNodeVisitor().analyze(plan, context); | ||
| } | ||
|
|
||
| /** | ||
| * Run Calcite FILTER_MERGE once so adjacent filters created during analysis can collapse before | ||
| * the rest of optimization. | ||
| */ | ||
| private RelNode mergeAdjacentFilters(RelNode relNode) { | ||
| HepProgram program = | ||
| new HepProgramBuilder().addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule()).build(); | ||
| HepPlanner planner = new HepPlanner(program); | ||
RyanL1997 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| planner.setRoot(relNode); | ||
| return planner.findBestExp(); | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not sure performance impact, did u verify?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just scripted a mini benchmark break down by directly leverage the clickbench IT queries. The following report shows the detailed performance of each planning phase - in summary, performance testing shows filter merge adds only 0.19ms average overhead (10% of planning time, <1% of total query time). > python3 analyze_performance.py
Analyzing log file: /Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/integ-test/build/testclusters/integTest-0/logs/integTest.log
Using test log for query names: /Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/performance_results.log
================================================================================
FILTER MERGE PERFORMANCE ANALYSIS
================================================================================
📊 OVERALL STATISTICS (168 queries)
--------------------------------------------------------------------------------
Filter Merge Time:
Mean: 186 μs ( 0.19 ms)
Median: 103 μs ( 0.10 ms)
Std Dev: 197 μs
Min: 41 μs ( 0.04 ms)
Max: 1541 μs ( 1.54 ms)
Total Planning Time:
Mean: 1870 μs ( 1.87 ms)
Median: 1750 μs ( 1.75 ms)
Filter Merge as % of Planning:
Mean: 9.87%
Median: 6.22%
Max: 47.52%
================================================================================
📈 PERFORMANCE ASSESSMENT
--------------------------------------------------------------------------------
Average overhead: 0.19ms (9.9% of planning)
Recommendation: No optimization needed. Merge immediately.
================================================================================
📊 PERCENTILE ANALYSIS
--------------------------------------------------------------------------------
Filter Merge Time Percentiles:
p50: 105 μs ( 0.10 ms)
p95: 477 μs ( 0.48 ms)
p99: 1541 μs ( 1.54 ms)
================================================================================
⏱️ PLANNING PHASE BREAKDOWN
--------------------------------------------------------------------------------
Phase Averages:
Analyze: 1672 μs ( 89.4%)
Filter Merge: 186 μs ( 10.0%) ← THIS IS WHAT WE ADDED
Optimize: 9 μs ( 0.5%)
Convert: 0 μs ( 0.0%)
TOTAL: 1870 μs (100.0%)
================================================================================
🐢 TOP 10 SLOWEST FILTER MERGE TIMES
--------------------------------------------------------------------------------
Rank Query Avg Merge Time Max Merge Time % of Planning
--------------------------------------------------------------------------------
1 Query46 1541 μs ( 1.54ms) 1541 μs ( 1.54ms) 47.5%
2 Query29 543 μs ( 0.54ms) 543 μs ( 0.54ms) 25.5%
3 Query24 529 μs ( 0.53ms) 529 μs ( 0.53ms) 24.5%
4 Query54 513 μs ( 0.51ms) 513 μs ( 0.51ms) 18.8%
5 Query44 477 μs ( 0.48ms) 477 μs ( 0.48ms) 16.1%
6 Query23 445 μs ( 0.45ms) 445 μs ( 0.45ms) 22.9%
7 Query15 390 μs ( 0.39ms) 390 μs ( 0.39ms) 19.9%
8 Query71 388 μs ( 0.39ms) 388 μs ( 0.39ms) 20.4%
9 Query16 377 μs ( 0.38ms) 377 μs ( 0.38ms) 17.8%
10 Query55 351 μs ( 0.35ms) 351 μs ( 0.35ms) 18.9%
================================================================================
📈 DISTRIBUTION ANALYSIS
--------------------------------------------------------------------------------
Filter Merge Time Distribution:
<100μs 82 ( 48.8%) ████████████████████████
100-500μs 78 ( 46.4%) ███████████████████████
500-1000μs (1ms) 6 ( 3.6%) █
1-5ms 2 ( 1.2%)
5-10ms 0 ( 0.0%)
>10ms 0 ( 0.0%)
================================================================================
📄 Detailed CSV exported to: /Users/jiallian/Desktop/opensearch/sql-team/cve-fix/sql/performance_analysis.csv
================================================================================ |
||
| } | ||
|
|
||
| /** Analyze {@link UnresolvedPlan}. */ | ||
| public LogicalPlan analyze(UnresolvedPlan plan, QueryType queryType) { | ||
| return analyzer.analyze(plan, new AnalysisContext(queryType)); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,8 @@ | ||
| calcite: | ||
| logical: | | ||
| LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT]) | ||
| LogicalFilter(condition=[<($0, DATE('2018-11-09 00:00:00.000000000':VARCHAR))]) | ||
| LogicalFilter(condition=[>($0, DATE('2016-12-08 00:00:00.123456789':VARCHAR))]) | ||
| LogicalProject(yyyy-MM-dd=[$83]) | ||
| CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_date_formats]]) | ||
| LogicalFilter(condition=[AND(>($0, DATE('2016-12-08 00:00:00.123456789':VARCHAR)), <($0, DATE('2018-11-09 00:00:00.000000000':VARCHAR)))]) | ||
| LogicalProject(yyyy-MM-dd=[$83]) | ||
| CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_date_formats]]) | ||
| physical: | | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_date_formats]], PushDownContext=[[PROJECT->[yyyy-MM-dd], FILTER->SEARCH($0, Sarg[('2016-12-08':VARCHAR..'2018-11-09':VARCHAR)]:VARCHAR), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":10000,"timeout":"1m","query":{"range":{"yyyy-MM-dd":{"from":"2016-12-08","to":"2018-11-09","include_lower":false,"include_upper":false,"boost":1.0}}},"_source":{"includes":["yyyy-MM-dd"],"excludes":[]}}, requestedTotalSize=10000, pageSize=null, startFrom=0)]) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,8 @@ | ||
| calcite: | ||
| logical: | | ||
| LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT]) | ||
| LogicalFilter(condition=[<($0, TIME('2018-11-09 19:00:00.123456789':VARCHAR))]) | ||
| LogicalFilter(condition=[>($0, TIME('2016-12-08 12:00:00.123456789':VARCHAR))]) | ||
| LogicalProject(custom_time=[$49]) | ||
| CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_date_formats]]) | ||
| LogicalFilter(condition=[AND(>($0, TIME('2016-12-08 12:00:00.123456789':VARCHAR)), <($0, TIME('2018-11-09 19:00:00.123456789':VARCHAR)))]) | ||
| LogicalProject(custom_time=[$49]) | ||
| CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_date_formats]]) | ||
| physical: | | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_date_formats]], PushDownContext=[[PROJECT->[custom_time], FILTER->SEARCH($0, Sarg[('12:00:00.123456789':VARCHAR..'19:00:00.123456789':VARCHAR)]:VARCHAR), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":10000,"timeout":"1m","query":{"range":{"custom_time":{"from":"12:00:00.123456789","to":"19:00:00.123456789","include_lower":false,"include_upper":false,"boost":1.0}}},"_source":{"includes":["custom_time"],"excludes":[]}}, requestedTotalSize=10000, pageSize=null, startFrom=0)]) |
Uh oh!
There was an error while loading. Please reload this page.