ESQL: Physical Planning on the Lookup Node #143707
julian-elastic merged 14 commits into elastic:main
Pinging @elastic/es-analytical-engine (Team:Analytics)
Moves the physical plan construction and optimization for LOOKUP JOIN to the lookup node itself, enabling proper optimizer rules (like filter pushdown to Lucene) to run against the lookup-side plan. Previously, the lookup node received raw request fields (match fields, extract fields, join conditions, pre-join filter plan) and manually constructed a physical plan tree (ParameterizedQueryExec -> FieldExtractExec -> ProjectExec -> OutputExec). This bypassed the standard optimizer pipeline, meaning right-side-only filters could only be pushed to Lucene via a special code path in ExpressionQueryList.buildPreJoinFilter. Now, the data node builds a logical plan (rooted at ParameterizedQuery, with an optional Filter node and a Project on top) and sends it to the lookup node inside a FragmentExec. The lookup node then runs LocalMapper -> LookupPhysicalPlanOptimizer to produce the physical plan, reusing existing optimizer rules like PushFiltersToSource and InsertFieldExtraction. Assisted by Cursor
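The filter pushdown described above can be pictured with a toy model. This is only a sketch under stated assumptions: the record types below borrow names from the PR but are simplified stand-ins, not the Elasticsearch classes, and the `pushable` flag is a hypothetical shortcut for "this condition can be translated to a Lucene query".

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: names are borrowed from the PR, the types are NOT the real ES|QL classes.
final class LookupPushdownSketch {

    // One filter condition; `pushable` is a hypothetical stand-in for "translatable to Lucene".
    record Condition(String text, boolean pushable) {}

    interface Plan {}

    // Leaf of the lookup-side plan; `pushedQuery` collects conditions pushed into the source.
    record ParameterizedQueryExec(String matchField, List<Condition> pushedQuery) implements Plan {}

    // Filter that still has to be evaluated after the source has been read.
    record FilterExec(List<Condition> conditions, Plan child) implements Plan {}

    // Simplified stand-in for a PushFiltersToSource-style rule on FilterExec -> ParameterizedQueryExec.
    static Plan pushFiltersToSource(Plan plan) {
        if (plan instanceof FilterExec filter && filter.child() instanceof ParameterizedQueryExec source) {
            List<Condition> pushed = new ArrayList<>(source.pushedQuery());
            List<Condition> remaining = new ArrayList<>();
            // Classify each condition: pushable ones move into the source, the rest stay in the filter.
            for (Condition c : filter.conditions()) {
                (c.pushable() ? pushed : remaining).add(c);
            }
            Plan newSource = new ParameterizedQueryExec(source.matchField(), List.copyOf(pushed));
            // Drop the FilterExec entirely when nothing is left for it to evaluate.
            return remaining.isEmpty() ? newSource : new FilterExec(List.copyOf(remaining), newSource);
        }
        return plan; // Any other shape is left untouched in this sketch.
    }

    public static void main(String[] args) {
        Plan source = new ParameterizedQueryExec("language_code", List.of());
        Plan plan = new FilterExec(
            List.of(new Condition("language_name == \"English\"", true), new Condition("LENGTH(language_name) > 3", false)),
            source
        );
        System.out.println(pushFiltersToSource(plan));
    }
}
```

The point of running such a rule on the lookup node is that the split between pushed and remaining conditions is decided by the same optimizer machinery as for any other plan, rather than by a dedicated code path.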
 * Physical plan optimizer for the lookup node. Mirrors {@link LocalPhysicalPlanOptimizer} but with a
 * reduced rule set appropriate for lookup plans (rooted at ParameterizedQueryExec, not EsSourceExec).
 */
public class LookupPhysicalPlanOptimizer extends ParameterizedRuleExecutor<PhysicalPlan, LocalPhysicalOptimizerContext> {
I'm not sure if re-using the LocalPhysicalOptimizerContext is appropriate. It has stuff in it that's not really relevant for non-data node plans.
The PushFiltersToSource and InsertFieldExtraction rules are parameterized on LocalPhysicalOptimizerContext. I think it should be OK to create a LocalPhysicalOptimizerContext here, so we can avoid duplicating the rules or introducing a complex rule hierarchy.
public class PushFiltersToSource extends PhysicalOptimizerRules.ParameterizedOptimizerRule<FilterExec, LocalPhysicalOptimizerContext> {
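The trade-off in this thread can be illustrated with a small sketch. Everything here is hypothetical: `Context`, `Rule`, and the two rule lists are simplified stand-ins, and a plan is modelled as a list of node names (root first). It only shows why keeping one context type lets two optimizers share the same rule instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical miniature of the design discussed above; not the ES|QL class hierarchy.
final class SharedContextSketch {

    // Stand-in for LocalPhysicalOptimizerContext; the real class carries more state.
    record Context(String nodeRole) {}

    // A rule rewrites a plan (node names, root first) and is typed on the context it needs.
    interface Rule<C> extends BiFunction<List<String>, C, List<String>> {}

    // Stand-in for InsertFieldExtraction: add a field-extraction step directly above the leaf.
    static final Rule<Context> INSERT_FIELD_EXTRACTION = (plan, ctx) -> {
        if (plan.isEmpty()) {
            return plan;
        }
        List<String> out = new ArrayList<>(plan);
        out.add(out.size() - 1, "FieldExtractExec");
        return out;
    };

    // Hypothetical rule that only matters for data-node plans rooted at EsSourceExec.
    static final Rule<Context> REPLACE_ES_SOURCE = (plan, ctx) -> {
        List<String> out = new ArrayList<>(plan);
        out.replaceAll(node -> node.equals("EsSourceExec") ? "EsQueryExec" : node);
        return out;
    };

    // Because both rule sets share one context type, the same rule instance appears in both.
    static final List<Rule<Context>> LOCAL_RULES = List.of(REPLACE_ES_SOURCE, INSERT_FIELD_EXTRACTION);
    static final List<Rule<Context>> LOOKUP_RULES = List.of(INSERT_FIELD_EXTRACTION);

    // Apply the rules in order, feeding each rule the output of the previous one.
    static List<String> run(List<Rule<Context>> rules, List<String> plan, Context ctx) {
        List<String> current = plan;
        for (Rule<Context> rule : rules) {
            current = rule.apply(current, ctx);
        }
        return current;
    }

    public static void main(String[] args) {
        Context ctx = new Context("lookup");
        System.out.println(run(LOOKUP_RULES, List.of("ProjectExec", "ParameterizedQueryExec"), ctx));
    }
}
```

If the lookup optimizer used its own context type instead, each shared rule would need either a second copy or an extra layer of generics, which is the duplication the reply argues against.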
Physical Planning on the Lookup Node
Summary
Logical Planning on the Lookup Node will be added in a separate PR.
This change applies only to Streaming Lookup Join, which is behind a snapshot build flag, so no production changes are expected from the physical planning work.
Moves the physical plan construction and optimization for LOOKUP JOIN to the lookup node itself, enabling proper optimizer rules (like filter pushdown to Lucene) to run against the lookup-side plan.
Previously, the lookup node received raw request fields (match fields, extract fields, join conditions, pre-join filter plan) and manually constructed a physical plan tree (ParameterizedQueryExec -> FieldExtractExec -> ProjectExec -> OutputExec). This bypassed the standard optimizer pipeline, meaning right-side-only filters could only be pushed to Lucene via a special code path in ExpressionQueryList.buildPreJoinFilter.
Now, the data node builds a logical plan (rooted at ParameterizedQuery, with an optional Filter node and a Project on top) and sends it to the lookup node inside a FragmentExec. The lookup node then runs LocalMapper -> LookupPhysicalPlanOptimizer to produce the physical plan, reusing existing optimizer rules like PushFiltersToSource and InsertFieldExtraction.
Key changes
- ParameterizedQuery logical plan node: Replaces EsRelation as the leaf of the lookup-side logical plan, carrying match field metadata and join-on conditions.
- LookupPhysicalPlanOptimizer: Runs PushFiltersToSource, InsertFieldExtraction and ParameterizedQueryExec rules on the lookup physical plan.
- PushFiltersToSource extended: Now handles FilterExec -> ParameterizedQueryExec and FilterExec -> EvalExec -> ParameterizedQueryExec paths, pushing eligible filters into ParameterizedQueryExec.query(). Duplicated filter classification logic refactored into a shared classifyFilters helper.
- ExpressionQueryList enhanced: Accepts a pre-separated Expression rightOnlyFilter and QueryBuilder pushedQuery instead of walking a PhysicalPlan tree to extract filters. The old code is kept for backward compatibility, because doLookup is still called from old nodes.
- LookupExecutionPlanner refactored: Removed EnrichQuerySourceOperatorFactory (dead code after streaming migration). Introduced QueryListFromPlanFactory to build query lists from plan data rather than transport request fields. Simplified LookupDriverContext by removing the request and queryListFactory fields.
- extractOrBuildLogicalPlan detects whether the incoming rightPreJoinPlan already contains a ParameterizedQuery (new path) or falls back to building it locally (rolling upgrade compatibility with older data nodes).
- esql_lookup_planning: Serializes Configuration in the lookup transport request so the lookup node has access to fold context and query pragmas. In the future, Configuration will also be needed for other functionality such as timezone-sensitive functions.
- LookupFromIndexOperator.Status.emittedRows() was returning emittedPages instead of emittedRows.
Test plan
- LookupPhysicalPlanOptimizerTests covering: simple lookup, pushable right-only filter, non-pushable filter, mixed filters, ON expression filters, filter-through-eval pushdown, and two consecutive LOOKUP JOINs.
- ParameterizedQuerySerializationTests for serialization round-trip.
- LookupExecutionPlannerTests, LookupFromIndexOperatorTests, StreamingLookupFromIndexOperatorTests to use new APIs.
- EsqlNodeSubclassTests and ApproximationSupportTests to account for new ParameterizedQuery and MatchConfig types.
Assisted by Cursor
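The rolling-upgrade fallback listed under Key changes (extractOrBuildLogicalPlan) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the plan records are simplified stand-ins, the `extractOrBuild` signature is hypothetical, an old-style plan is assumed to end in an EsRelation leaf, and the fallback ignores any pre-join filter for brevity.

```java
import java.util.List;

// Illustrative only: simplified stand-ins for the logical plan nodes named in the PR.
final class RollingUpgradeSketch {

    interface LogicalPlan {
        List<LogicalPlan> children();
    }

    // New-style leaf, built by data nodes that already know about lookup-side planning.
    record ParameterizedQuery(List<String> matchFields) implements LogicalPlan {
        public List<LogicalPlan> children() { return List.of(); }
    }

    // Assumed old-style leaf, as sent by data nodes that predate the change.
    record EsRelation(String index) implements LogicalPlan {
        public List<LogicalPlan> children() { return List.of(); }
    }

    record Filter(String condition, LogicalPlan child) implements LogicalPlan {
        public List<LogicalPlan> children() { return List.of(child); }
    }

    record Project(List<String> fields, LogicalPlan child) implements LogicalPlan {
        public List<LogicalPlan> children() { return List.of(child); }
    }

    // Depth-first search for a ParameterizedQuery anywhere in the tree.
    static boolean containsParameterizedQuery(LogicalPlan plan) {
        if (plan instanceof ParameterizedQuery) {
            return true;
        }
        for (LogicalPlan child : plan.children()) {
            if (containsParameterizedQuery(child)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical signature: use the incoming plan if it is new-style, otherwise build one locally.
    static LogicalPlan extractOrBuild(LogicalPlan rightPreJoinPlan, List<String> matchFields, List<String> extractFields) {
        if (rightPreJoinPlan != null && containsParameterizedQuery(rightPreJoinPlan)) {
            return rightPreJoinPlan; // New path: the data node already built the lookup plan.
        }
        // Fallback for older data nodes: build the plan from raw request fields.
        // (This sketch drops any old-style pre-join filter; real code would have to carry it over.)
        return new Project(extractFields, new ParameterizedQuery(matchFields));
    }

    public static void main(String[] args) {
        LogicalPlan fromOldNode = new Filter("x > 1", new EsRelation("languages_lookup"));
        System.out.println(extractOrBuild(fromOldNode, List.of("language_code"), List.of("language_name")));
    }
}
```

Detecting the new path by plan shape, as in this sketch, means the lookup node does not need a separate flag in the request to tell old and new senders apart.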