ESQL: Physical Planning on the Lookup Node#143707

Merged
julian-elastic merged 14 commits into elastic:main from julian-elastic:lookupPlanning_v3
Mar 13, 2026
@julian-elastic
Contributor

@julian-elastic julian-elastic commented Mar 5, 2026

Physical Planning on the Lookup Node

Summary

Logical planning on the lookup node will be added in a separate PR.

This change applies only to Streaming Lookup Join, which is behind a snapshot build flag, so no production behavior changes are expected from the physical planning.

Moves the physical plan construction and optimization for LOOKUP JOIN to the lookup node itself, enabling proper optimizer rules (like filter pushdown to Lucene) to run against the lookup-side plan.

Previously, the lookup node received raw request fields (match fields, extract fields, join conditions, pre-join filter plan) and manually constructed a physical plan tree (ParameterizedQueryExec -> FieldExtractExec -> ProjectExec -> OutputExec). This bypassed the standard optimizer pipeline, meaning right-side-only filters could only be pushed to Lucene via a special code path in ExpressionQueryList.buildPreJoinFilter.

Now, the data node builds a logical plan (rooted at ParameterizedQuery, with an optional Filter node and a Project on top) and sends it to the lookup node inside a FragmentExec. The lookup node then runs LocalMapper -> LookupPhysicalPlanOptimizer to produce the physical plan, reusing existing optimizer rules like PushFiltersToSource and InsertFieldExtraction.
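
The flow described above can be sketched with a toy plan tree. This is only an illustration under stated assumptions: `PlanNode`, `LookupPlanSketch`, `buildLogicalPlan`, and `mapToPhysical` are hypothetical names, not Elasticsearch classes, and the real LocalMapper and LookupPhysicalPlanOptimizer do considerably more than rename nodes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy plan node (hypothetical, not an ES class): a name plus an optional single child. */
record PlanNode(String name, PlanNode child) {
    /** Renders the chain root-first, e.g. "Project -> Filter -> ParameterizedQuery". */
    String describe() {
        List<String> names = new ArrayList<>();
        for (PlanNode n = this; n != null; n = n.child()) {
            names.add(n.name());
        }
        return String.join(" -> ", names);
    }
}

final class LookupPlanSketch {
    /** Data-node side: ParameterizedQuery leaf, optional Filter, Project on top. */
    static PlanNode buildLogicalPlan(boolean withFilter) {
        PlanNode plan = new PlanNode("ParameterizedQuery", null);
        if (withFilter) {
            plan = new PlanNode("Filter", plan);
        }
        return new PlanNode("Project", plan);
    }

    /** Lookup-node side: map each logical node to a physical counterpart, bottom-up. */
    static PlanNode mapToPhysical(PlanNode logical) {
        PlanNode child = logical.child() == null ? null : mapToPhysical(logical.child());
        // Assumption for the sketch: every logical node has a same-named "Exec" node.
        return new PlanNode(logical.name() + "Exec", child);
    }
}
```

In this sketch the data node would call `buildLogicalPlan` and ship the result, while the lookup node would call `mapToPhysical` and then hand the physical tree to its optimizer rules.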

Key changes

  • Refactored doLookupStreaming: Removed uses of Request after the logical plan is created. All the information needed should already be in the plan, and nothing should be read from the request afterwards, because optimizer rules may have modified the plan.
  • New ParameterizedQuery logical plan node: Replaces EsRelation as the leaf of the lookup-side logical plan, carrying match field metadata and join-on conditions.
  • New LookupPhysicalPlanOptimizer: Runs PushFiltersToSource, InsertFieldExtraction and ParameterizedQueryExec rules on the lookup physical plan.
  • PushFiltersToSource extended: Now handles FilterExec -> ParameterizedQueryExec and FilterExec -> EvalExec -> ParameterizedQueryExec paths, pushing eligible filters into ParameterizedQueryExec.query(). Duplicated filter classification logic was refactored into a shared classifyFilters helper.
  • ExpressionQueryList enhanced: Accepts a pre-separated Expression rightOnlyFilter and QueryBuilder pushedQuery instead of walking a PhysicalPlan tree to extract filters. The old code path remains for BWC, because older nodes still call doLookup.
  • LookupExecutionPlanner refactored: Removed EnrichQuerySourceOperatorFactory (dead code after streaming migration). Introduced QueryListFromPlanFactory to build query lists from plan data rather than transport request fields. Simplified LookupDriverContext by removing request and queryListFactory fields.
  • BWC path: extractOrBuildLogicalPlan detects whether the incoming rightPreJoinPlan already contains a ParameterizedQuery (new path) or falls back to building it locally (rolling upgrade compatibility with older data nodes).
  • New transport version (esql_lookup_planning): Serializes Configuration in the lookup transport request so the lookup node has access to the fold context and query pragmas. In the future, Configuration will also be needed for other functionality, such as timezone-sensitive functions.
  • Bug fix: LookupFromIndexOperator.Status.emittedRows() was returning emittedPages instead of emittedRows.
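
The filter classification and pushdown idea from the list above can be illustrated roughly as follows. This is a minimal sketch, not the actual PushFiltersToSource or classifyFilters code: `Condition`, `Classified`, and `FilterPushdownSketch` are hypothetical types, the `pushable` flag stands in for "translatable to a Lucene query", and queries are modeled as plain strings instead of QueryBuilder objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy filter condition (hypothetical); `pushable` stands in for "translatable to Lucene". */
record Condition(String text, boolean pushable) {}

/** Result of splitting conditions by pushability. */
record Classified(List<Condition> pushable, List<Condition> nonPushable) {}

final class FilterPushdownSketch {
    /** Splits conditions into those that can move into the source query and those that cannot. */
    static Classified classifyFilters(List<Condition> conditions) {
        List<Condition> pushable = new ArrayList<>();
        List<Condition> nonPushable = new ArrayList<>();
        for (Condition c : conditions) {
            (c.pushable() ? pushable : nonPushable).add(c);
        }
        return new Classified(pushable, nonPushable);
    }

    /** ANDs the pushed conditions onto any existing source query; returns null if nothing remains. */
    static String pushIntoQuery(String existingQuery, List<Condition> pushable) {
        List<String> parts = new ArrayList<>();
        if (existingQuery != null) {
            parts.add(existingQuery);
        }
        for (Condition c : pushable) {
            parts.add(c.text());
        }
        return parts.isEmpty() ? null : String.join(" AND ", parts);
    }
}
```

The non-pushable conditions would stay behind in a filter node above the source, which is the "mixed filters" case listed in the test plan.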

Test plan

  • New LookupPhysicalPlanOptimizerTests covering: simple lookup, pushable right-only filter, non-pushable filter, mixed filters, ON expression filters, filter-through-eval pushdown, and two consecutive LOOKUP JOINs.
  • New ParameterizedQuerySerializationTests for serialization round-trip.
  • Updated LookupExecutionPlannerTests, LookupFromIndexOperatorTests, StreamingLookupFromIndexOperatorTests to use new APIs.
  • Updated EsqlNodeSubclassTests and ApproximationSupportTests to account for new ParameterizedQuery and MatchConfig types.
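
A serialization round-trip test of the kind mentioned above generally writes an object to bytes, reads it back, and checks equality. The sketch below shows that pattern with plain `java.io` streams; it is an assumption-laden illustration, since the real tests use Elasticsearch's own stream and test base classes, and `MatchFieldsSketch` with its fields is a hypothetical stand-in rather than the actual ParameterizedQuery.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical payload: an index name plus the fields to match on. */
record MatchFieldsSketch(String index, List<String> matchFields) {
    void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(index);
        out.writeInt(matchFields.size());
        for (String field : matchFields) {
            out.writeUTF(field);
        }
    }

    static MatchFieldsSketch readFrom(DataInputStream in) throws IOException {
        String index = in.readUTF();
        int size = in.readInt();
        List<String> fields = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            fields.add(in.readUTF());
        }
        return new MatchFieldsSketch(index, fields);
    }

    /** Serializes to an in-memory buffer and deserializes again. */
    static MatchFieldsSketch roundTrip(MatchFieldsSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.writeTo(out);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readFrom(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would then assert that `roundTrip(original).equals(original)`, relying on the record's generated `equals`.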

Assisted by Cursor

@julian-elastic julian-elastic force-pushed the lookupPlanning_v3 branch 2 times, most recently from 2270140 to 70686e0 on March 5, 2026 22:18
@julian-elastic julian-elastic self-assigned this Mar 6, 2026
@julian-elastic julian-elastic added the :Analytics/ES|QL, Team:Analytics, and >tech-debt labels Mar 6, 2026
@julian-elastic julian-elastic marked this pull request as ready for review March 6, 2026 20:18
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@julian-elastic julian-elastic merged commit 0db6ce9 into elastic:main Mar 13, 2026
36 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 13, 2026
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Mar 16, 2026
/**
 * Physical plan optimizer for the lookup node. Mirrors {@link LocalPhysicalPlanOptimizer} but with a
 * reduced rule set appropriate for lookup plans (rooted at ParameterizedQueryExec, not EsSourceExec).
 */
public class LookupPhysicalPlanOptimizer extends ParameterizedRuleExecutor<PhysicalPlan, LocalPhysicalOptimizerContext> {
Contributor

I'm not sure if re-using the LocalPhysicalOptimizerContext is appropriate. It has stuff in it that's not really relevant for non-data node plans.

Contributor Author

The PushFiltersToSource and InsertFieldExtraction rules are parameterized on LocalPhysicalOptimizerContext. I think it should be OK to create a LocalPhysicalOptimizerContext here, so we can avoid duplicating the rules or introducing a complex rule hierarchy:
public class PushFiltersToSource extends PhysicalOptimizerRules.ParameterizedOptimizerRule<FilterExec, LocalPhysicalOptimizerContext> {
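
The trade-off discussed in this thread can be sketched as follows: when a rule is parameterized on a context type, any executor that can supply that context can reuse the same rule without a parallel rule hierarchy. The names here (`OptimizerContext`, `ContextRule`, `PushFiltersRule`, `RuleExecutorSketch`) are hypothetical, and plans are modeled as plain strings purely for illustration.

```java
import java.util.List;

/** Hypothetical stand-in for a shared optimizer context. */
record OptimizerContext(boolean pushdownEnabled) {}

/** A rule parameterized on the context type C. */
interface ContextRule<C> {
    String apply(String plan, C context);
}

/** Toy rule: drops the Filter step when the context allows pushdown. */
final class PushFiltersRule implements ContextRule<OptimizerContext> {
    @Override
    public String apply(String plan, OptimizerContext context) {
        return context.pushdownEnabled() ? plan.replace("Filter -> ", "") : plan;
    }
}

/** Toy executor: applies its rules in order, passing the same context to each. */
final class RuleExecutorSketch {
    private final List<ContextRule<OptimizerContext>> rules;

    RuleExecutorSketch(List<ContextRule<OptimizerContext>> rules) {
        this.rules = rules;
    }

    String optimize(String plan, OptimizerContext context) {
        String current = plan;
        for (ContextRule<OptimizerContext> rule : rules) {
            current = rule.apply(current, context);
        }
        return current;
    }
}
```

Two executors with different rule lists could share one `PushFiltersRule` instance as long as both construct an `OptimizerContext`, which mirrors the argument for creating the existing context type on the lookup node. The cost, as the reviewer notes, is that the context may carry fields the lookup side never uses.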

michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026

Labels

:Analytics/ES|QL, Team:Analytics, >tech-debt, v9.4.0


4 participants