Skip to content

ESQL: Refactor Join inside the planner (#115813)#116045

Merged
elasticsearchmachine merged 2 commits intoelastic:8.xfrom
costin:backport/115813
Oct 31, 2024
Merged

ESQL: Refactor Join inside the planner (#115813)#116045
elasticsearchmachine merged 2 commits intoelastic:8.xfrom
costin:backport/115813

Conversation

@costin
Copy link
Member

@costin costin commented Oct 31, 2024

First PR that introduces a Join as a first class citizen in the planner.
Previously the Join was modeled as a unary node, embedding the right
side as a local relationship inside the node but not exposed as a child.
This caused a lot the associated methods (like references, output and
inputSet) to misbehave and the physical plan rules to pick incorrect
information, such as trying to extract the local relationship fields
from the underlying source - the fix was to the local relationship
fields as ReferenceAttribute (which of course had its own set of
issues). Essentially Join was acting both as a source and as a streaming
operator.

This PR looks to partially address this by:

  • refactoring Join into a proper binary node with left and right
    branches which are used for its references and input/outputSet.
  • refactoring InlineStats to prefer composition and move the Aggregate
    on the join right branch. This reuses the Aggregate resolution out of
    the box; in the process remove the Stats interface.
  • update some of the planner rules that only worked with Unary nodes.
  • refactor Mapper into (coordinator) Mapper and LocalMapper.
  • remove Phased interface by moving its functionality inside the planner
    (no need to unpack the phased classes, the join already indicates the
    two branches needed).
  • massage the Phased execution inside EsqlSession
  • improve FieldExtractor to handle binary nodes
  • fix incorrect references in Lookup
  • generalize ProjectAwayColumns rule

Relates #112266

Not all inline and lookup tests are passing:

  • 2 lookup fields are failing due to name clashes (qualifiers should
    fix this)
  • 7 or so inline failures with a similar issue

I've disabled the tests for now to have them around once we complete
adding the functionality.

(cherry picked from commit 4ee98e8)
(cherry picked from commit 681f509)

costin and others added 2 commits October 31, 2024 10:05
First PR that introduces a Join as a first class citizen in the planner.
Previously the Join was modeled as a unary node, embedding the right
side as a local relationship inside the node but not exposed as a child.
This caused a lot the associated methods (like references, output and
inputSet) to misbehave and the physical plan rules to pick incorrect
information, such as trying to extract the local relationship fields
from the underlying source - the fix was to the local relationship
fields as ReferenceAttribute (which of course had its own set of
issues). Essentially Join was acting both as a source and as a streaming
operator.

This PR looks to partially address this by:
- refactoring Join into a proper binary node with left and right
 branches which are used for its references and input/outputSet.
- refactoring InlineStats to prefer composition and move the Aggregate
 on the join right branch. This reuses the Aggregate resolution out of
 the box; in the process remove the Stats interface.
- update some of the planner rules that only worked with Unary nodes.
- refactor Mapper into (coordinator) Mapper and LocalMapper.
- remove Phased interface by moving its functionality inside the planner
(no need to unpack the phased classes, the join already indicates the
two branches needed).
- massage the Phased execution inside EsqlSession
- improve FieldExtractor to handle binary nodes
- fix incorrect references in Lookup
- generalize ProjectAwayColumns rule

Relates elastic#112266

Not all inline and lookup tests are passing:
- 2 lookup fields are failing due to name clashes (qualifiers should
 fix this)
- 7 or so inline failures with a similar issue

I've disabled the tests for now to have them around once we complete
 adding the functionality.

(cherry picked from commit 4ee98e8)
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v8.17.0 labels Oct 31, 2024
@costin costin added backport :Analytics/ES|QL AKA ESQL >refactoring and removed needs:triage Requires assignment of a team area label labels Oct 31, 2024
@costin costin added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Oct 31, 2024
@elasticsearchmachine elasticsearchmachine merged commit 8faa2a7 into elastic:8.x Oct 31, 2024
@costin costin deleted the backport/115813 branch October 31, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport >refactoring v8.17.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments