ESQL: Refactor Join inside the planner (#115813) #116045
Merged
elasticsearchmachine merged 2 commits into elastic:8.x on Oct 31, 2024
First PR that introduces Join as a first-class citizen in the planner.

Previously, Join was modeled as a unary node that embedded the right side as a local relation inside the node instead of exposing it as a child. As a result, many of the associated methods (such as references, output and inputSet) misbehaved, and the physical plan rules picked up incorrect information, for example trying to extract the local relation fields from the underlying source. The workaround was to declare the local relation fields as ReferenceAttribute, which came with its own set of issues. Essentially, Join was acting both as a source and as a streaming operator.

This PR partially addresses this by:
- refactoring Join into a proper binary node with left and right branches, which are used for its references and input/output sets;
- refactoring InlineStats to prefer composition and move the Aggregate onto the join's right branch. This reuses the Aggregate resolution out of the box; in the process, the Stats interface is removed;
- updating some of the planner rules that only worked with unary nodes;
- splitting Mapper into a (coordinator) Mapper and a LocalMapper;
- removing the Phased interface by moving its functionality inside the planner (there is no need to unpack the phased classes, since the join already indicates the two branches needed);
- massaging the Phased execution inside EsqlSession;
- improving FieldExtractor to handle binary nodes;
- fixing incorrect references in Lookup;
- generalizing the ProjectAwayColumns rule.

Relates elastic#112266

Not all inline and lookup tests are passing:
- 2 lookup fields are failing due to name clashes (qualifiers should fix this);
- about 7 inline tests are failing with a similar issue.

I've disabled these tests for now so they remain available once we finish adding the functionality.

(cherry picked from commit 4ee98e8)
(cherry picked from commit 681f509)
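To make the binary-node shape concrete, here is a minimal Java sketch under stated assumptions. `JoinSketch`, `Plan`, `Leaf`, `LocalRelation`, `Join`, `Executor` and `resolveRightBranch` are hypothetical, heavily simplified names for illustration; they are not the actual ES|QL planner API. The sketch shows `inputSet()` derived generically from both children, and one possible way a phased execution could run the right branch first and replace it with a `LocalRelation` before executing the rest. The join output rule (left columns followed by the non-key right columns) is also an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified model for illustration only;
// these are NOT the Elasticsearch ES|QL planner classes.
final class JoinSketch {

    /** A column, identified by name only. */
    record Attribute(String name) {}

    /** Base node: children() drives inputSet(); output() is what the parent sees. */
    abstract static class Plan {
        abstract List<Plan> children();
        abstract List<Attribute> output();

        /** Union of the children's outputs: what this node may reference. */
        Set<Attribute> inputSet() {
            Set<Attribute> in = new LinkedHashSet<>();
            for (Plan child : children()) {
                in.addAll(child.output());
            }
            return in;
        }
    }

    /** Generic leaf with a fixed output, e.g. an index scan. */
    static class Leaf extends Plan {
        private final List<Attribute> output;

        Leaf(List<Attribute> output) {
            this.output = List.copyOf(output);
        }

        @Override
        List<Plan> children() {
            return List.of();
        }

        @Override
        List<Attribute> output() {
            return output;
        }
    }

    /** Leaf holding rows that an earlier phase already computed. */
    static final class LocalRelation extends Leaf {
        LocalRelation(List<Attribute> output) {
            super(output);
        }
    }

    /** Binary join: both branches are real children, so inputSet() covers both. */
    static final class Join extends Plan {
        final Plan left;
        final Plan right;
        final List<Attribute> matchFields;

        Join(Plan left, Plan right, List<Attribute> matchFields) {
            this.left = left;
            this.right = right;
            this.matchFields = List.copyOf(matchFields);
        }

        @Override
        List<Plan> children() {
            return List.of(left, right);
        }

        /** Assumed rule: left columns, then right columns that are not join keys. */
        @Override
        List<Attribute> output() {
            List<Attribute> out = new ArrayList<>(left.output());
            for (Attribute attr : right.output()) {
                if (!matchFields.contains(attr)) {
                    out.add(attr);
                }
            }
            return out;
        }
    }

    /** Stand-in for running a sub-plan and materializing its rows. */
    interface Executor {
        LocalRelation execute(Plan plan);
    }

    /** One phase: run the right branch first and swap it for a LocalRelation. */
    static Join resolveRightBranch(Join join, Executor executor) {
        if (join.right instanceof LocalRelation) {
            return join; // already materialized, nothing left to run
        }
        return new Join(join.left, executor.execute(join.right), join.matchFields);
    }
}
```

Because `right` is a real child, the base-class `inputSet()` sees both sides without any Join-specific override. In a unary design where the right side is embedded inside the node, that generic logic would only see the left branch, which is the kind of misbehavior the description refers to.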