ES|QL: Late materialization golden tests#141082
Conversation
alex-spies
left a comment
There was a problem hiding this comment.
Thanks @GalLalouche !
First pass, focused on the golden testing itself. This looks alright with minor comments, only.
Will do a second pass to have a look at the actual expectations.
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java
Outdated
Show resolved
Hide resolved
...src/test/java/org/elasticsearch/xpack/esql/plugin/LateMaterializationPlannerGoldenTests.java
Show resolved
Hide resolved
| @@ -0,0 +1,7 @@ | |||
| ProjectExec[[hire_date{f}, salary{f}, emp_no{f}]] | |||
| \_TopNExec[[Order[hire_date{f},ASC,LAST]],20[INTEGER],null] | |||
There was a problem hiding this comment.
For follow-up: I see that we strip the name ids from the expectations.
Can we, instead, normalize the name ids but still assert them?
In many tests, the exact name id makes a major difference, and asserting them without golden tests is super painful. Tests for the unmapped fields feature could be drastically simplified with golden tests, for instance, but we must look at the name ids.
If you agree, I'd fold this test enhancement into #138888.
There was a problem hiding this comment.
I suppose this would be achievable using one of our Node transformations. I agree, let's do this as a follow up.
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
alex-spies
left a comment
There was a problem hiding this comment.
This is super cool, thanks @GalLalouche !
I have mainly minor remarks related to how easy it is to review and understand the test expectations.
There is a non-minor point, namely the reduction_local_node_reduce.expected expectations don't seem to make sense to me; I don't think the reduce plan undergoes optimization in Production code - but if it does, we have a problem :) Could you please double check this? I left a related remark below.
...ializationPlannerGoldenTests/testBasicTopNLateMaterialization/physical_optimization.expected
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java
Outdated
Show resolved
Hide resolved
...src/test/java/org/elasticsearch/xpack/esql/plugin/LateMaterializationPlannerGoldenTests.java
Show resolved
Hide resolved
...s/LateMaterializationPlannerGoldenTests/testExpressionSortTopN/reduction_local_data.expected
Outdated
Show resolved
Hide resolved
| @@ -0,0 +1,4 @@ | |||
| ExchangeSinkExec[[emp_no{f}, height{f}],false] | |||
| \_ProjectExec[[_doc{f}, height{f}]] | |||
| \_FieldExtractExec[height{f}]<[],[]> | |||
There was a problem hiding this comment.
Yeah, this isn't ideal. We could extract only in the reduce driver. An optimization for the future, maybe.
...src/test/java/org/elasticsearch/xpack/esql/plugin/LateMaterializationPlannerGoldenTests.java
Show resolved
Hide resolved
...src/test/java/org/elasticsearch/xpack/esql/plugin/LateMaterializationPlannerGoldenTests.java
Show resolved
Hide resolved
| \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<> | ||
| Project[[_doc{f}, emp_no{f}, languages{f}, language_code{r}, language_name{f}]] | ||
| \_TopN[[Order[emp_no{f},ASC,LAST]],20[INTEGER],false] | ||
| \_Join[LEFT,[language_code{r}],[language_code{f}],null] |
There was a problem hiding this comment.
Oooh, there's an interesting optimization opportunity here!
We could easily perform the lookup join in the reduce driver or even the coordinator, that would also be a form of late materialization - and could reduce the work on the lookup node by 1/n, where n is the number of data drivers running on the node. If the lookup shard isn't replicated, this could affect the overall latency, and if the lookup node is somehow under pressure (e.g. from concurrent query requests that involve joins), even more so.
@julian-elastic, I think @smalyshev noticed this as well (it is related to remote enriches); should we open a tracking issue for an optimization where we try to perform TopNs before joins? (That only works when the lookup isn't remote/CCS, but still).
There was a problem hiding this comment.
This is very nice, thanks @GalLalouche !
I have mostly minor comments; feel free to address those at your own discretion and
!
A non-minor comment is that we're still performing local optimization of reduce plans in cases that don't need that. Actually, I can't think of a case where we should perform or gain anything out of local optimization of reduce plans. Since that affects already live production code, though, we can clean that up in a follow-up - in which case I'd like to ask that we create a tracking issue, please! (Even though that should be a tiny fix.) See below for more details.
| NODE_REDUCE(new DualFileOutput("local_reduce_planned_reduce_driver", "local_reduce_planned_data_driver")), | ||
| /** | ||
| * A combination of {@link Stage#NODE_REDUCE} and {@link Stage#LOCAL_PHYSICAL_OPTIMIZATION}: first produce the node | ||
| * reduce and data node plans, and then perform local physical optimization on both. |
There was a problem hiding this comment.
I was a bit confused, because my understanding is that node reduce plans shouldn't ever be optimized (at least for now). I learned that we do run the optimizer, even though I don't think it does anything on node reduce plans right now :)
I left a suggestion (for production code changes) below. If we plan to address that in a follow-up, maybe let's leave a TODO comment.
In terms of testing, I think only 1 plan is interesting here: the optimized physical data driver plan. The node reduce plan shouldn't change in this stage from what we got at just the NODE_REDUCE step.
| var foo = localOptimize(reductionPlan.dataNodePlan(), conf); | ||
| var bar = verifyOrWrite(foo, outputPath(dualFileOutput.dataNodeOutput())); |
There was a problem hiding this comment.
Var naming can probably be more specific than foo and bar :D
There was a problem hiding this comment.
😅
Inlined varfiable is best variable.
...l/src/main/java/org/elasticsearch/xpack/esql/plugin/NodeReduceLocalPhysicalOptimization.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ClusterComputeHandler.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ReductionPlan.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/PlannerUtils.java
Outdated
Show resolved
Hide resolved
| @@ -262,6 +262,8 @@ public static PhysicalPlan localPlan( | |||
| return EstimatesRowSize.estimateRowSize(f.estimatedRowSize(), localOptimized); | |||
| }); | |||
|
|
|||
| // TODO add a test assertion for the consistency checker (see | |||
There was a problem hiding this comment.
Hmm, I wonder if this is the right place for it.
Above, we only update the fragment - and run 2 optimizer passes on it, which both perform a consistency check at the end.
We're being inconsistent with the exchange that's downstream from the fragment; but that must've been the case already before we reached here - the localPlan method must've gotten an inconsistent plan to begin with!
So, maybe it's useful to add a check at the beginning of this method? (And only run it if assertions are enabled.) A check down here isn't bad, either, but may make it look like the optimizer passes above had any chance of messing up the plan, but they don't really.
There was a problem hiding this comment.
Moved to the beginning.
...src/test/java/org/elasticsearch/xpack/esql/plugin/LateMaterializationPlannerGoldenTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java
Outdated
Show resolved
Hide resolved
|
Thanks for the in-depth review, @alex-spies! |
This PR adds golden tests for node-reduce late materialization of TopN, so we can remove its snapshot hiding. To support this, I also added support for two new stages in golden tests: node_reduce, and local_node_reduce, record the pair output of PlannerUtils.reductionPlan, and the same pair under local optimization.
elastic#142834) Just what it says on the tin. Follow-up to elastic#141082 and elastic#132757.
This PR adds golden tests for node-reduce late materialization of TopN, so we can remove its snapshot hiding.
To support this, I also added support for two new stages in golden tests:
node_reduce, andlocal_node_reduce, record the pair output ofPlannerUtils.reductionPlan, and the same pair under local optimization.