Add ability to run AQE with optimizations end-to-end #19563
rschlussel merged 8 commits into prestodb:master
0ae884e to 9e86cf1
pranjalssh left a comment
Reviewed till commit 4
...t/java/com/facebook/presto/sql/planner/iterative/rule/TestDetermineJoinDistributionType.java
...st/java/com/facebook/presto/spark/adaptive/execution/TestPrestoSparkAdaptiveJoinQueries.java
presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/JoinSwappingUtils.java
The test now has a lot of UNION ALLs because PrestoSparkRowOutputOperator has a minimum row batch target size of about 1 MB, which is also about the size of the hash table for a single orders table. I needed to make that side even bigger to be able to validate the memory used by the join operator. Additionally, with optimize_hash_generation disabled, the row batch ends up with a retained size a bit over 5 MB (I haven't gotten to the bottom of exactly why), so I needed to make the other side of the join big enough that it would fail on the build side with a 6 MB memory limit.
1e8977b to b8eb408
Gah... PrestoSparkRowOutputOperator memory usage is so temperamental and so close to the memory needed for the bigger table on the build side. It's making this pair of tests impossible to write.
Have you checked whether HBO stats come up correctly for a RemoteSourceNode? It may not have a statsEquivalentPlanNode attached.
Great catch. It wasn't getting populated correctly. I didn't notice since I added it explicitly in my test.
I put the fix for this earlier in the stack, so commit 6 is now commit 7.
Will a test like this work:
SELECT * FROM nation n JOIN (SELECT * FROM orders CROSS JOIN UNNEST(sequence(1, 50)) AS t(i)) o ON n.nationkey = o.orderkey
no, wouldn't help.
The PrestoSparkStatsCalculator uses runtime stats if they differ from historical stats.
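A minimal sketch of the selection rule in this commit message, for illustration only. It is not the PrestoSparkStatsCalculator code: the class name `StatsChoiceSketch`, the method `chooseOutputSizeInBytes`, and the use of a single size estimate are all assumptions made to keep the example small.

```java
import java.util.OptionalDouble;

// Hypothetical illustration; these names do not exist in Presto.
public final class StatsChoiceSketch {
    private StatsChoiceSketch() {}

    /**
     * Picks the size estimate to plan with: the runtime value when one exists
     * and differs from the historical value, otherwise the historical value.
     */
    public static OptionalDouble chooseOutputSizeInBytes(OptionalDouble historical, OptionalDouble runtime) {
        // A runtime observation that disagrees with history (or has no history to compare to) wins.
        if (runtime.isPresent() && !runtime.equals(historical)) {
            return runtime;
        }
        // No runtime observation, or both agree: keep the historical estimate.
        return historical;
    }
}
```

With a historical estimate of 100 bytes and a runtime observation of 500 bytes, the sketch returns the runtime value; with no runtime observation it falls back to the historical one.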
Allow using stats from RemoteSourceNodes for size-based join. This is necessary for runtime optimizations.
Join build and probe sides have different requirements for the local distribution they need. However, at runtime local exchanges have already been added, so when we flip join sides, we also need to adjust local exchanges on the left and right accordingly. Extract the code that handles this from RuntimeReorderJoinSides to a common utility class so we can reuse it for this optimizer. Without this change, queries will have wrong results. The test for this is in the commit that adds the reoptimization step for AQE.
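The idea can be sketched with a toy model, assuming each join input carries a label for the local distribution its role requires. Everything here (`JoinFlipSketch`, `Side`, `Join`, `LocalDistribution`, `flip`) is hypothetical and much simpler than the real plan nodes and JoinSwappingUtils; it only shows why swapping the inputs alone is not enough.

```java
// Hypothetical toy model; not Presto's plan node classes.
public final class JoinFlipSketch {
    private JoinFlipSketch() {}

    /** Placeholder labels for the local distribution each join role requires. */
    public enum LocalDistribution { FOR_PROBE, FOR_BUILD }

    /** One join input together with the local exchange currently placed on it. */
    public static final class Side {
        public final String source;
        public final LocalDistribution local;

        public Side(String source, LocalDistribution local) {
            this.source = source;
            this.local = local;
        }
    }

    public static final class Join {
        public final Side probe;
        public final Side build;

        public Join(Side probe, Side build) {
            this.probe = probe;
            this.build = build;
        }
    }

    /**
     * Swaps probe and build. Each source gets the local distribution required by
     * its NEW role, instead of keeping the exchange that was added for its old role.
     */
    public static Join flip(Join join) {
        Side newProbe = new Side(join.build.source, LocalDistribution.FOR_PROBE);
        Side newBuild = new Side(join.probe.source, LocalDistribution.FOR_BUILD);
        return new Join(newProbe, newBuild);
    }
}
```

A naive flip that only exchanged the two `Side` objects would leave the new build side with a probe-style local exchange, which is the situation the commit message warns about.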
Add re-optimization step for Presto on Spark adaptive execution
Move methods shared between multiple join swapping optimizers to JoinSwappingUtils.
Test plan: added unit tests. TODO: do a verifier run.
This PR adds the missing pieces needed to finish incorporating the PickJoinSides optimization into AQE.