Add an option to have HBO return output size with individual variable size#25400
feilong-liu merged 1 commit into prestodb:master
Conversation
If outputsize is incorrect, why won't outputRowCount be incorrect?
OutputRowCount is correct; only the output size can be inaccurate. As I understand it, there are multiple reasons. For example, with encoded data the size after decoding is not known, and with late materialization the data size is unknown at this stage because the data has not been loaded yet. That's why I think an option to use the variable sizes together with the accurate row count may be a better choice.
Opened #25447 for the documentation of the new session property.
Description
When the history-based optimizer (HBO) computes the output size of a plan node, it uses the output size recorded in history, i.e. estimateSizeUsingVariables() returns false for HBO:
presto/presto-main-base/src/main/java/com/facebook/presto/cost/PlanNodeStatsEstimate.java
Lines 172 to 178 in 36feaa1
However, this can be a problem for Prestissimo. The output size reported for a plan node is a best-effort estimate that depends on many factors, e.g. late materialization of data and encoding/decoding.
For example, according to @kevinwilfong, in a hash join the data on the build side is materialized, while the data on the probe side may or may not be materialized depending on batch size. Data that is not materialized does not count toward the reported output size, so the probe-side input reports a smaller size when it is not materialized. If the join is later reordered so that the former probe side becomes the build side, its data will be materialized and the actual output size will be larger than the history record, which can lead to a query OOM.
In this PR, I added an option for HBO to estimate the plan output size from the individual variable sizes. This is also what CBO does.
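As a rough illustration of the difference between the two approaches, here is a minimal sketch. It is not the actual PlanNodeStatsEstimate code: the class OutputSizeSketch, the VariableStats type, and the simplified size formula (row count times non-null fraction times average value size, summed over variables) are assumptions made for the example. Only the name estimateSizeUsingVariables comes from the description above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and formula are illustrative, not Presto's real classes.
final class OutputSizeSketch {
    // Hypothetical per-variable statistics.
    static final class VariableStats {
        final double averageValueSizeBytes;
        final double nullsFraction;

        VariableStats(double averageValueSizeBytes, double nullsFraction) {
            this.averageValueSizeBytes = averageValueSizeBytes;
            this.nullsFraction = nullsFraction;
        }
    }

    // Size derived from the row count and per-variable statistics (simplified).
    static double sizeFromVariables(double rowCount, List<VariableStats> variables) {
        double bytes = 0;
        for (VariableStats v : variables) {
            bytes += rowCount * (1 - v.nullsFraction) * v.averageValueSizeBytes;
        }
        return bytes;
    }

    // Returns the recorded size unless it is unknown (NaN) or the flag asks for
    // the variable-based estimate.
    static double outputSize(
            double recordedTotalSize,
            double rowCount,
            List<VariableStats> variables,
            boolean estimateSizeUsingVariables) {
        if (!Double.isNaN(recordedTotalSize) && !estimateSizeUsingVariables) {
            return recordedTotalSize;
        }
        return sizeFromVariables(rowCount, variables);
    }

    public static void main(String[] args) {
        List<VariableStats> variables = Arrays.asList(
                new VariableStats(8, 0.0),    // e.g. a bigint column
                new VariableStats(20, 0.5));  // e.g. a varchar column, half null
        double recorded = 4000;  // made-up under-reported size from history
        double rows = 1000;      // row count, assumed accurate

        System.out.println(outputSize(recorded, rows, variables, false)); // 4000.0
        System.out.println(outputSize(recorded, rows, variables, true));  // 18000.0
    }
}
```

With made-up numbers like these, an under-reported history size of 4000 bytes is replaced by an 18000-byte estimate once the flag is on, which is the kind of gap that matters when a probe side becomes a build side.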
Motivation and Context
To make HBO work better with Prestissimo
Impact
To make HBO work better with Prestissimo
Test Plan
Unit test
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.