Fix column in parquet dereference pushdown #16377
With this change, the check from this patch fails, so I removed it for now. I will update the PR with a new unit test.
vkorukanti
left a comment
LGTM (minor comments).
Please add a test. You can look at HiveLogicalPlanner to construct a specific plan without relying on getting SQL converted to a desired plan.
Should we add a check to make sure regularHiveColumnHandles.get(name) returns non-null?
Added a Preconditions check
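For readers following the thread, here is a minimal sketch of the kind of guard discussed above. It is not the code from this PR: the class name, the use of plain strings as handles, and the error message are all hypothetical, and checkArgument is hand-rolled here to keep the example free of dependencies (it only mimics the style of a Preconditions check).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the String values stand in for real column handles.
final class ColumnLookupSketch {
    // Hand-rolled, Preconditions-style argument check (assumption: the PR
    // uses a library helper of similar shape).
    static void checkArgument(boolean condition, String template, Object... args) {
        if (!condition) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    // Look up a handle by name and fail fast with a descriptive message
    // instead of letting a null propagate further into planning.
    static String lookupHandle(Map<String, String> regularHandles, String name) {
        String handle = regularHandles.get(name);
        checkArgument(handle != null, "no regular column handle found for name %s", name);
        return handle;
    }

    public static void main(String[] args) {
        Map<String, String> handles = new HashMap<>();
        handles.put("x", "handle-for-x");
        System.out.println(lookupHandle(handles, "x"));
        try {
            lookupHandle(handles, "missing");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at the lookup site keeps the error close to its cause, which is usually easier to debug than a NullPointerException later in the optimizer.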
Do we need this mapping (columnName in HiveColumnHandle -> HiveColumnHandle)? Expressions should be referring to the name in Assignments. Is that not the case in some cases?
Sorry, are you referring to the second of the two putAll calls here? If I remove this mapping, I get the error
nested column [X.Y]'s base column X is not present in table scan output
in the logic that immediately follows. That code uses the subfield name to look up the HiveColumnHandle, and the subfield now has the same name as the original HiveColumnHandle, so I used the same map for this lookup.
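The exchange above can be illustrated with a rough sketch, assuming a lookup map that is populated twice: once keyed by the names used in the assignments and once keyed by each handle's own column name. Everything here is hypothetical (the Handle class, the method names, the "x_0" variable name); only the error message wording mirrors the one quoted in the thread.

```java
import java.util.HashMap;
import java.util.Map;

final class BaseColumnLookupSketch {
    // Hypothetical, simplified handle: it only records the underlying column name.
    static final class Handle {
        final String columnName;

        Handle(String columnName) {
            this.columnName = columnName;
        }
    }

    // Build one lookup map keyed both by assignment name and by column name.
    static Map<String, Handle> buildLookup(Map<String, Handle> assignments) {
        // First mapping: assignment name -> handle.
        Map<String, Handle> lookup = new HashMap<>(assignments);
        // Second mapping: the handle's own column name -> handle.
        for (Handle handle : assignments.values()) {
            lookup.put(handle.columnName, handle);
        }
        return lookup;
    }

    // Resolve the base column of a nested column such as "x.y".
    static Handle resolveBase(Map<String, Handle> lookup, String nestedColumn) {
        int dot = nestedColumn.indexOf('.');
        String base = dot < 0 ? nestedColumn : nestedColumn.substring(0, dot);
        Handle handle = lookup.get(base);
        if (handle == null) {
            throw new IllegalArgumentException(String.format(
                    "nested column [%s]'s base column %s is not present in table scan output",
                    nestedColumn, base));
        }
        return handle;
    }

    public static void main(String[] args) {
        Map<String, Handle> assignments = new HashMap<>();
        assignments.put("x_0", new Handle("x"));
        Map<String, Handle> lookup = buildLookup(assignments);
        System.out.println(resolveBase(lookup, "x.y").columnName);
    }
}
```

In this sketch, looking up "x.y" against the assignments alone would fail because the only key is "x_0"; adding the column-name keys is what lets the subfield's base name resolve.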
df155aa to b6fec0d
b6fec0d to b90b7c7
I updated the unit test from this change; I think this test case actually shows the change. Instead of having column name of
@vkorukanti @zhenxiao, would you mind helping merge this PR? Thanks!
This is to fix this issue.
Test plan - (Please fill in how you tested your changes)
Tested on our internal cluster. Updated the unit test.