-
Notifications
You must be signed in to change notification settings - Fork 1.4k
feat: Add HiveTableHandle::columnHandles #15351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for meta-velox canceled.
|
mbasmanova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. A few questions.
velox/connectors/hive/TableHandle.h
Outdated
| return tableParameters_; | ||
| } | ||
|
|
||
| // Full schema including partitioning columns and data columns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be full schema or only columns that are used in subfieldFilters and remainingFilter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally full schema, but in practice we only look it up for the columns appearing in filters and remaining filter expression
| const RowTypePtr& dataColumns = nullptr, | ||
| const std::unordered_map<std::string, std::string>& tableParameters = {}); | ||
| const std::unordered_map<std::string, std::string>& tableParameters = {}, | ||
| std::vector<HiveColumnHandlePtr> columnHandles = {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you document this new argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the document on the accessor method below
| return emptyOutput_; | ||
| } | ||
|
|
||
| void processColumnHandle(const HiveColumnHandlePtr& handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add a comment to explain what this method is doing? Should we verify consistency between assignments and columnHandles? E.g. what if column x is reported as partition key in assignment and regular column in columnHandles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's something I have in mind but did not add it, let me add some very strict check for now since it's new
| .startTableScan() | ||
| .outputType(ROW({"x"}, {BIGINT()})) | ||
| .assignments(assignments) | ||
| .dataColumns(asRowType(data[0]->type())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need both dataColumns and columnHandles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataColumns are needed for schema evolution purpose as well, because column handle does not tell us the position of the column in file. In this test case though we can probably skip it, but in prod usage we always supply it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to document all pieces of information that go into HiveTableHandle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will beef up the comments on the HiveTableHandle accessors
Summary: Currently if a partitioning column is not projected out, but used in remaining filter, the engine still adds it to assignments in order for the connector to recognize it. This is a hack and we should pass the full column handles inside the table handle. Fix facebookincubator#15348 Reviewed By: mbasmanova Differential Revision: D85979627
8703c6b to
0baddb4
Compare
|
This pull request has been merged in 6bb5399. |
Summary:
Currently if a partitioning column is not projected out, but used in remaining filter, the engine still adds it to assignments in order for the connector to recognize it. This is a hack and we should pass the full column handles inside the table handle.
Fix #15348
Differential Revision: D85979627