@Yuhta
Contributor

@Yuhta Yuhta commented Oct 31, 2025

Summary:
Currently, if a partitioning column is not projected out but is used in the remaining filter, the engine still adds it to assignments so that the connector can recognize it. This is a hack; we should pass the full column handles inside the table handle instead.

Fix #15348

Differential Revision: D85979627

@Yuhta Yuhta requested a review from majetideepak as a code owner October 31, 2025 20:47
netlify bot commented Oct 31, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 0baddb4
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/690924a2264fe900081658ee

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 31, 2025
meta-codesync bot commented Oct 31, 2025

@Yuhta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85979627.

Contributor

@mbasmanova mbasmanova left a comment

Thanks. A few questions.

return tableParameters_;
}

// Full schema including partitioning columns and data columns.
Contributor

Does this need to be full schema or only columns that are used in subfieldFilters and remainingFilter?

Contributor Author

Ideally the full schema, but in practice we only look it up for the columns appearing in filters and the remaining filter expression.
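
To make the lookup described above concrete, here is a minimal sketch of resolving filter columns against handles carried by a table handle. `ColumnKind`, `ColumnHandle`, `findHandle`, and `canResolveFilterColumns` are hypothetical stand-ins invented for illustration; they are not the Velox `HiveColumnHandle` or `HiveTableHandle` API, and the real lookup may differ.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not the Velox types.
enum class ColumnKind { kPartitionKey, kRegular };

struct ColumnHandle {
  std::string name;
  ColumnKind kind;
};

// Returns the handle named `name` if the table handle carries one.
std::optional<ColumnHandle> findHandle(
    const std::vector<ColumnHandle>& tableHandles,
    const std::string& name) {
  for (const auto& handle : tableHandles) {
    if (handle.name == name) {
      return handle;
    }
  }
  return std::nullopt;
}

// Returns true if every column referenced by a filter can be resolved
// from the table handle alone, without consulting the assignments.
bool canResolveFilterColumns(
    const std::vector<ColumnHandle>& tableHandles,
    const std::vector<std::string>& filterColumns) {
  for (const auto& name : filterColumns) {
    if (!findHandle(tableHandles, name).has_value()) {
      return false;
    }
  }
  return true;
}
```

In this sketch only the names that appear in filters are ever looked up, so a partial list of handles would also work, which matches the "in practice" part of the reply.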

  const RowTypePtr& dataColumns = nullptr,
- const std::unordered_map<std::string, std::string>& tableParameters = {});
+ const std::unordered_map<std::string, std::string>& tableParameters = {},
+ std::vector<HiveColumnHandlePtr> columnHandles = {});
Contributor

Would you document this new argument?

Contributor Author

I added the documentation on the accessor method below.

return emptyOutput_;
}

void processColumnHandle(const HiveColumnHandlePtr& handle);
Contributor

Would you add a comment to explain what this method is doing? Should we verify consistency between assignments and columnHandles? E.g. what if column x is reported as a partition key in assignments and as a regular column in columnHandles?

Contributor Author

Yeah, that's something I had in mind but did not add. Let me add a very strict check for now since it's new.
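
For illustration, the kind of consistency check discussed here could look roughly like the sketch below. It rejects any column whose kind differs between assignments and columnHandles. All names (`ColumnKind`, `ColumnHandle`, `checkConsistent`) are hypothetical and invented for this example; the check actually added in the PR is not shown in this thread and may be stricter or shaped differently.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; not the Velox types.
enum class ColumnKind { kPartitionKey, kRegular };

struct ColumnHandle {
  std::string name;
  ColumnKind kind;
};

// Throws std::invalid_argument if a column appears in both `assignments`
// (output name -> handle) and `columnHandles` with different kinds.
void checkConsistent(
    const std::unordered_map<std::string, ColumnHandle>& assignments,
    const std::vector<ColumnHandle>& columnHandles) {
  // Index the table-level handles by column name.
  std::unordered_map<std::string, ColumnKind> kinds;
  for (const auto& handle : columnHandles) {
    kinds[handle.name] = handle.kind;
  }
  for (const auto& entry : assignments) {
    const ColumnHandle& handle = entry.second;
    const auto it = kinds.find(handle.name);
    if (it != kinds.end() && it->second != handle.kind) {
      throw std::invalid_argument(
          "Inconsistent column kind for column: " + handle.name);
    }
  }
}
```

With this shape, the case raised in the question (x is a partition key in assignments but a regular column in columnHandles) fails fast instead of silently picking one of the two definitions.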

.startTableScan()
.outputType(ROW({"x"}, {BIGINT()}))
.assignments(assignments)
.dataColumns(asRowType(data[0]->type()))
Contributor

Why do we need both dataColumns and columnHandles?

Contributor Author

@Yuhta Yuhta Nov 3, 2025

dataColumns is needed for schema evolution as well, because a column handle does not tell us the position of the column in the file. In this test case we can probably skip it, but in prod usage we always supply it.
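
A small sketch of why an ordered schema is needed in addition to handles, assuming file columns are matched to table columns by position (an assumption made for this example, not something the thread states in detail). `positionInFile` is a hypothetical helper, and `dataColumns` is modeled here as a plain list of names rather than a `RowTypePtr`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper for illustration only. `dataColumns` lists the table's
// data column names in table-schema order. Under the positional-matching
// assumption, a column's index in that list is also its position in the file.
// Returns std::nullopt if the column is unknown, or if the file was written
// before the column was added and therefore has fewer columns.
std::optional<std::size_t> positionInFile(
    const std::vector<std::string>& dataColumns,
    const std::string& columnName,
    std::size_t numFileColumns) {
  for (std::size_t i = 0; i < dataColumns.size(); ++i) {
    if (dataColumns[i] == columnName) {
      if (i < numFileColumns) {
        return i;
      }
      return std::nullopt;
    }
  }
  return std::nullopt;
}
```

A handle that carries only a name and a type cannot answer this question on its own, which is the gap dataColumns fills in this sketch.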

Contributor

Would be nice to document all pieces of information that go into HiveTableHandle.

Contributor Author

I will beef up the comments on the HiveTableHandle accessors.

Summary:

Currently if a partitioning column is not projected out, but used in remaining filter, the engine still adds it to assignments in order for the connector to recognize it.  This is a hack and we should pass the full column handles inside the table handle.

Fix facebookincubator#15348

Reviewed By: mbasmanova

Differential Revision: D85979627
@meta-codesync meta-codesync bot closed this in 6bb5399 Nov 4, 2025
meta-codesync bot commented Nov 4, 2025

This pull request has been merged in 6bb5399.

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged meta-exported

Development

Successfully merging this pull request may close these issues.

HiveTableHandle is missing handles for columns used in pushed down filters

3 participants