feat: Add HiveTableHandle::columnHandles #15351

Yuhta · 2025-10-31T20:47:22Z

Summary:
Currently, if a partitioning column is not projected out but is used in the remaining filter, the engine still adds it to assignments so that the connector can recognize it. This is a hack; we should instead pass the full column handles inside the table handle.

Fix #15348

Differential Revision: D85979627

meta-codesync · 2025-10-31T20:47:30Z

@Yuhta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85979627.

mbasmanova

Thanks. A few questions.

mbasmanova · 2025-11-03T17:08:35Z

velox/connectors/hive/TableHandle.h

    return tableParameters_;
  }

+  // Full schema including partitioning columns and data columns.


Does this need to be full schema or only columns that are used in subfieldFilters and remainingFilter?

Ideally the full schema, but in practice we only look it up for the columns appearing in the filters and the remaining filter expression.
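To make the lookup described above concrete, here is a minimal standalone sketch. It does not use the real Velox classes: `ColumnKind`, `ColumnHandle`, `TableHandle`, `findColumnHandle`, and `partitionKeysInFilter` are hypothetical, simplified stand-ins chosen for illustration. The only point it shows is that a filter column can be resolved from the handles carried by the table handle, without being listed in the scan's assignments.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins. These are NOT the real Velox types.
enum class ColumnKind { kRegular, kPartitionKey };

struct ColumnHandle {
  std::string name;
  ColumnKind kind;
};
using ColumnHandlePtr = std::shared_ptr<const ColumnHandle>;

struct TableHandle {
  // Ideally the full schema; in practice only filter columns are looked up.
  std::vector<ColumnHandlePtr> columnHandles;
};

// Returns the handle for 'name', or nullptr if the table handle has none.
ColumnHandlePtr findColumnHandle(
    const TableHandle& table,
    const std::string& name) {
  for (const auto& handle : table.columnHandles) {
    if (handle->name == name) {
      return handle;
    }
  }
  return nullptr;
}

// Collects the filter columns that are partition keys. The columns do not
// need to appear in the output assignments of the scan.
std::vector<std::string> partitionKeysInFilter(
    const TableHandle& table,
    const std::vector<std::string>& filterColumns) {
  std::vector<std::string> keys;
  for (const auto& name : filterColumns) {
    const auto handle = findColumnHandle(table, name);
    if (handle && handle->kind == ColumnKind::kPartitionKey) {
      keys.push_back(name);
    }
  }
  return keys;
}
```

With a table handle holding a regular column `x` and a partition key `ds`, calling `partitionKeysInFilter(table, {"x", "ds"})` would return only `ds`. A linear scan is used here for brevity; a name-keyed map would be the natural choice if the handle list were large.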

mbasmanova · 2025-11-03T17:08:53Z

velox/connectors/hive/TableHandle.h

      const RowTypePtr& dataColumns = nullptr,
-      const std::unordered_map<std::string, std::string>& tableParameters = {});
+      const std::unordered_map<std::string, std::string>& tableParameters = {},
+      std::vector<HiveColumnHandlePtr> columnHandles = {});


Would you document this new argument?

I added the documentation on the accessor method below.

mbasmanova · 2025-11-03T17:10:30Z

velox/connectors/hive/HiveDataSource.h

    return emptyOutput_;
  }

+  void processColumnHandle(const HiveColumnHandlePtr& handle);


Would you add a comment to explain what this method is doing? Should we verify consistency between assignments and columnHandles? E.g. what if column x is reported as partition key in assignment and regular column in columnHandles?

Yes, that's something I had in mind but did not add. Let me add a very strict check for now, since this is new.
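The thread does not show what the strict check ended up verifying, so the following is only a sketch of one possible consistency check, under stated assumptions. `ColumnKind`, `ColumnHandle`, and `checkConsistency` are hypothetical names, and the assignments map is simplified to hold handles by value. It rejects the case raised in the question: a column reported as a partition key in one place and as a regular column in the other.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins. These are NOT the real Velox types.
enum class ColumnKind { kRegular, kPartitionKey };

struct ColumnHandle {
  std::string name;
  ColumnKind kind;
};

// Throws std::invalid_argument if a column that appears in both
// 'assignments' and 'columnHandles' is described with different kinds.
// Columns that appear in only one of the two are accepted.
void checkConsistency(
    const std::unordered_map<std::string, ColumnHandle>& assignments,
    const std::vector<ColumnHandle>& columnHandles) {
  // Index the table-level handles by column name.
  std::unordered_map<std::string, ColumnKind> kinds;
  for (const auto& handle : columnHandles) {
    kinds.emplace(handle.name, handle.kind);
  }
  // Compare every assigned column against the table-level handle, if any.
  for (const auto& entry : assignments) {
    const ColumnHandle& assigned = entry.second;
    const auto it = kinds.find(assigned.name);
    if (it != kinds.end() && it->second != assigned.kind) {
      throw std::invalid_argument(
          "Inconsistent column kind for column: " + assigned.name);
    }
  }
}
```

A stricter variant could also compare types, or require that every assigned column have a table-level handle; which of these is appropriate is a design decision this sketch does not settle.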

mbasmanova · 2025-11-03T17:10:59Z

velox/exec/tests/TableScanTest.cpp

+                  .startTableScan()
+                  .outputType(ROW({"x"}, {BIGINT()}))
+                  .assignments(assignments)
+                  .dataColumns(asRowType(data[0]->type()))


Why do we need both dataColumns and columnHandles?

dataColumns is also needed for schema evolution, because a column handle does not tell us the position of the column in the file. In this test case we can probably skip it, but in production usage we always supply it.

Would be nice to document all pieces of information that go into HiveTableHandle.

I will beef up the comments on the HiveTableHandle accessors
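As a rough illustration of why an ordered schema is useful alongside column handles, the sketch below resolves a column name to its ordinal in the table schema. Everything here is hypothetical: `DataColumns`, `positionOf`, and `presentInFile` are made-up names, and the rule that an older file simply has fewer trailing columns is an assumption for illustration, not a description of how the Hive connector handles schema evolution.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the ordered table schema carried by dataColumns.
// Only names are kept; the real schema also carries types.
using DataColumns = std::vector<std::string>;

// Returns the ordinal of 'name' in the table schema, if present.
std::optional<std::size_t> positionOf(
    const DataColumns& dataColumns,
    const std::string& name) {
  for (std::size_t i = 0; i < dataColumns.size(); ++i) {
    if (dataColumns[i] == name) {
      return i;
    }
  }
  return std::nullopt;
}

// ASSUMPTION for illustration only: a file written before columns were
// appended to the table has fewer columns than the table schema, so a
// column whose ordinal is at or past the file's column count is absent
// from that file.
bool presentInFile(
    const DataColumns& dataColumns,
    const std::string& name,
    std::size_t fileColumnCount) {
  const auto position = positionOf(dataColumns, name);
  return position.has_value() && *position < fileColumnCount;
}
```

A column handle alone carries a name and a kind, which is not enough to answer either question; the ordered schema supplies the missing positional information.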

Summary:
Currently, if a partitioning column is not projected out but is used in the remaining filter, the engine still adds it to assignments so that the connector can recognize it. This is a hack; we should instead pass the full column handles inside the table handle.

Fix facebookincubator#15348

Reviewed By: mbasmanova

Differential Revision: D85979627

meta-codesync · 2025-11-04T02:14:53Z

This pull request has been merged in 6bb5399.

Yuhta requested a review from majetideepak as a code owner October 31, 2025 20:47

meta-cla bot added the CLA Signed label Oct 31, 2025

meta-codesync bot added fb-exported meta-exported labels Oct 31, 2025

mbasmanova approved these changes Nov 3, 2025

Yuhta force-pushed the export-D85979627 branch from 8703c6b to 0baddb4 Compare November 3, 2025 21:54

meta-codesync bot closed this in 6bb5399 Nov 4, 2025

facebook-github-bot added the Merged label Nov 4, 2025
