refactor: Make PlanBuilder connector agnostic part 1 #14687
Closed
yingsu00 wants to merge 6 commits into facebookincubator:main
In order to decouple Hive and other specific connectors from the exec and core modules, we need to put the connector names in a central location, velox/connectors/ConnectorNames.h. The idea is similar to facebook::velox::dwio::common::FileFormat, where all file formats are specified. Modules outside of the connectors module can then reference this central, connector-agnostic header instead of connector-specific headers like HiveConnector.h.
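A minimal sketch of what such a header might contain, assuming plain string constants are enough. The constant names, their values, and the isKnownConnectorName helper are hypothetical illustrations, not the contents of the actual velox/connectors/ConnectorNames.h; the point is only that exec code can refer to a connector by name without including HiveConnector.h.

```cpp
// Hypothetical sketch of a central, connector-agnostic names header.
// In a real build this would be a header file with an include guard.
#include <array>
#include <cassert>
#include <string_view>

namespace facebook::velox::connector {

// Assumed names and values; the real header may differ.
inline constexpr std::string_view kHiveConnectorName = "hive";
inline constexpr std::string_view kTpchConnectorName = "tpch";
inline constexpr std::string_view kTpcdsConnectorName = "tpcds";

// All known names in one place, similar in spirit to how
// dwio::common::FileFormat enumerates file formats.
inline constexpr std::array<std::string_view, 3> kKnownConnectorNames = {
    kHiveConnectorName, kTpchConnectorName, kTpcdsConnectorName};

// Returns true if `name` is one of the connector names listed above.
constexpr bool isKnownConnectorName(std::string_view name) {
  for (auto known : kKnownConnectorNames) {
    if (known == name) {
      return true;
    }
  }
  return false;
}

} // namespace facebook::velox::connector
```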
This commit is the first part of the effort to decouple Hive from exec tests, which aims to make the build succeed with VELOX_ENABLE_HIVE_CONNECTOR=OFF. This commit includes:
- Enhance velox::connector::ConnectorFactory by adding abstract methods for creating ConnectorSplits, TableHandles, InsertTableHandles, etc.
- Enhance HiveConnectorFactory in velox/connectors/hive to implement the newly added interface methods.
- Add a new HiveObjectFactoryTest suite to verify that dynamic options yield the correct Hive-specific objects without leaking connector internals into core or exec tests.
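The shape of the ConnectorFactory change can be sketched as follows. This is a simplified illustration under stated assumptions: a std::map<std::string, std::string> stands in for folly::dynamic options, the object types are stripped down to one or two fields, and only makeConnectorSplit and makeTableHandle are shown. The real Velox signatures and class layouts differ.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for folly::dynamic options (assumption for this sketch).
using Options = std::map<std::string, std::string>;

// Connector-agnostic base types, heavily simplified.
struct ConnectorSplit {
  explicit ConnectorSplit(std::string id) : connectorId(std::move(id)) {}
  virtual ~ConnectorSplit() = default;
  std::string connectorId;
};

struct ConnectorTableHandle {
  explicit ConnectorTableHandle(std::string id) : connectorId(std::move(id)) {}
  virtual ~ConnectorTableHandle() = default;
  std::string connectorId;
};

// The factory gains abstract "make" methods so callers can create
// connector objects without naming any concrete connector type.
class ConnectorFactory {
 public:
  virtual ~ConnectorFactory() = default;

  virtual std::shared_ptr<ConnectorSplit> makeConnectorSplit(
      const std::string& connectorId,
      const Options& options) const = 0;

  virtual std::shared_ptr<ConnectorTableHandle> makeTableHandle(
      const std::string& connectorId,
      const Options& options) const = 0;
};

// Hive-specific types stay inside the Hive connector module.
struct HiveSplit : ConnectorSplit {
  HiveSplit(std::string id, std::string path)
      : ConnectorSplit(std::move(id)), filePath(std::move(path)) {}
  std::string filePath;
};

struct HiveTableHandle : ConnectorTableHandle {
  HiveTableHandle(std::string id, std::string table)
      : ConnectorTableHandle(std::move(id)), tableName(std::move(table)) {}
  std::string tableName;
};

class HiveConnectorFactory : public ConnectorFactory {
 public:
  // options.at() throws std::out_of_range if a required key is missing.
  std::shared_ptr<ConnectorSplit> makeConnectorSplit(
      const std::string& connectorId,
      const Options& options) const override {
    return std::make_shared<HiveSplit>(connectorId, options.at("filePath"));
  }

  std::shared_ptr<ConnectorTableHandle> makeTableHandle(
      const std::string& connectorId,
      const Options& options) const override {
    return std::make_shared<HiveTableHandle>(
        connectorId, options.at("tableName"));
  }
};
```

Callers only hold a ConnectorFactory reference; the Hive-specific types are visible only where the Hive connector itself is compiled, which is the property a HiveObjectFactoryTest-style suite would check.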
PartitionFunction and PartitionFunctionSpec are core concepts used in various places, such as exec and connectors. To make the exec PlanBuilder test utility class connector agnostic, this commit moves them out of PlanNode.h into a standalone file in the same folder, making them a first-class interface in velox/core.
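A sketch of what a standalone partition-function header might declare. It is simplified under explicit assumptions: a vector of int64 keys replaces RowVector, the serialization base class and any local-exchange flag are omitted, and ModuloPartitionFunction is a hypothetical example, not a Velox class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <vector>

namespace core {

// Simplified stand-in for a RowVector: one int64 partitioning key per row.
using KeyColumn = std::vector<int64_t>;

// Computes a partition id for every input row.
class PartitionFunction {
 public:
  virtual ~PartitionFunction() = default;

  // Fills `partitions` with one id per row. May return a single id when
  // every row is known to map to it.
  virtual std::optional<uint32_t> partition(
      const KeyColumn& input,
      std::vector<uint32_t>& partitions) = 0;
};

// Description from which a PartitionFunction is created at runtime.
class PartitionFunctionSpec {
 public:
  virtual ~PartitionFunctionSpec() = default;
  virtual std::unique_ptr<PartitionFunction> create(int numPartitions) const = 0;
  virtual std::string toString() const = 0;
};

} // namespace core

// Hypothetical implementation: partition = key mod numPartitions.
class ModuloPartitionFunction : public core::PartitionFunction {
 public:
  explicit ModuloPartitionFunction(int numPartitions)
      : numPartitions_(numPartitions) {}

  std::optional<uint32_t> partition(
      const core::KeyColumn& input,
      std::vector<uint32_t>& partitions) override {
    partitions.resize(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
      // Normalize so negative keys still land in [0, numPartitions).
      int64_t p = input[i] % numPartitions_;
      partitions[i] = static_cast<uint32_t>(p < 0 ? p + numPartitions_ : p);
    }
    // This sketch only reports a single partition in the trivial case.
    if (numPartitions_ == 1) {
      return 0u;
    }
    return std::nullopt;
  }

 private:
  const int numPartitions_;
};

class ModuloPartitionFunctionSpec : public core::PartitionFunctionSpec {
 public:
  std::unique_ptr<core::PartitionFunction> create(
      int numPartitions) const override {
    return std::make_unique<ModuloPartitionFunction>(numPartitions);
  }

  std::string toString() const override {
    return "MODULO";
  }
};
```

Because the interface lives on its own, exec utilities and connectors can both depend on it without pulling in PlanNode.h or any connector header.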
PlanBuilder is a core test utility in exec and should be connector agnostic. This commit makes the generic TableScan, TableWriter, and PartitionFunctionSpec code in PlanBuilder connector agnostic. The next step is to remove the Tpch- and Tpcds-specific code from PlanBuilder.
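The connector-agnostic builder idea can be sketched like this, assuming connector-specific settings are passed as opaque key/value options and handle creation is delegated to a factory. TableScanBuilder here is a hypothetical, heavily reduced stand-in for the real PlanBuilder::TableScanBuilder, the std::map replaces folly::dynamic, and FakeObjectFactory exists only to exercise the builder.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for folly::dynamic options (assumption for this sketch).
using Options = std::map<std::string, std::string>;

// The builder only ever sees this connector-agnostic base class.
struct ConnectorTableHandle {
  virtual ~ConnectorTableHandle() = default;
  virtual std::string name() const = 0;
};

// Abstract factory supplied by whichever connector is linked in.
struct ConnectorObjectFactory {
  virtual ~ConnectorObjectFactory() = default;
  virtual std::shared_ptr<ConnectorTableHandle> makeTableHandle(
      const std::string& connectorId,
      const Options& options) const = 0;
};

// Minimal plan node produced by the builder.
struct TableScanNode {
  std::string connectorId;
  std::shared_ptr<ConnectorTableHandle> tableHandle;
};

// Collects settings as opaque options and defers handle creation.
class TableScanBuilder {
 public:
  explicit TableScanBuilder(const ConnectorObjectFactory& factory)
      : factory_(factory) {}

  TableScanBuilder& connectorId(std::string id) {
    connectorId_ = std::move(id);
    return *this;
  }

  TableScanBuilder& option(const std::string& key, const std::string& value) {
    options_[key] = value;
    return *this;
  }

  TableScanNode build() const {
    return TableScanNode{
        connectorId_, factory_.makeTableHandle(connectorId_, options_)};
  }

 private:
  const ConnectorObjectFactory& factory_;
  std::string connectorId_;
  Options options_;
};

// A trivial connector used only to exercise the builder.
struct FakeTableHandle : ConnectorTableHandle {
  explicit FakeTableHandle(std::string table) : table_(std::move(table)) {}
  std::string name() const override {
    return table_;
  }
  std::string table_;
};

struct FakeObjectFactory : ConnectorObjectFactory {
  std::shared_ptr<ConnectorTableHandle> makeTableHandle(
      const std::string& /*connectorId*/,
      const Options& options) const override {
    return std::make_shared<FakeTableHandle>(options.at("table"));
  }
};
```

Nothing in TableScanBuilder names a Hive type, so it compiles whether or not the Hive connector is part of the build.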
mbasmanova reviewed on Sep 15, 2025
        folly::Executor* ioExecutor = nullptr,
        folly::Executor* cpuExecutor = nullptr) = 0;

    virtual std::shared_ptr<ConnectorSplit> makeConnectorSplit(
@yingsu00 Connector is a generic API, but these new APIs are specific to Hive-like connectors. For example, these do not make sense for MySQL connector.
yingsu00 (author): Closing for now. Following Masha's suggestion, I will open new PRs that introduce a standalone Iceberg connector.
What PR #14687 Does
The PR is part of the ongoing decoupling of Hive from Velox exec tests:
Before:
Many velox/exec unit tests (for TableScan, TableWriter, Exchange, etc.) directly created HiveTableHandle, HiveColumnHandle, or HivePartitionFunctionSpec. This meant the tests wouldn’t compile or run without the Hive connector being statically linked.
Now:
The PR replaces those direct Hive references with calls into the ConnectorObjectFactory and connector-agnostic interfaces.
For example, in tests that used to say:

    auto handle = std::make_shared<HiveTableHandle>(...);

they now construct folly::dynamic options and pass them to the factory's makeTableHandle method instead.
Similarly, partitioning specs that used to directly instantiate HivePartitionFunctionSpec are now requested through:

    factory.makePartitionFunctionSpec("hive", opts);

This ensures the exec layer and its tests depend only on the abstract connector API, not on Hive headers.
Why Exec Tests Need to Be Connector Agnostic
This is part of a larger effort toward a connector plugin architecture. Whether to support dynamic loading of connectors is still under discussion, but making exec tests connector agnostic is a must-do item for Velox to compile with VELOX_ENABLE_HIVE_CONNECTOR=OFF: exec tests that link against Hive types break the build when VELOX_ENABLE_HIVE_CONNECTOR=OFF. This is what we agreed on in #13698.
Steps to Make Exec Tests Connector Agnostic
1. Introduce ConnectorObjectFactory: define abstract factory methods such as makeTableHandle, makeColumnHandle, makeInsertTableHandle, and makePartitionFunctionSpec. (#13798, "refactor: Introduce ConnectorObjectFactory and HiveObjectFactory")
2. Implement connector-specific factories: Hive (and later Iceberg) provides a factory implementing these methods, starting with HiveObjectFactory. (#13798)
3. Refactor PlanBuilder, including TableScanBuilder, TableWriterBuilder, and the PartitionFunctionSpec-related code in the exec test utils: replace direct Hive handle creation with factory calls and update the affected tests. (this PR)
4. Finish the PlanBuilder refactor by removing the Tpch- and Tpcds-specific code. This could be a good beginner task.
5. Refactor HiveConnectorTestBase to make it connector agnostic. We could potentially rename it ConnectorTestBase and move it back to connectors/common/tests, so that Hive tests, Iceberg tests, and other connector-specific tests can inherit from it.
Depends on #13798
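The last step, a connector-agnostic ConnectorTestBase, could look roughly like the following sketch, assuming a test base that owns the connector name and its object factory. ConnectorTestBase, HiveConnectorTest, HiveObjectFactory, and the helper signatures here are hypothetical; gtest fixtures are replaced with plain classes and folly::dynamic with a std::map to keep the example self-contained.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for folly::dynamic options (assumption for this sketch).
using Options = std::map<std::string, std::string>;

struct ConnectorTableHandle {
  virtual ~ConnectorTableHandle() = default;
  virtual std::string describe() const = 0;
};

struct ConnectorObjectFactory {
  virtual ~ConnectorObjectFactory() = default;
  virtual std::shared_ptr<ConnectorTableHandle> makeTableHandle(
      const Options& opts) const = 0;
};

// Shared base: helpers never mention a concrete connector type.
class ConnectorTestBase {
 public:
  ConnectorTestBase(
      std::string connectorName,
      std::shared_ptr<ConnectorObjectFactory> factory)
      : connectorName_(std::move(connectorName)),
        factory_(std::move(factory)) {}
  virtual ~ConnectorTestBase() = default;

  const std::string& connectorName() const {
    return connectorName_;
  }

  std::shared_ptr<ConnectorTableHandle> makeTableHandle(
      const std::string& table) const {
    return factory_->makeTableHandle({{"tableName", table}});
  }

 private:
  std::string connectorName_;
  std::shared_ptr<ConnectorObjectFactory> factory_;
};

// Connector-specific suites only supply their name and factory.
struct HiveTableHandle : ConnectorTableHandle {
  explicit HiveTableHandle(std::string table) : table_(std::move(table)) {}
  std::string describe() const override {
    return "hive:" + table_;
  }
  std::string table_;
};

struct HiveObjectFactory : ConnectorObjectFactory {
  std::shared_ptr<ConnectorTableHandle> makeTableHandle(
      const Options& opts) const override {
    return std::make_shared<HiveTableHandle>(opts.at("tableName"));
  }
};

class HiveConnectorTest : public ConnectorTestBase {
 public:
  HiveConnectorTest()
      : ConnectorTestBase("hive", std::make_shared<HiveObjectFactory>()) {}
};
```

An Iceberg suite would follow the same pattern with its own name and factory, while test logic written against ConnectorTestBase stays shared.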