Skip to content

refactor: Make PlanBuilder connector agnostic part 1#14687

Closed
yingsu00 wants to merge 6 commits intofacebookincubator:mainfrom
yingsu00:connector_refactor_upstream
Closed

refactor: Make PlanBuilder connector agnostic part 1#14687
yingsu00 wants to merge 6 commits intofacebookincubator:mainfrom
yingsu00:connector_refactor_upstream

Conversation

@yingsu00
Copy link
Copy Markdown
Contributor

@yingsu00 yingsu00 commented Sep 2, 2025

What PR #14687 Does

The PR is part of the ongoing decoupling of Hive from Velox exec tests:

Before:

Many velox/exec unit tests (for TableScan, TableWriter, Exchange, etc.) directly created HiveTableHandle, HiveColumnHandle, or HivePartitionFunctionSpec. This meant the tests wouldn’t compile or run without the Hive connector being statically linked.

Now:

The PR replaces those direct Hive references with calls into the ConnectorObjectFactory and connector-agnostic interfaces.

For example, in tests that used to say:

auto handle = std::make_shared<HiveTableHandle>(...);

they now construct folly::dynamic options and call:

auto handle = factory.makeTableHandle("hive", tableName, cols, options);

Similarly, partitioning specs that used to directly instantiate HivePartitionFunctionSpec are now requested through:

factory.makePartitionFunctionSpec("hive", opts);

This ensures the exec layer and its tests depend only on the abstract connector API, not on Hive headers.

Why Exec Tests Need to Be Connector Agnostic

  • Build isolation: If every exec test pulls in Hive, any change in Hive (even trivial) will force a recompile of all exec tests.
  • Future connectors: Iceberg, Hudi, DeltaLake all need to reuse exec operators. Tests should validate operator logic, not Hive specifics.
  • Portability: Connector-agnostic tests can run even if a connector plugin isn’t present, as long as a minimal factory (real or mock) is registered.

This is part of a larger effort to make the connector plugin architecture. Whether or not to support dynamic loading of connectors is still under discussion, but making exec tests connector agnostic is a must-do item to make Velox compile with VELOX_ENABLE_HIVE_CONNECTOR=OFF. Exec tests that link against Hive types would make the build fail if VELOX_ENABLE_HIVE_CONNECTOR=OFF. This is what we have agreed on in #13698.

Steps to Make Exec Tests Connector Agnostic

  1. Introduce ConnectorObjectFactory: Define abstract factory methods: makeTableHandle, makeColumnHandle, makeInsertTableHandle, makePartitionFunctionSpec, etc. refactor: Introduce ConnectorObjectFactory and HiveObjectFactory #13798

  2. Implement connector-specific factories: Hive (and later Iceberg) provide a HiveObjectFactory implementing these methods. refactor: Introduce ConnectorObjectFactory and HiveObjectFactory #13798

  3. Refactor PlanBuilder, including TableScanBuilder, TableWriterBuilder and PartitionFunctionSpec related code in exec test utils. Replace direct Hive handle creation with factory calls. Update affected tests too. (this PR)

  4. Finish PlanBuilder refactor to remove Tpch and Tpcds specific code. Can be done as a new beginner task.

  5. Refactor HiveConnectorTestBase, make it connector agnostic. We can potentially rename it as ConnectorTestBase and move it back to connectors/common/tests. Hive tests, Iceberg tests or other connector-specific tests can also inherit it.

Depends on #13798

@netlify
Copy link
Copy Markdown

netlify bot commented Sep 2, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 8ff964a
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/68c7db1cee1a150008472c6e

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 2, 2025
@yingsu00 yingsu00 force-pushed the connector_refactor_upstream branch 7 times, most recently from 9006496 to 5e5bdea Compare September 9, 2025 08:05
@yingsu00 yingsu00 force-pushed the connector_refactor_upstream branch from 5e5bdea to c356ea9 Compare September 10, 2025 06:40
In order to decouple Hive and other specific connectors from exec and
core modules, we need to put the connector names in a central location
velox/connectors/ConnectorNames.h. The idea is similar to
facebook::velox::dwio::common::FileFormat where all file formats are
specified. Modules outside of the connectors module can just reference
this central and connector-agnostic header instead of connector
specific headers like HiveConnector.h.
This commit is the first part of the effort to decouple Hive from exec
tests, which aims to make VELOX_ENABLE_HIVE_CONNECTOR=OFF build without
errors. The content of this commit include:

- Enhance velox::connector::ConnectorFactory, adding abstract methods
  for creating ConnectoSplits, TableHandles, InsertTableHandles,etc.
- Enhance HiveConnectorFactory in velox/connectors/hive that implements the
  newly added function interface.
- Add a new HiveObjectFactoryTest suite to verify that dynamic options
  yield correct Hive-specific objects without leaking connector internals
  into core or exec tests.
PartitionFunction and PartitionFunctionSpec are core concepts used in
various places like exec and connectors. To make exec PlanBuilder test
utilility class connector agnostic, this commit moves them out of PlanNode.h
to a standalone file in the same folder, making it a first class interface
in velox/core.
PlanBuilder is a core test util in exec and should be connector agnostic.
This commits makes the generic TableScan, TableWriter, and
PartitionFunctionSpec connector agnostic. The next step would be removing
Tpch and Tpcds spsecific code from PlanBuilder.
folly::Executor* ioExecutor = nullptr,
folly::Executor* cpuExecutor = nullptr) = 0;

virtual std::shared_ptr<ConnectorSplit> makeConnectorSplit(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yingsu00 Connector is a generic API, but these new APIs are specific to Hive-like connectors. For example, these do not make sense for MySQL connector.

@yingsu00
Copy link
Copy Markdown
Contributor Author

Closing for now. Will open new PRs that introduce Iceberg standalone connector by Masha's suggestion.

@yingsu00 yingsu00 closed this Sep 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants