[refactor] Move HiveConnectorSplitBuilder to HiveConnectorSplit #484

Merged
ZacBlanco merged 1 commit into bytedance:main from yingsu00:connector_refactor_2.1
Apr 9, 2026

yingsu00 commented Apr 4, 2026

What problem does this PR solve?

In the upcoming refactor to decouple Hive from exec tests, we will need
to make HiveConnectorTestBase connector agnostic. However, it contains
HiveConnectorSplitBuilder, and this is an obstacle.

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

The HiveConnectorSplitBuilder should go back to the Hive connector
in connectors/hive/HiveConnectorSplit.cpp. This commit makes this move
and updates HiveObjectFactory to use it to create Hive splits.
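For readers unfamiliar with the pattern, here is a minimal sketch of a split builder living next to the split type it produces. Everything in it is an assumption for illustration: the `sketch` namespace, the four split fields, the `"test-hive"` default connector ID, and the `makeSplit` helper are simplified, hypothetical stand-ins, not the actual Bolt classes or the real HiveObjectFactory code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins; not the real Bolt declarations.
namespace sketch::connector::hive {

// A split describes a byte range of one file to be scanned.
struct HiveConnectorSplit {
  std::string connectorId;
  std::string filePath;
  uint64_t start;
  uint64_t length;
};

// Fluent builder that lives alongside the split type, so callers outside
// the connector do not need test-only utilities to construct a split.
class HiveConnectorSplitBuilder {
 public:
  explicit HiveConnectorSplitBuilder(std::string filePath)
      : filePath_(std::move(filePath)) {}

  HiveConnectorSplitBuilder& connectorId(std::string value) {
    connectorId_ = std::move(value);
    return *this;
  }

  HiveConnectorSplitBuilder& start(uint64_t value) {
    start_ = value;
    return *this;
  }

  HiveConnectorSplitBuilder& length(uint64_t value) {
    length_ = value;
    return *this;
  }

  std::shared_ptr<HiveConnectorSplit> build() const {
    assert(!filePath_.empty()); // A split must point at a file.
    return std::make_shared<HiveConnectorSplit>(
        HiveConnectorSplit{connectorId_, filePath_, start_, length_});
  }

 private:
  std::string filePath_;
  std::string connectorId_{"test-hive"}; // Assumed default, illustration only.
  uint64_t start_{0};
  uint64_t length_{std::numeric_limits<uint64_t>::max()}; // Whole file.
};

// Hypothetical factory-style helper that delegates to the builder.
inline std::shared_ptr<HiveConnectorSplit>
makeSplit(const std::string& path, uint64_t start, uint64_t length) {
  return HiveConnectorSplitBuilder(path).start(start).length(length).build();
}

} // namespace sketch::connector::hive
```

The point of the sketch is only the placement: once the builder sits with the split type, both a factory and test code can construct splits through the same entry point.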

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

    Click to view Benchmark Results
    Paste your google-benchmark or TPC-H results here.
    Before: 10.5s
    After:   8.2s  (+20%)
    
  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

Please describe the changes in this PR

Release Note:
N/A

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)

    Click to view Breaking Changes
    Breaking Changes:
    - Description of the breaking change.
    - Possible solutions or workarounds.
    - Any other relevant information.
    

@yingsu00 yingsu00 force-pushed the connector_refactor_2.1 branch 2 times, most recently from 0b60cfd to c710aba Compare April 5, 2026 12:52
@yingsu00 yingsu00 changed the title from "[refactor] Add HiveConnectorSplitBuilder and use it in HiveObjectFactory" to "[refactor] Move HiveConnectorSplitBuilder to HiveConnectorSplit" Apr 5, 2026

yingsu00 commented Apr 6, 2026

@ZacBlanco Can you please review this PR? Thanks!

@ZacBlanco ZacBlanco left a comment

Change overall LGTM. One other thing though - I know the CI doesn't build all configurations now, but I think there might be some more tests which use the old HiveSplitBuilder in modules that we don't run in CI - for example, the GCS and S3 storage tests. Can you check for usages there to see if they also need to be updated?

You can build locally with something like

make <target> CONAN_OPTIONS=" -o bolt/*:enable_gcs=True -o bolt/*:enable_s3=True "

I'm going to add an item to my backlog to enable some more of these options in CI, but I'm not sure when I'll get around to it.


yingsu00 commented Apr 8, 2026

> Change overall LGTM. One other thing though - I know the CI doesn't build all configurations now, but I think there might be some more tests which use the old HiveSplitBuilder in modules that we don't run in CI - for example, the GCS and S3 storage tests. Can you check for usages there to see if they also need to be updated?
>
> You can build locally with something like
>
> make <target> CONAN_OPTIONS=" -o bolt/*:enable_gcs=True -o bolt/*:enable_s3=True "
>
> I'm going to add an item to my backlog to enable some more of these options in CI, but I'm not sure when I'll get around to it.

@ZacBlanco bolt/exec/tests/utils/HiveConnectorTestBase.cpp picks up the moved HiveConnectorSplitBuilder through a using-declaration:

using connector::hive::HiveConnectorSplitBuilder;

The same applies to bolt/exec/tests/HashJoinTest.cpp. In the next PR I will remove the direct references to Hive objects in HiveConnectorTestBase.cpp, but I believe it's OK to keep them in this PR.
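As a rough illustration of why the using-declaration keeps existing test code compiling after the move, here is a minimal sketch. The `demo` namespaces, the one-field builder struct, and the `describeSplit` helper are hypothetical stand-ins for illustration and do not mirror the real Bolt layout.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the builder's new home in the connector.
namespace demo::connector::hive {
struct HiveConnectorSplitBuilder {
  std::string filePath;
};
} // namespace demo::connector::hive

// Hypothetical stand-in for a test utility namespace.
namespace demo::exec::test {

// Pulls the moved name into this namespace, so call sites that spell it
// unqualified keep compiling without any other change.
using connector::hive::HiveConnectorSplitBuilder;

// The unqualified name and the fully qualified one are the same type.
static_assert(std::is_same_v<
              HiveConnectorSplitBuilder,
              demo::connector::hive::HiveConnectorSplitBuilder>);

inline std::string describeSplit(const std::string& path) {
  assert(!path.empty());
  HiveConnectorSplitBuilder builder{path}; // Unqualified use.
  return "split for " + builder.filePath;
}

} // namespace demo::exec::test
```

If the name could not be resolved at the using-declaration, the translation unit would fail to compile, which is the failure mode a CI build of these tests would surface.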

I verified bolt_exec_test builds fine and tests using it run fine too:

/Users/yingsu/repo/bytedance1/bolt/_build/debug/bolt/exec/tests/bolt_exec_scan_test --gtest_filter=TableScanTest.allColumns:TableScanTest/*.allColumns:TableScanTest.allColumns/*:*/TableScanTest.allColumns/*:*/TableScanTest/*.allColumns --gtest_color=no
Testing started at 23:21 ...
I0407 23:21:51.900535 23769091 Compression.cpp:643] Initialized zstd compressor with compression level 7
Process finished with exit code 0

/Users/yingsu/repo/bytedance1/bolt/_build/debug/bolt/exec/tests/bolt_exec_infra_test --gtest_filter=AssertQueryBuilderTest.basic:AssertQueryBuilderTest/*.basic:AssertQueryBuilderTest.basic/*:*/AssertQueryBuilderTest.basic/*:*/AssertQueryBuilderTest/*.basic --gtest_color=no
Testing started at 23:19 ...
Process finished with exit code 0

/Users/yingsu/repo/bytedance1/bolt/_build/debug/bolt/exec/tests/bolt_exec_hash_join_test --gtest_filter=MultiThreadedHashJoinTest.bigintArray:MultiThreadedHashJoinTest/*.bigintArray:MultiThreadedHashJoinTest.bigintArray/*:*/MultiThreadedHashJoinTest.bigintArray/*:*/MultiThreadedHashJoinTest/*.bigintArray --gtest_color=no
Testing started at 23:24 ...
Process finished with exit code 0

The CI would fail if the tests weren't able to find HiveConnectorSplitBuilder. It's nice to have more coverage in the CI, but for the concern you had I think the CI already covers it.

@ZacBlanco ZacBlanco enabled auto-merge April 8, 2026 16:27
auto-merge was automatically disabled April 9, 2026 09:35

Head branch was pushed to by a user without write access

@yingsu00 yingsu00 force-pushed the connector_refactor_2.1 branch 2 times, most recently from bdb8e62 to 5e2a87e Compare April 9, 2026 09:37
In the upcoming refactor to decouple Hive from exec tests, we will need
to make HiveConnectorTestBase connector agnostic. However, it contains
HiveConnectorSplitBuilder, which should go back to the Hive connector
in connectors/hive/HiveConnectorSplit.cpp. This commit makes this move
and updates HiveObjectFactory to use it to create Hive splits.
@yingsu00 yingsu00 force-pushed the connector_refactor_2.1 branch from 5e2a87e to c02d602 Compare April 9, 2026 15:40
@ZacBlanco ZacBlanco added this pull request to the merge queue Apr 9, 2026
Merged via the queue into bytedance:main with commit fbf8dd4 Apr 9, 2026
7 checks passed