misc(native): Replace HiveConnectorFactory with IcebergConnectorFactory in iceberg connector #26661

Merged
PingLiuPing merged 2 commits into prestodb:master from PingLiuPing:lp_iceberg_connector on Dec 31, 2025

@PingLiuPing
Contributor

@PingLiuPing PingLiuPing commented Nov 20, 2025

Description

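Register IcebergConnectorFactory for the Iceberg connector instead of HiveConnectorFactory.
Change the Iceberg NativeQueryRunner catalog property so that the iceberg catalog uses the Iceberg connector name.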

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Nov 20, 2025
@sourcery-ai
Contributor

sourcery-ai bot commented Nov 20, 2025

Reviewer's Guide

Registers the dedicated Iceberg connector factory in native execution and updates native query runner test catalog configuration to use the Iceberg connector name for the iceberg catalog instead of always using the Hive connector name.

Sequence diagram for iceberg connector usage in native execution

sequenceDiagram
    actor User
    participant Coordinator
    participant NativeWorker
    participant ConnectorRegistry
    participant IcebergFactory
    participant IcebergConnector

    Note over NativeWorker,ConnectorRegistry: Startup phase
    NativeWorker->>ConnectorRegistry: registerConnectorFactories
    NativeWorker->>ConnectorRegistry: register IcebergConnectorFactory
    ConnectorRegistry->>IcebergFactory: store factory for connector_name iceberg

    Note over User,IcebergConnector: Query execution
    User->>Coordinator: submit query using catalog iceberg
    Coordinator->>NativeWorker: schedule native task for catalog iceberg
    NativeWorker->>ConnectorRegistry: getFactory(connector_name iceberg)
    ConnectorRegistry-->>NativeWorker: IcebergConnectorFactory
    NativeWorker->>IcebergFactory: create(connector_id, config)
    IcebergFactory-->>NativeWorker: IcebergConnector
    NativeWorker->>IcebergConnector: plan and execute reads/writes
    IcebergConnector-->>NativeWorker: data pages
    NativeWorker-->>Coordinator: query results
    Coordinator-->>User: final result set

Class diagram for updated iceberg connector factory registration

classDiagram
    class ConnectorFactory {
        <<interface>>
        +string getName()
        +Connector create(string connectorId, map configuration)
    }

    class HiveConnectorFactory {
        +string name
        +HiveConnectorFactory()
        +string getName()
        +Connector create(string connectorId, map configuration)
    }

    class IcebergConnectorFactory {
        +string name
        +IcebergConnectorFactory()
        +string getName()
        +Connector create(string connectorId, map configuration)
    }

    class IcebergConnector {
        +string connectorId
        +IcebergConnector(string connectorId, map configuration)
        +SplitManager getSplitManager()
        +PageSourceProvider getPageSourceProvider()
    }

    class Registration {
        +void registerConnectorFactories()
    }

    class ConnectorRegistry {
        +void registerFactory(ConnectorFactory factory)
        +ConnectorFactory getFactory(string connectorName)
    }

    ConnectorFactory <|.. HiveConnectorFactory
    ConnectorFactory <|.. IcebergConnectorFactory
    IcebergConnectorFactory --> IcebergConnector
    Registration --> ConnectorRegistry : calls
    Registration --> IcebergConnectorFactory : creates and registers
    Registration --> HiveConnectorFactory : creates and registers
    ConnectorRegistry --> ConnectorFactory : stores
    ConnectorRegistry --> IcebergConnectorFactory : returns for iceberg
    ConnectorRegistry --> HiveConnectorFactory : returns for hive
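
The registration and lookup flow in the diagrams above can be sketched in Java as a toy model. Everything here is hypothetical: `Connector`, `ConnectorFactory`, `ConnectorRegistry`, and `RegistrySketch` are illustrative stand-ins, not the Velox or presto_cpp types, and the diagrams themselves are auto-generated, so the real interfaces may differ. The sketch only shows why the factory stored under the name `iceberg` determines which connector the worker ends up with.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; not the Velox or presto_cpp types.
interface Connector {
    String describe();
}

interface ConnectorFactory {
    String name();

    Connector create(String connectorId, Map<String, String> config);
}

final class ConnectorRegistry {
    private final Map<String, ConnectorFactory> factories = new HashMap<>();

    // Stores the factory under the connector name it reports.
    void register(ConnectorFactory factory) {
        if (factories.putIfAbsent(factory.name(), factory) != null) {
            throw new IllegalStateException("Factory already registered: " + factory.name());
        }
    }

    // Looks up the factory registered for a connector name.
    ConnectorFactory get(String connectorName) {
        ConnectorFactory factory = factories.get(connectorName);
        if (factory == null) {
            throw new IllegalArgumentException("No factory for connector: " + connectorName);
        }
        return factory;
    }
}

final class RegistrySketch {
    // Builds a factory registered under `name` whose connectors behave like `kind`.
    static ConnectorFactory factory(String name, String kind) {
        return new ConnectorFactory() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public Connector create(String connectorId, Map<String, String> config) {
                return () -> kind + " connector for catalog " + connectorId;
            }
        };
    }

    // Before the change: a Hive-style factory is registered under the iceberg name.
    static ConnectorRegistry before() {
        ConnectorRegistry registry = new ConnectorRegistry();
        registry.register(factory("hive", "hive"));
        registry.register(factory("iceberg", "hive"));
        return registry;
    }

    // After the change: the iceberg name resolves to an Iceberg-specific factory.
    static ConnectorRegistry after() {
        ConnectorRegistry registry = new ConnectorRegistry();
        registry.register(factory("hive", "hive"));
        registry.register(factory("iceberg", "iceberg"));
        return registry;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        System.out.println(before().get("iceberg").create("iceberg", config).describe());
        System.out.println(after().get("iceberg").create("iceberg", config).describe());
    }
}
```

In this model the lookup key stays the same before and after; only the factory stored behind it changes, which is the essence of the change to Registration.cpp described in the guide.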

File-Level Changes

Change: Use the Iceberg-specific connector factory instead of the Hive connector factory for the Iceberg connector in native execution.
Details:
  • Include the IcebergConnector header from the Velox Hive Iceberg connector library.
  • Replace registration of a HiveConnectorFactory under the Iceberg connector name with registration of IcebergConnectorFactory.
Files: presto-native-execution/presto_cpp/main/connectors/Registration.cpp

Change: Make test catalog configuration choose the connector name dynamically (Iceberg vs Hive) instead of hardcoding Hive.
Details:
  • Introduce a connectorName variable that is set to 'iceberg' when the catalog name is 'iceberg', otherwise 'hive'.
  • Use the connectorName in properties files written for the catalog, both with and without caching enabled.
  • Ensure the cached catalog properties also use the dynamic connectorName instead of hardcoded 'hive'.
Files: presto-native-execution/src/test/java/com/facebook/presto/nativeworker/PrestoNativeQueryRunnerUtils.java
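
The test-side change described above can be sketched roughly as follows. This is a minimal illustration, not the code in PrestoNativeQueryRunnerUtils: the class `CatalogPropertiesSketch` and its methods are hypothetical, and the real utility presumably writes more properties than the single `connector.name` line shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the Presto code base.
final class CatalogPropertiesSketch {
    // The rule described above: the "iceberg" catalog uses the iceberg connector,
    // every other catalog falls back to hive.
    static String connectorNameFor(String catalogName) {
        return catalogName.equals("iceberg") ? "iceberg" : "hive";
    }

    // Writes <catalogDirectory>/<catalogName>.properties with the chosen connector name.
    static Path writeCatalogProperties(Path catalogDirectory, String catalogName) throws IOException {
        Files.createDirectories(catalogDirectory);
        Path file = catalogDirectory.resolve(catalogName + ".properties");
        String contents = "connector.name=" + connectorNameFor(catalogName) + "\n";
        Files.write(file, contents.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path catalogDirectory = Files.createTempDirectory("catalog-sketch").resolve("catalog");
        for (String catalogName : new String[] {"hive", "iceberg"}) {
            Path file = writeCatalogProperties(catalogDirectory, catalogName);
            String contents = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            System.out.print(file.getFileName() + ": " + contents);
        }
    }
}
```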

@PingLiuPing PingLiuPing marked this pull request as ready for review November 28, 2025 06:35
@PingLiuPing PingLiuPing requested review from a team as code owners November 28, 2025 06:35
@prestodb-ci prestodb-ci requested review from a team, BryanCutler and Mariamalmesfer and removed request for a team November 28, 2025 06:35
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • In Registration.cpp, the IcebergConnectorFactory is now registered without explicitly using kIcebergConnectorName; consider ensuring that the factory’s internal name matches this constant (or wiring the constant through if supported) so that existing catalog configurations relying on that name remain consistent.
  • In PrestoNativeQueryRunnerUtils, the connector name for Iceberg is hard-coded as the string "iceberg"; consider referencing a shared constant (e.g., the same value as kIcebergConnectorName) to avoid subtle drift between test setup and production connector registration.

@PingLiuPing PingLiuPing changed the title from "feat(native): Register iceberg connector factory" to "misc(native): Replace iceberg connector HiveConnectorFactory with IcebergConnectorFactory" Nov 28, 2025
@PingLiuPing PingLiuPing changed the title from "misc(native): Replace iceberg connector HiveConnectorFactory with IcebergConnectorFactory" to "misc(native): Replace HiveConnectorFactory with IcebergConnectorFactory in iceberg connector" Nov 28, 2025
@aditi-pandit
Contributor

@PingLiuPing : Thanks for this code. The code looks good, but you will need to fix all the errors from Advancing Velox. PTAL.

@PingLiuPing
Contributor Author

> @PingLiuPing : Thanks for this code. The code looks good, but you will need to fix all the errors from Advancing Velox. PTAL.

Thanks, I checked the failing pipeline. We should wait for facebookincubator/velox#15658 to be merged first; after that, CI should pass.

Contributor

@BryanCutler BryanCutler left a comment

Looks good pending tests, just one suggestion about the testing worker launcher


Path catalogDirectoryPath = tempDirectoryPath.resolve("catalog");
Files.createDirectory(catalogDirectoryPath);
String connectorName = catalogName.equals("iceberg") ? "iceberg" : "hive";
Contributor

I think it would be clearer to add another argument for connectorName when calling getExternalWorkerLauncher

Contributor Author

Thank you @BryanCutler. After checking the source further, I think we can just use catalogName as the connector name.

Contributor

Yes, I can see that catalogName is the same in this case. I just meant that it's not very clear until you go through the source and see what values are used for the test catalogs.

Contributor Author

@BryanCutler Thanks, I added connectorName. Could you take a look?
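
The suggestion in this thread, passing the connector name explicitly rather than deriving it from the catalog name, might look like the sketch below. The class `ExplicitConnectorNameSketch` and its methods are hypothetical and do not reproduce the actual signature of getExternalWorkerLauncher, which is not shown in this conversation; the catalog name `lakehouse` is likewise made up to show that the two names are independent.

```java
import java.util.Objects;

// Hypothetical helper for illustration; not the actual test launcher code.
final class ExplicitConnectorNameSketch {
    // The caller states the connector explicitly, so nothing is inferred from the catalog name.
    static String catalogProperties(String connectorName) {
        Objects.requireNonNull(connectorName, "connectorName");
        return "connector.name=" + connectorName + "\n";
    }

    // The catalog name only decides the properties file name.
    static String catalogFileName(String catalogName) {
        Objects.requireNonNull(catalogName, "catalogName");
        return catalogName + ".properties";
    }

    public static void main(String[] args) {
        // A catalog named "lakehouse" backed by the iceberg connector.
        System.out.print(catalogFileName("lakehouse") + ": " + catalogProperties("iceberg"));
    }
}
```

With both values passed in, a reader of the call site does not need to look up which catalog names the tests use to know which connector is configured.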

@PingLiuPing PingLiuPing force-pushed the lp_iceberg_connector branch 2 times, most recently from e7794d3 to ac7417e December 9, 2025 02:27
Member

@hantangwangd hantangwangd left a comment

@PingLiuPing Thanks for this change. As I see, writing data to an Iceberg table would fail without this change if we update to the latest velox version, is that right? Just a bit curious why the TestUnpartitionedWrite isn't failing in the CI tests in other PRs. Do you have any insight on this?

@PingLiuPing
Contributor Author

> @PingLiuPing Thanks for this change. As I see, writing data to an Iceberg table would fail without this change if we update to the latest velox version, is that right? Just a bit curious why the TestUnpartitionedWrite isn't failing in the CI tests in other PRs. Do you have any insight on this?

Thanks, let me have a look.

@PingLiuPing
Contributor Author

@hantangwangd I just verified that without this PR, TestUnpartitionedWrite fails with the same error as running IcebergExternalQueryRunner.
The reason is that, in Velox, I moved the creation of IcebergDataSink from HiveConnector to IcebergConnector. Without this PR, the iceberg connector is still created through HiveConnectorFactory, so IcebergDataSink is never created, which is why it reports that the path is NULL. See https://github.com/facebookincubator/velox/pull/15581/changes#diff-587fe1837b68791daefbadf6067b2362b2911480ccfe258615a78723dcf8d1c3L77

@hantangwangd
Member

@PingLiuPing Thanks for the explanation. It's a little curious to me that the CI tests didn't catch a failure in TestUnpartitionedWrite before this change was merged. Given the reason you described, the test should have started failing. I suspect that the iceberg native e2e tests under presto-native-execution/src/test/....../nativeworker/iceberg/ are not included in the CI job prestocpp-linux-build-for-test. See https://github.com/prestodb/presto/blob/master/.github/workflows/prestocpp-linux-build-and-unit-test.yml#L203-L214.

Should we add these test classes to our CI flow? Or do you have any relevant context on this?

@PingLiuPing
Contributor Author

@hantangwangd Thanks. Should I add those tests to CI in this PR, or would you prefer that I add them in a separate PR?

@hantangwangd
Member

@PingLiuPing If you don't mind, I think it would be better to add them directly in this PR.

@PingLiuPing
Contributor Author

> @PingLiuPing If you don't mind, I think it would be better to add them directly in this PR.

@hantangwangd I have added those tests to CI.

@PingLiuPing
Contributor Author

The error is related to disk space. It seems #26876 can resolve the issue.

Member

@hantangwangd hantangwangd left a comment

@PingLiuPing thanks for the fix, lgtm!

@PingLiuPing PingLiuPing merged commit 0155669 into prestodb:master Dec 31, 2025
86 of 88 checks passed
Labels

from:IBM PR from IBM

5 participants