Skip to content

feat(native): Support insert data into iceberg table#25389

Closed
PingLiuPing wants to merge 2 commits intoprestodb:masterfrom
PingLiuPing:lp_iceberg_basic_insertion
Closed

feat(native): Support insert data into iceberg table#25389
PingLiuPing wants to merge 2 commits intoprestodb:masterfrom
PingLiuPing:lp_iceberg_basic_insertion

Conversation

@PingLiuPing
Copy link
Contributor

@PingLiuPing PingLiuPing commented Jun 20, 2025

Description

Support insertion for iceberg tables. This PR only contains the basic insertion and does NOT support partition transforms, sort order, statistics yet. Those feature will be addressed in following PR once this PR is merged.
The insertion logic is now been protected by MACRO PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION. So, it is not depends on velox code change anymore.

The core implementation is inside velox, see PR facebookincubator/velox#10996

To enable this feature, run:

`make EXTRA_CMAKE_FLAGS="-DPRESTO_ENABLE_ICEBERG_NATIVE_INSERTION=ON"

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add native insertion support for iceberg connector.

Hive Connector Changes
* Update Hive connector by adding a new Iceberg connector and implement basic Iceberg insertion.

@PingLiuPing PingLiuPing self-assigned this Jun 20, 2025
@PingLiuPing PingLiuPing requested review from a team, ZacBlanco and hantangwangd as code owners June 20, 2025 04:10
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jun 20, 2025
@prestodb-ci prestodb-ci requested review from a team, namya28 and sh-shamsan and removed request for a team June 20, 2025 04:10
@PingLiuPing PingLiuPing changed the title (feat): Support insert data into iceberg table (Prestissimo) (feat): [native] Support insert data into iceberg table Jun 20, 2025
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_basic_insertion branch 3 times, most recently from a6aacfe to b810cef Compare June 24, 2025 07:16
@steveburnett
Copy link
Contributor

Do we need any documentation update for this PR? If the doc exists already, consider linking to it in the release note entry - see Formatting in the Release Notes Guidelines for link examples.

@PingLiuPing PingLiuPing force-pushed the lp_iceberg_basic_insertion branch 2 times, most recently from e74926c to a8312fd Compare June 27, 2025 15:27
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_basic_insertion branch from a8312fd to f731612 Compare July 7, 2025 21:33
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_basic_insertion branch 2 times, most recently from 1814cc8 to 23df781 Compare August 4, 2025 21:45
@PingLiuPing PingLiuPing requested a review from yingsu00 August 4, 2025 21:46
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_basic_insertion branch from 23df781 to 0adb84f Compare August 4, 2025 21:50
Copy link
Contributor

@yingsu00 yingsu00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PingLiuPing I don't see why we need two commits here. Could you please re-organize the commits like this:

  1. Format-only changes
  2. Refactor toVeloxInsertTableHandle on Hive and extract helper function
  3. Squashed commit that contains the two current commits

Thanks!

velox::connector::hive::iceberg::IcebergInsertTableHandle>(
inputColumns,
std::make_shared<connector::hive::LocationHandle>(
icebergOutputTableHandle->outputPath,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please confirm this doesn't need /data suffix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
I think it is reasonable to add /data suffix.
But I'm not sure when this function will be executed.
I tried create table, alter table, drop table and both of them do not execute this function.

@PingLiuPing PingLiuPing force-pushed the lp_iceberg_basic_insertion branch 2 times, most recently from b732b15 to 7b07b3e Compare August 6, 2025 15:06
Copy link
Contributor

@yingsu00 yingsu00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi Tony, have you checked the build failure? It seems related.

@PingLiuPing
Copy link
Contributor Author

hi Tony, have you checked the build failure? It seems related.

Thanks @yingsu00 , let me rebase and run test again.

@PingLiuPing PingLiuPing force-pushed the lp_iceberg_basic_insertion branch from 7b07b3e to b8290c2 Compare August 15, 2025 13:28
@PingLiuPing
Copy link
Contributor Author

@yingsu00 Thanks for your comments, I have fixed them, can you have another look? Or do you prefer I move this PR it internal repo?

@PingLiuPing PingLiuPing force-pushed the lp_iceberg_basic_insertion branch from 5d3a0d7 to 81f9dce Compare October 1, 2025 09:45
Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @PingLiuPing for this code. Have few comments.


#ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION

std::unique_ptr<velox::connector::ConnectorInsertTableHandle>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move IcebergPrestoToVeloxConnector to its own file ?
We could do that to the remaining PrestoToVeloxConnectors as well as a follow-up cleanup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit Thanks for the comment.
Yes, we have same observations. And I opened a PR #25631 days ago.
I will follow-up this once the iceberg insertion related code merged in velox. And to split the iceberg code with Hive code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PingLiuPing : You can move IcebergPrestoToVeloxConnector in its current state to its own file as a first step already (before adding the write parts). That PR should get approved in OSS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Sure, I can open a separate PR and refactor this first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened PR #26237

std::unique_ptr<protocol::ConnectorProtocol> createConnectorProtocol()
const final;

#ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this class to a separate include file ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment.
In #25631, I have separated this class with other classes.
Can I do it in this follow up PR? If you insist I can move it out in this PR.
After separate them, this macro is not needed as we can conditionally compile the file in CMakeFile.

IcebergTableLayoutHandle,
IcebergColumnHandle,

#ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think its worth adding these ifdefs in the protocol layer here. Should be fine to just include IcebergInsertTableHandle and IcebergOutputTableHandle in the protocol files nonetheless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment.
Let me test it and see if the error message are expected.

Copy link
Contributor Author

@PingLiuPing PingLiuPing Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the velox does not support iceberg insertion, and removed this macro I get following error messages:
using IcebergConnectorProtocol = ConnectorProtocolTemplate<
...
IcebergInsertTableHandle,
IcebergOutputTableHandle,
...

presto:iceberg> insert into year_t1 values (TIMESTAMP '2025-02-28 14:00:02', 11, DATE '2025-02-28', 125);

Query 20251002_094514_00000_7gkvm, FAILED, 2 nodes
Splits: 1 total, 0 done (0.00%)
[Latency: client-side: 441ms, server-side: 427ms] [0 rows, 0B] [0 rows/s, 0B/s]

Query 20251002_094514_00000_7gkvm failed:  Unsupported table writer handle: {"@type":"InsertHandle","handle":{"connectorHandle":{"@type":"hive-iceberg","compressionCodec":"GZIP","fileFormat":"PARQUET","inputColumns":[{"@type":"hive-iceberg","columnIdentity":{"children":[],"id":1,"name":"c_timestamp","typeCategory":"PRIMITIVE"},"columnType":"REGULAR","requiredSubfields":[],"type":"timestamp"},{"@type":"hive-iceberg","columnIdentity":{"children":[],"id":2,"name":"c_int","typeCategory":"PRIMITIVE"},"columnType":"PARTITION_KEY","requiredSubfields":[],"type":"integer"},{"@type":"hive-iceberg","columnIdentity":{"children":[],"id":3,"name":"c_date","typeCategory":"PRIMITIVE"},"columnType":"REGULAR","requiredSubfields":[],"type":"date"},{"@type":"hive-iceberg","columnIdentity":{"children":[],"id":4,"name":"c_bigint","typeCategory":"PRIMITIVE"},"columnType":"REGULAR","requiredSubfields":[],"type":"bigint"}],"outputPath":"file:/Users/pingliu/work/opensource/presto/presto-native-execution/data/iceberg_data/HIVE/iceberg/year_t1","partitionSpec":{"fields":[{"fieldId":1000,"name":"c_int","sourceId":2,"transform":"IDENTITY"}],"schema":{"aliases":{},"columnNameToIdMapping":{"c_bigint":4,"c_date":3,"c_int":2,"c_timestamp":1},"columns":[{"id":1,"name":"c_timestamp","optional":true,"prestoType":"timestamp"},{"id":2,"name":"c_int","optional":true,"prestoType":"integer"},{"id":3,"name":"c_date","optional":true,"prestoType":"date"},{"id":4,"name":"c_bigint","optional":true,"prestoType":"bigint"}],"identifierFieldIds":[],"schemaId":0},"specId":0},"schema":{"aliases":{},"columnNameToIdMapping":{"c_bigint":4,"c_date":3,"c_int":2,"c_timestamp":1},"columns":[{"id":1,"name":"c_timestamp","optional":true,"prestoType":"timestamp"},{"id":2,"name":"c_int","optional":true,"prestoType":"integer"},{"id":3,"name":"c_date","optional":true,"prestoType":"date"},{"id":4,"name":"c_bigint","optional":true,"prestoType":"bigint"}],"identifierFieldIds":[],"schemaId":0},"schemaName":"iceberg","sortOrder":[],"storageProperties":{"commit.retry.num-retries":"4","read.split.target-size":"134217728","write.delete.mode":"merge-on-read","write.format.default":"PARQUET","write.metadata.delete-after-commit.enabled":"false","write.metadata.metrics.max-inferred-column-defaults":"100","write.metadata.previous-versions-max":"100","write.parquet.compression-codec":"GZIP","write.update.mode":"merge-on-read"},"tableName":{"snapshotId":5359319739731907098,"tableName":"year_t1","tableType":"DATA"}},"connectorId":"iceberg","transactionHandle":{"@type":"hive","uuid":"451ae372-1bbe-4410-9f93-40a24b60c27c"}},"schemaTableName":{"schema":"iceberg","table":"year_t1"}}

When using NotImplemented

using IcebergConnectorProtocol = ConnectorProtocolTemplate<
    ...
    NotImplemented,
    NotImplemented,
    ...
>;

I got following error message:

presto:iceberg> insert into year_t1 values (TIMESTAMP '2025-02-28 14:00:02', 11, DATE '2025-02-28', 125);
Query 20251002_095454_00000_mpr4s failed:  Not implemented: N8facebook6presto8protocol26ConnectorInsertTableHandleE

Seems we should use the last one as it is more precise and shorter. What's your opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where you are coming from, but I feel that's not how protocol conversion should be used. Protocol conversion from Java -> C++ artifacts should happen just as choosing the language backend. If we have some unsupported logic in the Presto -> Velox layer, then its reasonable to error in the PrestoToVelox code. The error message "Unsupported table writer handle" with the handle value is a reasonable behavior imo.

Just adding "IcebergInsertTableHandle,
IcebergOutputTableHandle" and updating the protocol layer can be submit independently imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @aditi-pandit. We can document this limitation so it's easier for folks to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan @aditi-pandit Thanks for the comments. I will fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add this change to #26237

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment.
I didn't regenerate the presto_protocol because I didn't change .yml file in this PR.

FYI, I run make presto_protocol based on this PR though, and I do see some file been changed. But after careful check I think they are just format changes. And most likely has nothing to do with code change in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no changes in files in presto_protocol/connector/iceberg folder ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment.
When I run make presto_protocol there are files been changed in presto_protocol/connector/iceberg folder. But I deem all of those are just code format change. So, I choose not to commit them.

@tdcmeehan tdcmeehan changed the title (feat): [native] Support insert data into iceberg table feat(native): Support insert data into iceberg table Oct 2, 2025

#ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION

std::unique_ptr<velox::connector::ConnectorInsertTableHandle>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has a using namespace directive for the velox namespace. So "velox" namespace can be removed from variable names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Let me fix this.


#ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION

std::unique_ptr<velox::connector::ConnectorInsertTableHandle>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PingLiuPing : You can move IcebergPrestoToVeloxConnector in its current state to its own file as a first step already (before adding the write parts). That PR should get approved in OSS.

IcebergTableLayoutHandle,
IcebergColumnHandle,

#ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where you are coming from, but I feel that's not how protocol conversion should be used. Protocol conversion from Java -> C++ artifacts should happen just as choosing the language backend. If we have some unsupported logic in the Presto -> Velox layer, then its reasonable to error in the PrestoToVelox code. The error message "Unsupported table writer handle" with the handle value is a reasonable behavior imo.

Just adding "IcebergInsertTableHandle,
IcebergOutputTableHandle" and updating the protocol layer can be submit independently imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no changes in files in presto_protocol/connector/iceberg folder ?

}
case DECIMAL:
return partitionValue.decimalValue().setScale(((DecimalType) type).scale());
if (partitionValue.isLong()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change need to be in the PR for Iceberg insertion ? We can make it an independent change I feel. Can you open a new PR for it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is part of the insertion. This piece of code is used to handle the partition value which will be wrote to iceberg manifest file.

@steveburnett
Copy link
Contributor

Should this make EXTRA_CMAKE_FLAGS="-DPRESTO_ENABLE_ICEBERG_NATIVE_INSERTION=ON be documented somewhere?

Perhaps in https://github.com/prestodb/presto/blob/master/presto-native-execution/README.md?

@PingLiuPing
Copy link
Contributor Author

Should this make EXTRA_CMAKE_FLAGS="-DPRESTO_ENABLE_ICEBERG_NATIVE_INSERTION=ON be documented somewhere?

Perhaps in https://github.com/prestodb/presto/blob/master/presto-native-execution/README.md?

Thanks, actually this PR is not expected to be merged to OSS. The underlying code in velox has not been merged. The config option is just a workaround to make the code can compile when the code in velox is not merged.

@steveburnett
Copy link
Contributor

Should this make EXTRA_CMAKE_FLAGS="-DPRESTO_ENABLE_ICEBERG_NATIVE_INSERTION=ON be documented somewhere?
Perhaps in https://github.com/prestodb/presto/blob/master/presto-native-execution/README.md?

Thanks, actually this PR is not expected to be merged to OSS. The underlying code in velox has not been merged. The config option is just a workaround to make the code can compile when the code in velox is not merged.

I appreciate you taking the time to explain! Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants