feat(native): Support insert data into iceberg table#25389
feat(native): Support insert data into iceberg table#25389PingLiuPing wants to merge 2 commits intoprestodb:masterfrom
Conversation
a6aacfe to
b810cef
Compare
|
Do we need any documentation update for this PR? If the doc exists already, consider linking to it in the release note entry - see Formatting in the Release Notes Guidelines for link examples. |
e74926c to
a8312fd
Compare
a8312fd to
f731612
Compare
1814cc8 to
23df781
Compare
23df781 to
0adb84f
Compare
yingsu00
left a comment
There was a problem hiding this comment.
@PingLiuPing I don't see why we need two commits here. Could you please re-organize the commits like this:
- Format-only changes
- Refactor toVeloxInsertTableHandle on Hive and extract helper function
- Squashed commit that contains the two current commits
Thanks!
presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp
Outdated
Show resolved
Hide resolved
| velox::connector::hive::iceberg::IcebergInsertTableHandle>( | ||
| inputColumns, | ||
| std::make_shared<connector::hive::LocationHandle>( | ||
| icebergOutputTableHandle->outputPath, |
There was a problem hiding this comment.
Could you please confirm this doesn't need /data suffix?
There was a problem hiding this comment.
Thanks.
I think it is reasonable to add /data suffix.
But I'm not sure when this function will be executed.
I tried create table, alter table, drop table and both of them do not execute this function.
presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.h
Outdated
Show resolved
Hide resolved
b732b15 to
7b07b3e
Compare
yingsu00
left a comment
There was a problem hiding this comment.
hi Tony, have you checked the build failure? It seems related.
Thanks @yingsu00 , let me rebase and run test again. |
7b07b3e to
b8290c2
Compare
|
@yingsu00 Thanks for your comments, I have fixed them, can you have another look? Or do you prefer I move this PR it internal repo? |
5d3a0d7 to
81f9dce
Compare
aditi-pandit
left a comment
There was a problem hiding this comment.
Thanks @PingLiuPing for this code. Have few comments.
|
|
||
| #ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION | ||
|
|
||
| std::unique_ptr<velox::connector::ConnectorInsertTableHandle> |
There was a problem hiding this comment.
Is it possible to move IcebergPrestoToVeloxConnector to its own file ?
We could do that to the remaining PrestoToVeloxConnectors as well as a follow-up cleanup.
There was a problem hiding this comment.
@aditi-pandit Thanks for the comment.
Yes, we have same observations. And I opened a PR #25631 days ago.
I will follow-up this once the iceberg insertion related code merged in velox. And to split the iceberg code with Hive code.
There was a problem hiding this comment.
@PingLiuPing : You can move IcebergPrestoToVeloxConnector in its current state to its own file as a first step already (before adding the write parts). That PR should get approved in OSS.
There was a problem hiding this comment.
Thanks for the comment. Sure, I can open a separate PR and refactor this first.
presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp
Show resolved
Hide resolved
| std::unique_ptr<protocol::ConnectorProtocol> createConnectorProtocol() | ||
| const final; | ||
|
|
||
| #ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION |
There was a problem hiding this comment.
Can we move this class to a separate include file ?
There was a problem hiding this comment.
Thanks for the comment.
In #25631, I have separated this class with other classes.
Can I do it in this follow up PR? If you insist I can move it out in this PR.
After separate them, this macro is not needed as we can conditionally compile the file in CMakeFile.
| IcebergTableLayoutHandle, | ||
| IcebergColumnHandle, | ||
|
|
||
| #ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION |
There was a problem hiding this comment.
Don't think its worth adding these ifdefs in the protocol layer here. Should be fine to just include IcebergInsertTableHandle and IcebergOutputTableHandle in the protocol files nonetheless.
There was a problem hiding this comment.
Thanks for the comment.
Let me test it and see if the error message are expected.
There was a problem hiding this comment.
When the velox does not support iceberg insertion, and removed this macro I get following error messages:
using IcebergConnectorProtocol = ConnectorProtocolTemplate<
...
IcebergInsertTableHandle,
IcebergOutputTableHandle,
...
presto:iceberg> insert into year_t1 values (TIMESTAMP '2025-02-28 14:00:02', 11, DATE '2025-02-28', 125);
Query 20251002_094514_00000_7gkvm, FAILED, 2 nodes
Splits: 1 total, 0 done (0.00%)
[Latency: client-side: 441ms, server-side: 427ms] [0 rows, 0B] [0 rows/s, 0B/s]
Query 20251002_094514_00000_7gkvm failed: Unsupported table writer handle: {"@type":"InsertHandle","handle":{"connectorHandle":{"@type":"hive-iceberg","compressionCodec":"GZIP","fileFormat":"PARQUET","inputColumns":[{"@type":"hive-iceberg","columnIdentity":{"children":[],"id":1,"name":"c_timestamp","typeCategory":"PRIMITIVE"},"columnType":"REGULAR","requiredSubfields":[],"type":"timestamp"},{"@type":"hive-iceberg","columnIdentity":{"children":[],"id":2,"name":"c_int","typeCategory":"PRIMITIVE"},"columnType":"PARTITION_KEY","requiredSubfields":[],"type":"integer"},{"@type":"hive-iceberg","columnIdentity":{"children":[],"id":3,"name":"c_date","typeCategory":"PRIMITIVE"},"columnType":"REGULAR","requiredSubfields":[],"type":"date"},{"@type":"hive-iceberg","columnIdentity":{"children":[],"id":4,"name":"c_bigint","typeCategory":"PRIMITIVE"},"columnType":"REGULAR","requiredSubfields":[],"type":"bigint"}],"outputPath":"file:/Users/pingliu/work/opensource/presto/presto-native-execution/data/iceberg_data/HIVE/iceberg/year_t1","partitionSpec":{"fields":[{"fieldId":1000,"name":"c_int","sourceId":2,"transform":"IDENTITY"}],"schema":{"aliases":{},"columnNameToIdMapping":{"c_bigint":4,"c_date":3,"c_int":2,"c_timestamp":1},"columns":[{"id":1,"name":"c_timestamp","optional":true,"prestoType":"timestamp"},{"id":2,"name":"c_int","optional":true,"prestoType":"integer"},{"id":3,"name":"c_date","optional":true,"prestoType":"date"},{"id":4,"name":"c_bigint","optional":true,"prestoType":"bigint"}],"identifierFieldIds":[],"schemaId":0},"specId":0},"schema":{"aliases":{},"columnNameToIdMapping":{"c_bigint":4,"c_date":3,"c_int":2,"c_timestamp":1},"columns":[{"id":1,"name":"c_timestamp","optional":true,"prestoType":"timestamp"},{"id":2,"name":"c_int","optional":true,"prestoType":"integer"},{"id":3,"name":"c_date","optional":true,"prestoType":"date"},{"id":4,"name":"c_bigint","optional":true,"prestoType":"bigint"}],"identifierFieldIds":[],"schemaId":0},"schemaName":"iceberg","sortOrder":[],"storageProperties":{"commit.retry.num-retries":"4","read.split.target-size":"134217728","write.delete.mode":"merge-on-read","write.format.default":"PARQUET","write.metadata.delete-after-commit.enabled":"false","write.metadata.metrics.max-inferred-column-defaults":"100","write.metadata.previous-versions-max":"100","write.parquet.compression-codec":"GZIP","write.update.mode":"merge-on-read"},"tableName":{"snapshotId":5359319739731907098,"tableName":"year_t1","tableType":"DATA"}},"connectorId":"iceberg","transactionHandle":{"@type":"hive","uuid":"451ae372-1bbe-4410-9f93-40a24b60c27c"}},"schemaTableName":{"schema":"iceberg","table":"year_t1"}}
When using NotImplemented
using IcebergConnectorProtocol = ConnectorProtocolTemplate<
...
NotImplemented,
NotImplemented,
...
>;
I got following error message:
presto:iceberg> insert into year_t1 values (TIMESTAMP '2025-02-28 14:00:02', 11, DATE '2025-02-28', 125);
Query 20251002_095454_00000_mpr4s failed: Not implemented: N8facebook6presto8protocol26ConnectorInsertTableHandleE
Seems we should use the last one as it is more precise and shorter. What's your opinion?
There was a problem hiding this comment.
I see where you are coming from, but I feel that's not how protocol conversion should be used. Protocol conversion from Java -> C++ artifacts should happen just as choosing the language backend. If we have some unsupported logic in the Presto -> Velox layer, then its reasonable to error in the PrestoToVelox code. The error message "Unsupported table writer handle" with the handle value is a reasonable behavior imo.
Just adding "IcebergInsertTableHandle,
IcebergOutputTableHandle" and updating the protocol layer can be submit independently imo.
There was a problem hiding this comment.
Agreed with @aditi-pandit. We can document this limitation so it's easier for folks to understand.
There was a problem hiding this comment.
@tdcmeehan @aditi-pandit Thanks for the comments. I will fix this.
There was a problem hiding this comment.
Did you regenerate the presto_protocol after this change ?https://github.com/prestodb/presto/tree/master/presto-native-execution/presto_cpp/presto_protocol#presto-native-worker-protocol-code-generation
There was a problem hiding this comment.
Thanks for the comment.
I didn't regenerate the presto_protocol because I didn't change .yml file in this PR.
FYI, I run make presto_protocol based on this PR though, and I do see some file been changed. But after careful check I think they are just format changes. And most likely has nothing to do with code change in this PR.
There was a problem hiding this comment.
There are no changes in files in presto_protocol/connector/iceberg folder ?
There was a problem hiding this comment.
Thanks for the comment.
When I run make presto_protocol there are files been changed in presto_protocol/connector/iceberg folder. But I deem all of those are just code format change. So, I choose not to commit them.
|
|
||
| #ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION | ||
|
|
||
| std::unique_ptr<velox::connector::ConnectorInsertTableHandle> |
There was a problem hiding this comment.
This file has a using namespace directive for the velox namespace. So "velox" namespace can be removed from variable names.
There was a problem hiding this comment.
Thanks. Let me fix this.
|
|
||
| #ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION | ||
|
|
||
| std::unique_ptr<velox::connector::ConnectorInsertTableHandle> |
There was a problem hiding this comment.
@PingLiuPing : You can move IcebergPrestoToVeloxConnector in its current state to its own file as a first step already (before adding the write parts). That PR should get approved in OSS.
| IcebergTableLayoutHandle, | ||
| IcebergColumnHandle, | ||
|
|
||
| #ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION |
There was a problem hiding this comment.
I see where you are coming from, but I feel that's not how protocol conversion should be used. Protocol conversion from Java -> C++ artifacts should happen just as choosing the language backend. If we have some unsupported logic in the Presto -> Velox layer, then its reasonable to error in the PrestoToVelox code. The error message "Unsupported table writer handle" with the handle value is a reasonable behavior imo.
Just adding "IcebergInsertTableHandle,
IcebergOutputTableHandle" and updating the protocol layer can be submit independently imo.
There was a problem hiding this comment.
There are no changes in files in presto_protocol/connector/iceberg folder ?
| } | ||
| case DECIMAL: | ||
| return partitionValue.decimalValue().setScale(((DecimalType) type).scale()); | ||
| if (partitionValue.isLong()) { |
There was a problem hiding this comment.
Does this change need to be in the PR for Iceberg insertion ? We can make it an independent change I feel. Can you open a new PR for it ?
There was a problem hiding this comment.
Yes, this is part of the insertion. This piece of code is used to handle the partition value which will be wrote to iceberg manifest file.
|
Should this Perhaps in https://github.com/prestodb/presto/blob/master/presto-native-execution/README.md? |
Thanks, actually this PR is not expected to be merged to OSS. The underlying code in velox has not been merged. The config option is just a workaround to make the code can compile when the code in velox is not merged. |
I appreciate you taking the time to explain! Thank you. |
Description
Support insertion for iceberg tables. This PR only contains the basic insertion and does NOT support partition transforms, sort order, statistics yet. Those feature will be addressed in following PR once this PR is merged.
The insertion logic is now been protected by MACRO
PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION. So, it is not depends on velox code change anymore.The core implementation is inside velox, see PR facebookincubator/velox#10996
To enable this feature, run:
`make EXTRA_CMAKE_FLAGS="-DPRESTO_ENABLE_ICEBERG_NATIVE_INSERTION=ON"
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.