feat(native): Support insert data into iceberg table #25389
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1493,6 +1493,114 @@ IcebergPrestoToVeloxConnector::createConnectorProtocol() const { | |
| return std::make_unique<protocol::iceberg::IcebergConnectorProtocol>(); | ||
| } | ||
|
|
||
| #ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION | ||
|
|
||
| std::unique_ptr<velox::connector::ConnectorInsertTableHandle> | ||
|
Contributor
Is it possible to move IcebergPrestoToVeloxConnector to its own file?
Contributor
Author
@aditi-pandit Thanks for the comment.
Contributor
@PingLiuPing: You can move IcebergPrestoToVeloxConnector in its current state to its own file as a first step (before adding the write parts). That PR should get approved in OSS.
Contributor
Author
Thanks for the comment. Sure, I can open a separate PR and refactor this first.
Contributor
Author
Opened PR #26237.
Contributor
This file has a using-namespace directive for the velox namespace, so the "velox" qualifier can be removed from the names used here.
Contributor
Author
Thanks. Let me fix this. |
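The remark about the using-directive can be illustrated with a small sketch. The namespace nesting and the `LocationHandle` struct below are hypothetical stand-ins (the real file's directive and types are not shown in this diff); the point is only that, once a directive for the outer namespace is in scope, the qualified and unqualified spellings name the same type.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the engine namespace; only the nesting matters.
namespace facebook::velox::connector {
struct LocationHandle {
  std::string path;
};
} // namespace facebook::velox::connector

namespace facebook::presto {
// Assumed form of the directive the reviewer refers to.
using namespace facebook::velox;

// Spelling with the explicit "velox::" qualifier.
inline std::shared_ptr<velox::connector::LocationHandle> makeQualified(
    const std::string& path) {
  return std::make_shared<velox::connector::LocationHandle>(
      velox::connector::LocationHandle{path});
}

// Shorter spelling made possible by the using-directive; same type.
inline std::shared_ptr<connector::LocationHandle> makeShort(
    const std::string& path) {
  return std::make_shared<connector::LocationHandle>(
      connector::LocationHandle{path});
}
} // namespace facebook::presto
```

Both helpers return the same pointer type, so mixing the two spellings in one file is a consistency issue rather than a correctness one.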
||
| IcebergPrestoToVeloxConnector::toVeloxInsertTableHandle( | ||
|
PingLiuPing marked this conversation as resolved.
|
||
| const protocol::CreateHandle* createHandle, | ||
| const TypeParser& typeParser) const { | ||
| auto icebergOutputTableHandle = | ||
| std::dynamic_pointer_cast<protocol::iceberg::IcebergOutputTableHandle>( | ||
| createHandle->handle.connectorHandle); | ||
|
|
||
| VELOX_CHECK_NOT_NULL( | ||
| icebergOutputTableHandle, | ||
| "Unexpected output table handle type {}", | ||
| createHandle->handle.connectorHandle->_type); | ||
|
|
||
| bool isPartitioned{false}; | ||
| const auto inputColumns = toHiveColumns( | ||
| icebergOutputTableHandle->inputColumns, typeParser, isPartitioned); | ||
|
|
||
| return std::make_unique< | ||
| velox::connector::hive::iceberg::IcebergInsertTableHandle>( | ||
| inputColumns, | ||
| std::make_shared<connector::hive::LocationHandle>( | ||
|
yingsu00 marked this conversation as resolved.
|
||
| fmt::format("{}/data", icebergOutputTableHandle->outputPath), | ||
| fmt::format("{}/data", icebergOutputTableHandle->outputPath), | ||
| connector::hive::LocationHandle::TableType::kNew), | ||
| toVeloxIcebergPartitionSpec( | ||
| icebergOutputTableHandle->partitionSpec, typeParser), | ||
| toVeloxFileFormat(icebergOutputTableHandle->fileFormat), | ||
| nullptr, | ||
| std::optional( | ||
| toFileCompressionKind(icebergOutputTableHandle->compressionCodec))); | ||
| } | ||
|
|
||
| std::unique_ptr<velox::connector::ConnectorInsertTableHandle> | ||
| IcebergPrestoToVeloxConnector::toVeloxInsertTableHandle( | ||
| const protocol::InsertHandle* insertHandle, | ||
| const TypeParser& typeParser) const { | ||
| auto icebergInsertTableHandle = | ||
| std::dynamic_pointer_cast<protocol::iceberg::IcebergInsertTableHandle>( | ||
| insertHandle->handle.connectorHandle); | ||
|
|
||
| VELOX_CHECK_NOT_NULL( | ||
| icebergInsertTableHandle, | ||
| "Unexpected insert table handle type {}", | ||
| insertHandle->handle.connectorHandle->_type); | ||
|
|
||
| bool isPartitioned{false}; | ||
|
aditi-pandit marked this conversation as resolved.
|
||
| const auto inputColumns = toHiveColumns( | ||
| icebergInsertTableHandle->inputColumns, typeParser, isPartitioned); | ||
|
|
||
| return std::make_unique<connector::hive::iceberg::IcebergInsertTableHandle>( | ||
| inputColumns, | ||
| std::make_shared<connector::hive::LocationHandle>( | ||
|
yingsu00 marked this conversation as resolved.
|
||
| fmt::format("{}/data", icebergInsertTableHandle->outputPath), | ||
| fmt::format("{}/data", icebergInsertTableHandle->outputPath), | ||
| connector::hive::LocationHandle::TableType::kExisting), | ||
| toVeloxIcebergPartitionSpec( | ||
| icebergInsertTableHandle->partitionSpec, typeParser), | ||
| toVeloxFileFormat(icebergInsertTableHandle->fileFormat), | ||
| nullptr, | ||
| std::optional( | ||
| toFileCompressionKind(icebergInsertTableHandle->compressionCodec))); | ||
| } | ||
|
|
||
| std::vector<std::shared_ptr<const connector::hive::HiveColumnHandle>> | ||
| IcebergPrestoToVeloxConnector::toHiveColumns( | ||
| const protocol::List<protocol::iceberg::IcebergColumnHandle>& inputColumns, | ||
| const TypeParser& typeParser, | ||
| bool& hasPartitionColumn) const { | ||
|
aditi-pandit marked this conversation as resolved.
|
||
| hasPartitionColumn = false; | ||
| std::vector<std::shared_ptr<const connector::hive::HiveColumnHandle>> | ||
| hiveColumns; | ||
| hiveColumns.reserve(inputColumns.size()); | ||
| for (const auto& columnHandle : inputColumns) { | ||
| hasPartitionColumn |= | ||
| columnHandle.columnType == protocol::hive::ColumnType::PARTITION_KEY; | ||
| hiveColumns.emplace_back( | ||
| std::dynamic_pointer_cast<connector::hive::HiveColumnHandle>( | ||
| std::shared_ptr(toVeloxColumnHandle(&columnHandle, typeParser)))); | ||
| } | ||
| return hiveColumns; | ||
| } | ||
|
|
||
| connector::hive::iceberg::IcebergPartitionSpec::Field | ||
| IcebergPrestoToVeloxConnector::toVeloxIcebergPartitionField( | ||
| const protocol::iceberg::IcebergPartitionField& field) const { | ||
| return connector::hive::iceberg::IcebergPartitionSpec::Field( | ||
| field.name, | ||
| static_cast<connector::hive::iceberg::TransformType>(field.transform), | ||
| field.parameter ? *field.parameter : std::optional<int32_t>()); | ||
| } | ||
|
|
||
| std::unique_ptr<velox::connector::hive::iceberg::IcebergPartitionSpec> | ||
| IcebergPrestoToVeloxConnector::toVeloxIcebergPartitionSpec( | ||
| const protocol::iceberg::PrestoIcebergPartitionSpec& spec, | ||
| const facebook::presto::TypeParser& typeParser) const { | ||
| std::vector<connector::hive::iceberg::IcebergPartitionSpec::Field> fields; | ||
| fields.reserve(spec.fields.size()); | ||
| for (const auto& field : spec.fields) { | ||
| fields.emplace_back(toVeloxIcebergPartitionField(field)); | ||
| } | ||
| return std::make_unique<connector::hive::iceberg::IcebergPartitionSpec>( | ||
| spec.specId, fields); | ||
| } | ||
|
|
||
| #endif | ||
|
|
||
| std::unique_ptr<velox::connector::ConnectorSplit> | ||
| TpchPrestoToVeloxConnector::toVeloxSplit( | ||
| const protocol::ConnectorId& catalogId, | ||
|
|
||
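The partition-field conversion in the diff above does two things: it turns a nullable `parameter` into a `std::optional<int32_t>`, and it maps the protocol transform to the Velox transform with a `static_cast`, which stays correct only while both enums list their values in the same order. Below is a minimal sketch of a more defensive variant. All types here (`ProtoTransform`, `EngineTransform`, `ProtoField`, `EngineField`) are hypothetical stand-ins, not the real Presto or Velox definitions, and treating `parameter` as a `std::shared_ptr<int32_t>` is an assumption.

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; not the real protocol or Velox types.
enum class ProtoTransform { kIdentity, kBucket, kTruncate };
enum class EngineTransform { kIdentity, kBucket, kTruncate };

struct ProtoField {
  std::string name;
  ProtoTransform transform;
  std::shared_ptr<int32_t> parameter; // null when the transform has no argument
};

struct EngineField {
  std::string name;
  EngineTransform transform;
  std::optional<int32_t> parameter;
};

// Explicit mapping: keeps working even if one of the enums is reordered.
inline EngineTransform toEngineTransform(ProtoTransform transform) {
  switch (transform) {
    case ProtoTransform::kIdentity:
      return EngineTransform::kIdentity;
    case ProtoTransform::kBucket:
      return EngineTransform::kBucket;
    case ProtoTransform::kTruncate:
      return EngineTransform::kTruncate;
  }
  throw std::invalid_argument("Unknown partition transform");
}

inline EngineField toEngineField(const ProtoField& field) {
  // A null pointer becomes an empty optional; otherwise copy the value.
  std::optional<int32_t> parameter;
  if (field.parameter) {
    parameter = *field.parameter;
  }
  return EngineField{
      field.name, toEngineTransform(field.transform), parameter};
}
```

The `static_cast` in the PR is shorter and is fine as long as the two enums are kept in sync; the switch trades a few lines for a compile-time reminder when a new transform is added on one side only.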
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -13,11 +13,15 @@ | |
| */ | ||
| #pragma once | ||
|
|
||
| #include "presto_cpp/main/types/PrestoToVeloxExpr.h" | ||
| #include "presto_cpp/presto_protocol/connector/hive/presto_protocol_hive.h" | ||
| #include "presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.h" | ||
| #include "presto_cpp/presto_protocol/core/ConnectorProtocol.h" | ||
| #include "presto_cpp/main/types/PrestoToVeloxExpr.h" | ||
| #include "velox/connectors/Connector.h" | ||
| #include "velox/connectors/hive/TableHandle.h" | ||
| #ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION | ||
| #include "velox/connectors/hive/iceberg/IcebergDataSink.h" | ||
| #endif | ||
| #include "velox/core/PlanNode.h" | ||
| #include "velox/vector/ComplexVector.h" | ||
|
|
||
|
|
@@ -59,8 +63,7 @@ class PrestoToVeloxConnector { | |
| const protocol::TableHandle& tableHandle, | ||
| const VeloxExprConverter& exprConverter, | ||
| const TypeParser& typeParser, | ||
| velox::connector::ColumnHandleMap& assignments) | ||
| const = 0; | ||
| velox::connector::ColumnHandleMap& assignments) const = 0; | ||
|
|
||
| [[nodiscard]] virtual std::unique_ptr< | ||
| velox::connector::ConnectorInsertTableHandle> | ||
|
|
@@ -134,8 +137,7 @@ class HivePrestoToVeloxConnector final : public PrestoToVeloxConnector { | |
| const protocol::TableHandle& tableHandle, | ||
| const VeloxExprConverter& exprConverter, | ||
| const TypeParser& typeParser, | ||
| velox::connector::ColumnHandleMap& assignments) | ||
| const final; | ||
| velox::connector::ColumnHandleMap& assignments) const final; | ||
|
|
||
| std::unique_ptr<velox::connector::ConnectorInsertTableHandle> | ||
| toVeloxInsertTableHandle( | ||
|
|
@@ -184,11 +186,41 @@ class IcebergPrestoToVeloxConnector final : public PrestoToVeloxConnector { | |
| const protocol::TableHandle& tableHandle, | ||
| const VeloxExprConverter& exprConverter, | ||
| const TypeParser& typeParser, | ||
| velox::connector::ColumnHandleMap& assignments) | ||
| const final; | ||
| velox::connector::ColumnHandleMap& assignments) const final; | ||
|
|
||
| std::unique_ptr<protocol::ConnectorProtocol> createConnectorProtocol() | ||
| const final; | ||
| std::unique_ptr<protocol::ConnectorProtocol> createConnectorProtocol() | ||
| const final; | ||
|
|
||
| #ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION | ||
|
Contributor
Can we move this class to a separate include file?
Contributor
Author
Thanks for the comment. |
||
|
|
||
| std::unique_ptr<velox::connector::ConnectorInsertTableHandle> | ||
| toVeloxInsertTableHandle( | ||
| const protocol::CreateHandle* createHandle, | ||
| const TypeParser& typeParser) const final; | ||
|
|
||
| std::unique_ptr<velox::connector::ConnectorInsertTableHandle> | ||
| toVeloxInsertTableHandle( | ||
| const protocol::InsertHandle* insertHandle, | ||
| const TypeParser& typeParser) const final; | ||
|
|
||
| private: | ||
| std::vector<std::shared_ptr<const velox::connector::hive::HiveColumnHandle>> | ||
| toHiveColumns( | ||
| const protocol::List<protocol::iceberg::IcebergColumnHandle>& | ||
| inputColumns, | ||
| const TypeParser& typeParser, | ||
| bool& hasPartitionColumn) const; | ||
|
|
||
| velox::connector::hive::iceberg::IcebergPartitionSpec::Field | ||
| toVeloxIcebergPartitionField( | ||
| const protocol::iceberg::IcebergPartitionField& field) const; | ||
|
|
||
| std::unique_ptr<velox::connector::hive::iceberg::IcebergPartitionSpec> | ||
| toVeloxIcebergPartitionSpec( | ||
| const protocol::iceberg::PrestoIcebergPartitionSpec& spec, | ||
| const TypeParser& typeParser) const; | ||
|
|
||
| #endif | ||
| }; | ||
|
|
||
| class TpchPrestoToVeloxConnector final : public PrestoToVeloxConnector { | ||
|
|
@@ -209,8 +241,7 @@ class TpchPrestoToVeloxConnector final : public PrestoToVeloxConnector { | |
| const protocol::TableHandle& tableHandle, | ||
| const VeloxExprConverter& exprConverter, | ||
| const TypeParser& typeParser, | ||
| velox::connector::ColumnHandleMap& assignments) | ||
| const final; | ||
| velox::connector::ColumnHandleMap& assignments) const final; | ||
|
|
||
| std::unique_ptr<protocol::ConnectorProtocol> createConnectorProtocol() | ||
| const final; | ||
|
|
||
|
Contributor
Did you regenerate the presto_protocol after this change? https://github.com/prestodb/presto/tree/master/presto-native-execution/presto_cpp/presto_protocol#presto-native-worker-protocol-code-generation
Contributor
Author
Thanks for the comment. FYI, I run
Contributor
Are there no changes to the files in the presto_protocol/connector/iceberg folder?
Contributor
Author
Thanks for the comment. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,8 +22,15 @@ using IcebergConnectorProtocol = ConnectorProtocolTemplate< | |
| IcebergTableHandle, | ||
| IcebergTableLayoutHandle, | ||
| IcebergColumnHandle, | ||
|
|
||
| #ifdef PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION | ||
|
Contributor
I don't think it's worth adding these ifdefs in the protocol layer here. It should be fine to include IcebergInsertTableHandle and IcebergOutputTableHandle in the protocol files regardless.
Contributor
Author
Thanks for the comment.
Contributor
Author
When Velox does not support Iceberg insertion and this macro is removed, I get the following error messages: When using NotImplemented I get the following error message: It seems we should use the last one, as it is more precise and shorter. What's your opinion?
Contributor
I see where you are coming from, but I feel that's not how protocol conversion should be used. Protocol conversion from Java -> C++ artifacts should happen just as choosing the language backend. If we have some unsupported logic in the Presto -> Velox layer, then it's reasonable to error in the PrestoToVelox code. The error message "Unsupported table writer handle" with the handle value is reasonable behavior imo. Just adding "IcebergInsertTableHandle,
Contributor
Agreed with @aditi-pandit. We can document this limitation so it's easier for folks to understand.
Contributor
Author
@tdcmeehan @aditi-pandit Thanks for the comments. I will fix this.
Contributor
Author
I added this change to #26237. |
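The direction agreed on in this thread (protocol types always compiled in, with the unsupported case reported by the Presto-to-Velox conversion code) can be sketched roughly as follows. Every name here is a hypothetical stand-in: `ProtocolInsertHandle`, `ProtocolIcebergInsertHandle`, `EngineInsertHandle`, `SKETCH_ENABLE_ICEBERG_INSERTION`, and `toEngineInsertHandle` are not the real Presto or Velox identifiers. Only the error text "Unsupported table writer handle" comes from the discussion.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical protocol-layer types: always present, no #ifdef around them.
struct ProtocolInsertHandle {
  virtual ~ProtocolInsertHandle() = default;
  std::string type;
};

struct ProtocolIcebergInsertHandle : ProtocolInsertHandle {
  std::string outputPath;
};

// Hypothetical engine-side handle produced by the conversion layer.
struct EngineInsertHandle {
  std::string dataPath;
};

// Stand-in for a build flag; the conditional lives outside the protocol types.
#ifdef SKETCH_ENABLE_ICEBERG_INSERTION
inline constexpr bool kInsertionSupported = true;
#else
inline constexpr bool kInsertionSupported = false;
#endif

// Conversion layer: deserialization has already succeeded, so an
// unsupported handle is reported here, with the handle type in the message.
inline std::unique_ptr<EngineInsertHandle> toEngineInsertHandle(
    const std::shared_ptr<ProtocolInsertHandle>& handle,
    bool supported = kInsertionSupported) {
  auto iceberg =
      std::dynamic_pointer_cast<ProtocolIcebergInsertHandle>(handle);
  if (!supported || iceberg == nullptr) {
    throw std::runtime_error(
        "Unsupported table writer handle: " +
        (handle ? handle->type : std::string("null")));
  }
  auto result = std::make_unique<EngineInsertHandle>();
  result->dataPath = iceberg->outputPath + "/data";
  return result;
}
```

With this split, a worker built without insertion support still parses the plan fragment and fails at conversion time with a message that names the handle, rather than failing inside protocol deserialization.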
||
| IcebergInsertTableHandle, | ||
| IcebergOutputTableHandle, | ||
| #else | ||
| NotImplemented, | ||
| NotImplemented, | ||
| #endif | ||
|
|
||
| IcebergSplit, | ||
| NotImplemented, | ||
| hive::HiveTransactionHandle, | ||
|
|
||
Does this change need to be in the PR for Iceberg insertion? I feel we can make it an independent change. Can you open a new PR for it?
Yes, this is part of the insertion. This code handles the partition value that will be written to the Iceberg manifest file.