feat(native): Support sorted by during write to Iceberg tables by PingLiuPing · Pull Request #26182 · prestodb/presto

PingLiuPing · 2025-09-29T15:55:37Z

Description

Support sorted by when writing to Iceberg tables.

Reviewers, please review the last commit only for this PR.

Motivation and Context

Impact

Test Plan

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for sorted by when writing to Iceberg tables.

sourcery-ai · 2025-09-29T15:55:46Z

Reviewer's Guide

This PR introduces support for sorted-by ordering when writing to Iceberg tables by enabling a native insertion path behind a compile-time flag, parsing and propagating sort fields through the Velox connector, and updating the Java metadata and sink implementations to be session-aware and robust for partition transforms and JSON decimal handling.

ER diagram for PartitionTransformType enum changes

erDiagram
    PartitionTransformType {
        IDENTITY
        HOUR
        DAY
        MONTH
        YEAR
        BUCKET
        TRUNCATE
    }

Class diagram for updated IcebergPrestoToVeloxConnector and related types

classDiagram
    class IcebergPrestoToVeloxConnector {
        +toVeloxInsertTableHandle(createHandle, typeParser, pool)
        +toVeloxInsertTableHandle(insertHandle, typeParser, pool)
        +toIcebergColumns(inputColumns, typeParser, hasPartitionColumn)
        +toIcebergSortingColumns(sortFields, schema)
        +toVeloxIcebergPartitionField(field, typeParser, schema)
        +toVeloxIcebergPartitionSpec(spec, typeParser)
    }
    class HivePrestoToVeloxConnector {
        +toVeloxInsertTableHandle(createHandle, typeParser, pool)
        +toVeloxInsertTableHandle(insertHandle, typeParser, pool)
    }
    class PrestoToVeloxConnector {
        +toVeloxInsertTableHandle(createHandle, typeParser, pool)
        +toVeloxInsertTableHandle(insertHandle, typeParser, pool)
    }
    IcebergPrestoToVeloxConnector --|> PrestoToVeloxConnector
    HivePrestoToVeloxConnector --|> PrestoToVeloxConnector
    IcebergPrestoToVeloxConnector o-- "IcebergColumnHandle"
    IcebergPrestoToVeloxConnector o-- "IcebergSortingColumn"
    IcebergPrestoToVeloxConnector o-- "IcebergPartitionSpec"
    IcebergPrestoToVeloxConnector o-- "TypeParser"
    IcebergPrestoToVeloxConnector o-- "MemoryPool"
    HivePrestoToVeloxConnector o-- "TypeParser"
    HivePrestoToVeloxConnector o-- "MemoryPool"
    PrestoToVeloxConnector o-- "TypeParser"
    PrestoToVeloxConnector o-- "MemoryPool"

Class diagram for updated IcebergColumnHandle Java class

classDiagram
    class IcebergColumnHandle {
        +create(column, typeManager, columnType)
        +create(partitionFieldId, column, typeManager, columnType)
        +getPushedDownSubfield(column)
        -columnIdentity
        -type
        -doc
        -columnType
    }

File-Level Changes

Change: Enable Iceberg native insertion via PRESTO_ENABLE_ICEBERG_NATIVE_INSERTION
Details:
* Define the compile flag in CMakeLists.txt and Makefile
* Guard connector methods and protocol types behind the flag
* Add overloads of toVeloxInsertTableHandle with MemoryPool parameter
* Implement toIcebergColumns, toIcebergSortingColumns, toVeloxIcebergPartitionSpec helpers
Files:
* `presto-native-execution/CMakeLists.txt`
* `presto-native-execution/Makefile`
* `presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp`
* `presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.h`
* `presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.h`
* `presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.cpp`
* `presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.yml`

Change: Propagate sorted-by order through the Velox connector
Details:
* Parse SortField.sortOrder into velox::core::SortOrder
* Include sortedBy vector in IcebergInsertTableHandle construction
* Update query plan converter to pass MemoryPool to connector
* Expose sorting parameters in IcebergPageSink
Files:
* `presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp`
* `presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp`
* `presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.h`
* `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergPageSink.java`

Change: Make Java Iceberg metadata and utilities session-aware
Details:
* Add ConnectorSession parameter to getColumns and requiredColumnsForDeletes
* Switch partition transform type to ALL for INSERT queries
* Update callers in IcebergUtil, PageSourceProvider, metadata classes, ChangelogSplitSource
* Adjust POM dependency scopes to support session changes
Files:
* `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java`
* `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergPageSourceProvider.java`
* `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java`
* `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergHiveMetadata.java`
* `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergNativeMetadata.java`
* `presto-iceberg/src/main/java/com/facebook/presto/iceberg/changelog/ChangelogSplitSource.java`
* `presto-iceberg/pom.xml`

Change: Refine partition JSON decimal and transform type handling
Details:
* Enhance PartitionData.getValue to support long, int, BigInteger for DECIMAL
* Reorder PartitionTransformType enum entries and mapping tables
Files:
* `presto-iceberg/src/main/java/com/facebook/presto/iceberg/PartitionData.java`
* `presto-iceberg/src/main/java/com/facebook/presto/iceberg/PartitionTransformType.java`
* `presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.h`
* `presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.cpp`

sourcery-ai

Hey there - I've reviewed your changes and they look great!

aditi-pandit · 2025-10-01T21:57:07Z

presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp

    const protocol::CreateHandle* createHandle,
-    const TypeParser& typeParser) const {
+    const TypeParser& typeParser,
+    memory::MemoryPool* pool) const {


Why is the memory pool variable added? It doesn't seem to be used. A memory pool should be used for allocations made by operators, not during this conversion.

Thanks for the comment.
The MemoryPool is needed by the Iceberg connector. When doing a partition transform, we need to create new vectors to hold the transformed values.
It is not needed here in the Hive connector, but since both Hive and Iceberg inherit from the base class and override the toVeloxInsertTableHandle method, I added this parameter to the base class.

aditi-pandit · 2025-10-01T21:58:34Z

presto-iceberg/src/main/java/com/facebook/presto/iceberg/PartitionData.java

                if (partitionValue.isLong()) {
                    return BigDecimal.valueOf(partitionValue.asLong(), ((DecimalType) type).scale());
                }
+                else if (partitionValue.isInt()) {


Can you add a test for this logic?

Sure, I think I have a test case that covers this. Let me find it.

Yes, I have a test case that covers this.

It can also be easily tested in presto-cli. The logic here is that Velox represents a decimal as an integer (int64 or int128). When the partition value is passed back to Presto, it is encoded as JSON, and when the value is decoded, a small integer is treated as int32.

presto:iceberg> insert into identity_t2 values(1, -123, cast('89.124' AS decimal(5,3)), 'ABCD QWERT', x'455843454C4C454E54');
INSERT: 1 row
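
The decoding issue described above can be illustrated with a small standalone Java sketch. It is not the code from `PartitionData.java`: the class and method names are hypothetical, and a plain `Object` stands in for the JSON node, on the assumption that the parser hands back an `Integer`, `Long`, or `BigInteger` depending on the magnitude of the unscaled value. The scale would come from the column's decimal type.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical stand-in for the DECIMAL branch of a partition-value decoder.
final class DecimalPartitionValueSketch {
    private DecimalPartitionValueSketch() {}

    // 'unscaled' is whatever numeric type the JSON parser picked for the value;
    // 'scale' comes from the declared decimal type, e.g. 3 for decimal(5,3).
    static BigDecimal toDecimal(Object unscaled, int scale) {
        if (unscaled instanceof Integer || unscaled instanceof Long) {
            // Small values arrive as int, larger ones as long; both fit in a long.
            return BigDecimal.valueOf(((Number) unscaled).longValue(), scale);
        }
        if (unscaled instanceof BigInteger) {
            // Values beyond 64 bits (for example from an int128 decimal).
            return new BigDecimal((BigInteger) unscaled, scale);
        }
        throw new IllegalArgumentException("Unsupported unscaled decimal value: " + unscaled);
    }

    public static void main(String[] args) {
        // 89.124 stored as decimal(5,3) has the unscaled value 89124.
        System.out.println(toDecimal(89124, 3));
        System.out.println(toDecimal(89124L, 3));
        System.out.println(toDecimal(new BigInteger("12345678901234567890123"), 3));
    }
}
```

All three numeric representations yield the same kind of result, so the decoder does not depend on which integer width the JSON layer happened to choose.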

aditi-pandit · 2025-10-01T21:59:22Z

...to-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.yml

Did you regenerate presto_protocol? https://github.com/prestodb/presto/tree/master/presto-native-execution/presto_cpp/presto_protocol#presto-native-worker-protocol-code-generation

Thanks for the comment.
This is a mistake, and the change is a duplicate. It was probably caused by the multiple rebases and merges I did while maintaining these branches locally. I've reverted all the changes in this file.

PingLiuPing · 2025-10-03T07:53:50Z

For the release-notes pipeline error, I opened issue #26222.

yingsu00

Approved, contingent on the mvn failure being resolved.

unidevel · 2025-10-03T10:25:43Z

You can rebase to fix the CI now.

aditi-pandit · 2025-10-03T22:12:57Z

presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp

+  std::vector<connector::hive::iceberg::IcebergSortingColumn> sortedBy;
+  sortedBy.reserve(sortFields.size());
+  for (const auto& sortField : sortFields) {
+    velox::core::SortOrder veloxSortOrder(


Can you please add a comment stating that this matches asc/desc and nulls first/nulls last?

Thanks for the comment. Let me add a comment.
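
The mapping the reviewer asks to have documented can be sketched as follows. This is an illustration in Java with hypothetical names (`SortOrderSketch`, `SortingColumnSketch`), not the PR's code: the actual conversion is C++ and builds a `velox::core::SortOrder`, whose arguments are cut off in the excerpt above. The sketch assumes a four-valued sort order on the Presto side that is split into two independent flags, one for direction and one for null placement.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a four-valued sort order; not the actual Presto enum.
enum SortOrderSketch {
    ASC_NULLS_FIRST(true, true),
    ASC_NULLS_LAST(true, false),
    DESC_NULLS_FIRST(false, true),
    DESC_NULLS_LAST(false, false);

    private final boolean ascending;
    private final boolean nullsFirst;

    SortOrderSketch(boolean ascending, boolean nullsFirst) {
        this.ascending = ascending;
        this.nullsFirst = nullsFirst;
    }

    boolean isAscending() {
        return ascending;
    }

    boolean isNullsFirst() {
        return nullsFirst;
    }
}

// Hypothetical stand-in for a sorting column handed to the writer.
final class SortingColumnSketch {
    final String column;
    final boolean ascending;
    final boolean nullsFirst;

    SortingColumnSketch(String column, SortOrderSketch order) {
        this.column = column;
        // Direction (asc/desc) and null placement (first/last) are carried
        // as two separate flags, mirroring the comment requested in review.
        this.ascending = order.isAscending();
        this.nullsFirst = order.isNullsFirst();
    }

    @Override
    public String toString() {
        return column + (ascending ? " ASC" : " DESC") + (nullsFirst ? " NULLS FIRST" : " NULLS LAST");
    }

    public static void main(String[] args) {
        List<SortingColumnSketch> sortedBy = Arrays.asList(
                new SortingColumnSketch("id", SortOrderSketch.ASC_NULLS_FIRST),
                new SortingColumnSketch("ts", SortOrderSketch.DESC_NULLS_LAST));
        sortedBy.forEach(System.out::println);
    }
}
```

Keeping the two flags explicit makes it easy to check each of the four combinations against the table's declared sort order.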

aditi-pandit · 2025-10-03T22:14:23Z

presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.h


 private:
-  std::vector<std::shared_ptr<const velox::connector::hive::HiveColumnHandle>>
+  std::vector<std::shared_ptr<const velox::connector::hive::iceberg::IcebergColumnHandle>>


Is this a bug? Did you intentionally leave HiveColumnHandle in the previous commit?

Thanks for the comment.
Yes, the commit here matches the code in Velox. IcebergColumnHandle was added to Velox later, so I changed this to match the Velox code.
When I test the code, I pair the PRs as follows:
Presto PR1 -> Velox PR 1
Presto PR2 -> Velox PR 2

PingLiuPing requested review from imjalpreet, tdcmeehan and yingsu00 September 29, 2025 15:55

PingLiuPing self-assigned this Sep 29, 2025

PingLiuPing requested review from a team, ZacBlanco and hantangwangd as code owners September 29, 2025 15:55

prestodb-ci added the from:IBM PR from IBM label Sep 29, 2025

prestodb-ci requested review from a team and infvg and removed request for a team September 29, 2025 15:55

sourcery-ai bot reviewed Sep 29, 2025

aditi-pandit reviewed Oct 1, 2025

PingLiuPing force-pushed the lp_iceberg_insertion_sort_order_no_test branch 2 times, most recently from 47e676e to 7bea642 Compare October 2, 2025 11:08

PingLiuPing changed the title from "[native] Support sorted by during write to Iceberg tables" to "feat(native): Support sorted by during write to Iceberg tables" Oct 3, 2025

yingsu00 previously approved these changes Oct 3, 2025

PingLiuPing dismissed yingsu00’s stale review via 71ca098 October 3, 2025 08:55

PingLiuPing requested review from czentgr and unidevel as code owners October 3, 2025 08:55

PingLiuPing force-pushed the lp_iceberg_insertion_sort_order_no_test branch from 71ca098 to 7bea642 Compare October 3, 2025 09:37

PingLiuPing force-pushed the lp_iceberg_insertion_sort_order_no_test branch from 7bea642 to f8dfb27 Compare October 3, 2025 10:53

aditi-pandit reviewed Oct 3, 2025

PingLiuPing added 2 commits October 10, 2025 09:38

Support insert data into iceberg table.

de49fc4

Auto detect velox iceberg insertion feature.

3eb89af

PingLiuPing added 3 commits October 10, 2025 09:41

Support iceberg insert partition transforms.

701d563

Support collect iceberg data file stats during write

59e7066

Support sort order during write to iceberg table.

95de072

PingLiuPing force-pushed the lp_iceberg_insertion_sort_order_no_test branch from f8dfb27 to ba1006b Compare October 10, 2025 09:38

Separate iceberg to standalone file

2c494e9

PingLiuPing force-pushed the lp_iceberg_insertion_sort_order_no_test branch from ba1006b to 2c494e9 Compare October 10, 2025 10:26

PingLiuPing mentioned this pull request Oct 10, 2025

fix: Support more type conversion for decimal partition value #26240

Merged
