Support sorted writes in the Iceberg connector#14891
Conversation
Note to self: rework the "Allow updating the sorted_by Iceberg table property" commit to give @osscm author credit.
Add compatibility tests with Spark.
I'd appreciate having a demo test that emphasizes that the number of files being read is smaller when working with sorted files.
The assertions related to the columns were already made in io.trino.plugin.iceberg.catalog.BaseTrinoCatalogTest#testCreateTable. Are they relevant here as well?
Pre-existing: It would be beneficial to have the purpose of this property documented (in the code).
What do we showcase in this test?
I think the test would also pass without the sorted_by property specified on the table.
Force-pushed from ca6d0fb to 105b42b
Still working on test cases and Marius' comments, but added support for sorting during updates and during
Naive question: why not read all the rows from the given Parquet file and actually verify that each row follows the sort-order contract with respect to the previous row?
Hmmm, we could; it just sounds more complex to set up the test.
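For reference, the check suggested above boils down to a pairwise comparison over the rows read back from the file. A minimal sketch (illustrative Python, not the PR's Java test code; `rows` stands in for records read back from the written Parquet file):

```python
def follows_sort_order(rows, key):
    """True if every row compares >= the previous row under the sort key."""
    return all(key(a) <= key(b) for a, b in zip(rows, rows[1:]))

# Hypothetical rows as read back from a written data file.
rows_ok = [{"id": 1}, {"id": 2}, {"id": 5}]
rows_bad = [{"id": 3}, {"id": 1}]

print(follows_sort_order(rows_ok, lambda r: r["id"]))   # True
print(follows_sort_order(rows_bad, lambda r: r["id"]))  # False
```

The test-setup cost mentioned above is in producing `rows` (reading the Parquet file back), not in the comparison itself.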
Force-pushed from 37d76ab to e1191f4
Force-pushed from 2362877 to b0d4cf0
@findinpath @findepi @ebyhr I think this is ready for review. Please take a look when you get a chance.
Should we extract fromIdentifierToColumn to a shared utility class?
Probably, but I'd rather leave it until we add transform support and clean it up then. I think there are a few other things that should be shared between partition and sort transform parsing.
Is this a (temporary) fallback in case the sorted writing does not work as expected?
It can also be useful if your writes are very small (streaming ingest, for example) such that sorting them would be a waste of time until they are compacted.
Whether writes are small or not sounds query-dependent, so it warrants a session toggle more than a catalog config.
Also, can a writer detect that the written data is small and not worth sorting?
OTOH, sorting a small amount of data doesn't sound like a big deal (as long as it happens fully in memory and doesn't add latency), so why would we care?
The config should still remain as a kill switch.
Force-pushed from b0d4cf0 to 4cdd588
Force-pushed from c39baaa to c3f4fc1
Force-pushed from 33590e7 to 0f09920
/test-with-secrets sha=0f09920b81b690612b026fcfd6e7c4cb252951ee
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4289597275
Run with secrets failed: #13199 (reopened)
Co-authored-by: Alex Jo <jo.alex2144@gmail.com>
Force-pushed from 0f09920 to f9d5336
rebased to resolve a conflict
Force-pushed from f9d5336 to b7adc4c
thanks!
Cherry-pick of trinodb/trino#14891 Co-authored-by: Alexander Jo <jo.alex2144@gmail.com>
Description
Support sorting files during inserts to the Iceberg connector. This reuses the SortingFileWriter from the Hive connector.
Non-technical explanation
Sorting enables better performance for selective read queries, where a small range of values is needed from a high-cardinality column.
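The read-side benefit comes from file-level min/max statistics: when data is written sorted, the value ranges of individual files barely overlap, so a selective filter can skip most files entirely. A toy sketch of that pruning effect (illustrative only, not the connector's actual code; each file is modeled as its (min, max) range for the sort column):

```python
def files_to_read(file_ranges, lo, hi):
    """Keep only files whose (min, max) value range overlaps the filter [lo, hi]."""
    return [r for r in file_ranges if r[0] <= hi and r[1] >= lo]

# Unsorted writes: every file spans nearly the whole value range.
unsorted = [(1, 98), (3, 100), (2, 97), (1, 99)]
# Sorted writes: each file covers a narrow, mostly disjoint range.
sorted_files = [(1, 25), (26, 50), (51, 75), (76, 100)]

# A selective filter, e.g. WHERE x BETWEEN 30 AND 40:
print(len(files_to_read(unsorted, 30, 40)))      # 4 -> all files must be read
print(len(files_to_read(sorted_files, 30, 40)))  # 1 -> only one file is read
```

This is the effect the demo test requested earlier in the thread would showcase: fewer files read for the same selective query.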
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: