Accelerate Iceberg when reading partition columns only by findepi · Pull Request #19303 · trinodb/trino

findepi · 2023-10-06T21:27:59Z

Description

Avoid data file I/O when:

- reading only partitioning columns
- doing count(*) queries on Iceberg tables (with no GROUP BY, or grouping by partitioning columns)

findepi · 2023-10-06T21:29:20Z

findinpath · 2023-10-06T22:25:41Z

...n/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergMetadataFileOperations.java

Please add a test for count(*) over the whole table.

~~Add a test involving a filter on a non-partition column on a partitioned table.~~

~~Also important: check the file system accesses for min(C)/max(C).~~

Also add a test for a table that has delete files.

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java

Manifests contain trustworthy information about record count, so it can be used to answer the count(*) queries.
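The idea in the sentence above can be illustrated with a minimal sketch. It is not the code from this PR: `FileEntry` and `countByPartition` are hypothetical names standing in for manifest entries and for whatever the connector does internally. The sketch assumes each entry carries a partition value, a record count, and a flag for attached delete files, and it refuses to answer from metadata when deletes are present, since the recorded count would then overstate the live rows.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ManifestCountSketch {
    // Hypothetical stand-in for one manifest entry (one data file).
    static final class FileEntry {
        final String partition;   // partition value of the file
        final long recordCount;   // row count recorded in the manifest
        final boolean hasDeletes; // true if delete files apply to this data file

        FileEntry(String partition, long recordCount, boolean hasDeletes) {
            this.partition = partition;
            this.recordCount = recordCount;
            this.hasDeletes = hasDeletes;
        }
    }

    // Computes count(*) grouped by partition from metadata only.
    // Throws when a file has deletes, because then the data must be read.
    static Map<String, Long> countByPartition(List<FileEntry> entries) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (FileEntry entry : entries) {
            if (entry.hasDeletes) {
                throw new IllegalStateException("delete files present; data files must be read");
            }
            counts.merge(entry.partition, entry.recordCount, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<FileEntry> entries = List.of(
                new FileEntry("2023-10-01", 100, false),
                new FileEntry("2023-10-01", 50, false),
                new FileEntry("2023-10-02", 25, false));
        System.out.println(countByPartition(entries));
    }
}
```

Summing the per-partition values gives the answer to an ungrouped count(*) under the same assumptions.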

maswin · 2025-01-23T07:27:38Z

We have a query that joins with a table only on partition columns.
Because of the code below, we cannot use the combined table scan:

if (wholeFileTask.deletes().isEmpty() && noDataColumnsProjected(wholeFileTask)) {
    fileTasksIterator = List.of(wholeFileTask).iterator();
}
else {
    fileTasksIterator = wholeFileTask.split(targetSplitSize).iterator();
}

The table has 1 million small files, and the query creates more than 1 million splits because the scan is not converted into a SplittableScanTask.
Is there a reason we skip combining splits when no data columns are projected?
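To make the concern concrete, here is a simplified model of the branch quoted above. It is only a sketch under stated assumptions: `FileTask`, `splitsFor`, and `totalSplits` are hypothetical names, and splitting is approximated as cutting a file into chunks of `targetSplitSize` bytes, which the real Iceberg split logic does not necessarily do. In this model every file yields at least one split in either branch, so a table with many small files produces at least as many splits as files unless a separate step packs several files into one split.

```java
import java.util.List;

public class SplitPlanningSketch {
    // Hypothetical stand-in for a whole-file scan task.
    static final class FileTask {
        final long lengthBytes;
        final boolean hasDeletes;
        final boolean dataColumnsProjected;

        FileTask(long lengthBytes, boolean hasDeletes, boolean dataColumnsProjected) {
            this.lengthBytes = lengthBytes;
            this.hasDeletes = hasDeletes;
            this.dataColumnsProjected = dataColumnsProjected;
        }
    }

    // Mirrors the quoted condition: keep the file whole when there are no deletes
    // and no data columns are projected; otherwise cut it by target split size.
    static long splitsFor(FileTask task, long targetSplitSize) {
        if (!task.hasDeletes && !task.dataColumnsProjected) {
            return 1;
        }
        // Ceiling division, with a floor of one split for empty or tiny files.
        return Math.max(1, (task.lengthBytes + targetSplitSize - 1) / targetSplitSize);
    }

    // Total splits when every file is planned independently (no combining).
    static long totalSplits(List<FileTask> tasks, long targetSplitSize) {
        long total = 0;
        for (FileTask task : tasks) {
            total += splitsFor(task, targetSplitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        List<FileTask> smallFiles = List.of(
                new FileTask(10, false, false),
                new FileTask(10, false, false),
                new FileTask(10, false, false));
        System.out.println(totalSplits(smallFiles, 128));
    }
}
```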

findepi requested review from alexjo2144 and ebyhr October 6, 2023 21:27

cla-bot bot added the cla-signed label Oct 6, 2023

findepi force-pushed the findepi/iceberg-count-only branch from 7f1a960 to 7a10c4b Compare October 6, 2023 21:28

findepi changed the title ~~Findepi/iceberg count only~~ Process count(*) on Iceberg without opening data files Oct 6, 2023

findepi mentioned this pull request Oct 6, 2023

Add aggregation pushdown support for count using Iceberg Metrics #15832

Open

github-actions bot added the iceberg Iceberg connector label Oct 6, 2023

findinpath reviewed Oct 6, 2023

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java (outdated)

findepi changed the title ~~Process count(*) on Iceberg without opening data files~~ Accelerate Iceberg when reading partition columns only Oct 7, 2023

findepi force-pushed the findepi/iceberg-count-only branch from 7a10c4b to b587f51 Compare October 9, 2023 11:48

findepi mentioned this pull request Oct 13, 2023

Improve the performance of Min/Max/Count utilizing Iceberg table metrics #10974

Open

Accelerate Iceberg when reading partition columns only

b756564

Manifests contain trustworthy information about record count, so it can be used to answer the count(*) queries.

findepi force-pushed the findepi/iceberg-count-only branch from b587f51 to b756564 Compare October 16, 2023 09:48

losipiuk approved these changes Oct 16, 2023

findepi merged commit c2a30f8 into trinodb:master Oct 16, 2023

findepi deleted the findepi/iceberg-count-only branch October 16, 2023 12:07

github-actions bot added this to the 430 milestone Oct 16, 2023

findepi mentioned this pull request Oct 16, 2023

Improve the performance of Count(*) utilizing Iceberg table metrics #15745

Closed

colebow mentioned this pull request Oct 17, 2023

Add Trino 430 release notes #19428

Merged

raunaqmorarka mentioned this pull request Nov 27, 2023

Trino+iceberg, select count(*) is too slow #19909

Closed

kevinjqliu mentioned this pull request Oct 8, 2024

Count rows as a metadata-only operation apache/iceberg-python#1223

Closed

dirtysalt mentioned this pull request Jun 11, 2025

Optimize Iceberg table count StarRocks/starrocks#46525

Closed

findepi merged 1 commit into trinodb:master from findepi:findepi/iceberg-count-only

4 participants