feat(iceberg): Add $snapshot_id as hidden column in iceberg table by agrawalreetika · Pull Request #26189 · prestodb/presto

agrawalreetika · 2025-09-30T05:59:43Z

Description

Add $snapshot_id as a hidden column in the Iceberg table

Motivation and Context

Add $snapshot_id as a hidden column in the Iceberg table
Addresses #26164

Impact

Test Plan

Integration test added

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Iceberg Connector Changes
* Add $snapshot_id as a hidden column in the Iceberg table

sourcery-ai · 2025-09-30T05:59:50Z

Reviewer's Guide

This PR introduces $snapshot_id as a hidden metadata column by extending split representations and split sources to carry snapshot IDs, integrating the new column into column handles and table metadata, exposing it in page sources, and validating the feature via an integration test.

ER diagram for new $snapshot_id metadata column in table schema

erDiagram
ICEBERG_TABLE {
  VARCHAR $path
  BIGINT $data_sequence_number
  BOOLEAN $deleted
  VARCHAR $delete_file_path
  BIGINT $snapshot_id
}
ICEBERG_TABLE ||--o{ ICEBERG_SPLIT : contains
ICEBERG_SPLIT {
  BIGINT snapshotId
}

Class diagram for updated IcebergSplit and related classes

classDiagram
class IcebergSplit {
  - long dataSequenceNumber
  - long affinitySchedulingFileSectionSize
  - long affinitySchedulingFileSectionIndex
  + long snapshotId
  + getSnapshotId(): long
}
class ChangelogSplitSource {
  - long snapshotId
  + ChangelogSplitSource(..., long snapshotId)
}
class EqualityDeletesSplitSource {
  - long snapshotId
  + EqualityDeletesSplitSource(..., long snapshotId)
}
class IcebergSplitSource {
  - long snapshotId
  + IcebergSplitSource(...)
}
IcebergSplitSource --> IcebergSplit
ChangelogSplitSource --> IcebergSplit
EqualityDeletesSplitSource --> IcebergSplit
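As a rough illustration of the propagation in the class diagram, the sketch below uses hypothetical, heavily simplified stand-ins (`SketchSplit`, `SketchSplitSource`), not the real Presto classes. It assumes, as the diagram suggests, that a split source captures one snapshot ID from the table scan and copies it onto every split it creates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IcebergSplit; only the fields relevant to this change are modeled.
final class SketchSplit
{
    private final String path;
    private final long snapshotId;

    SketchSplit(String path, long snapshotId)
    {
        this.path = path;
        this.snapshotId = snapshotId;
    }

    String getPath()
    {
        return path;
    }

    long getSnapshotId()
    {
        return snapshotId;
    }
}

// Hypothetical stand-in for a split source: the snapshot ID is captured once
// (from the table scan) and copied onto every split this source produces.
final class SketchSplitSource
{
    private final long snapshotId;

    SketchSplitSource(long scanSnapshotId)
    {
        this.snapshotId = scanSnapshotId;
    }

    List<SketchSplit> createSplits(List<String> dataFilePaths)
    {
        List<SketchSplit> splits = new ArrayList<>();
        for (String path : dataFilePaths) {
            splits.add(new SketchSplit(path, snapshotId));
        }
        return splits;
    }
}
```

Because every split receives the scan-level ID, all rows from one scan report the same `$snapshot_id`; this is the behavior debated later in the review thread.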

Class diagram for IcebergColumnHandle and IcebergMetadataColumn changes

classDiagram
class IcebergColumnHandle {
  + static SNAPSHOT_ID_COLUMN_HANDLE: IcebergColumnHandle
  + static SNAPSHOT_ID_COLUMN_METADATA: ColumnMetadata
  + isSnapshotId(): boolean
}
class IcebergMetadataColumn {
  + SNAPSHOT_ID
}
IcebergColumnHandle --> IcebergMetadataColumn
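A minimal sketch of how a metadata-column enum and an `isSnapshotId()` check could fit together. `SketchMetadataColumn` and `SketchColumnHandle` are hypothetical; the column names mirror the hidden columns in the ER diagram, but the numeric IDs are illustrative and are not Presto's actual values.

```java
import java.util.Optional;

// Hypothetical metadata-column enum; IDs are illustrative only.
enum SketchMetadataColumn
{
    FILE_PATH(-1, "$path"),
    DATA_SEQUENCE_NUMBER(-2, "$data_sequence_number"),
    IS_DELETED(-3, "$deleted"),
    DELETE_FILE_PATH(-4, "$delete_file_path"),
    SNAPSHOT_ID(-5, "$snapshot_id");

    private final int id;
    private final String columnName;

    SketchMetadataColumn(int id, String columnName)
    {
        this.id = id;
        this.columnName = columnName;
    }

    int getId()
    {
        return id;
    }

    String getColumnName()
    {
        return columnName;
    }

    // Looks up a metadata column by its SQL-visible name, e.g. "$snapshot_id".
    static Optional<SketchMetadataColumn> fromColumnName(String name)
    {
        for (SketchMetadataColumn column : values()) {
            if (column.columnName.equals(name)) {
                return Optional.of(column);
            }
        }
        return Optional.empty();
    }
}

// Hypothetical column handle whose isSnapshotId() check is keyed on the column ID.
final class SketchColumnHandle
{
    private final int id;
    private final String name;

    SketchColumnHandle(int id, String name)
    {
        this.id = id;
        this.name = name;
    }

    String getName()
    {
        return name;
    }

    boolean isSnapshotId()
    {
        return id == SketchMetadataColumn.SNAPSHOT_ID.getId();
    }
}
```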

File-Level Changes

| Change | Details | Files |
|---|---|---|
| Propagate snapshotId through split models and sources | Added snapshotId field, constructor parameter, and getter in IcebergSplit; introduced snapshotId field and constructor parameter in ChangelogSplitSource and passed it to split creation; introduced snapshotId field and constructor parameter in EqualityDeletesSplitSource and passed it to split creation; added snapshotId field to IcebergSplitSource and initialized it from tableScan; updated IcebergSplitManager to pass snapshotId when creating split sources | `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSplit.java`<br>`presto-iceberg/src/main/java/com/facebook/presto/iceberg/changelog/ChangelogSplitSource.java`<br>`presto-iceberg/src/main/java/com/facebook/presto/iceberg/equalitydeletes/EqualityDeletesSplitSource.java`<br>`presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSplitSource.java`<br>`presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSplitManager.java` |
| Define and register snapshot_id as a metadata column | Added SNAPSHOT_ID enum entry in IcebergMetadataColumn; created SNAPSHOT_ID_COLUMN_HANDLE, SNAPSHOT_ID_COLUMN_METADATA, and isSnapshotId() in IcebergColumnHandle; included SNAPSHOT_ID_COLUMN_METADATA in the metadata columns and mapping in IcebergAbstractMetadata | `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergMetadataColumn.java`<br>`presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergColumnHandle.java`<br>`presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java` |
| Expose snapshotId in page sources | Added handling for snapshot_id in metadataValues in IcebergPageSourceProvider | `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergPageSourceProvider.java` |
| Add integration test for snapshot_id hidden column | Introduced testSnapshotIdHiddenColumnSimple to verify the distinct $snapshot_id count and the current snapshot ID | `presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java` |
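The page-source change can be pictured this way: a metadata column holds the same value for every row read from one split, so the value can be resolved once per split and repeated for each row of a page. This is a hypothetical sketch; `SketchPageSourceMetadata`, its column IDs, and the use of a plain `long[]` in place of a Presto `Block` are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class SketchPageSourceMetadata
{
    // Illustrative column IDs only; they are not the real IcebergMetadataColumn IDs.
    static final int PATH_COLUMN_ID = -1;
    static final int SNAPSHOT_ID_COLUMN_ID = -5;

    // Resolves every metadata value for a split once, keyed by column ID.
    static Map<Integer, Object> metadataValues(String splitPath, long splitSnapshotId)
    {
        Map<Integer, Object> values = new HashMap<>();
        values.put(PATH_COLUMN_ID, splitPath);
        values.put(SNAPSHOT_ID_COLUMN_ID, splitSnapshotId);
        return values;
    }

    // Materializes a constant BIGINT column for a page of positionCount rows.
    static long[] constantBigintColumn(long value, int positionCount)
    {
        long[] column = new long[positionCount];
        Arrays.fill(column, value);
        return column;
    }
}
```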

Possibly linked issues

Add $snapshot_id hidden column to Iceberg connector #26164: The PR adds the $snapshot_id hidden column to the Iceberg connector, implementing the feature requested in the issue.


sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Consider refactoring snapshotId propagation into a common base or builder to avoid repeating it across all split source constructors and reduce boilerplate.
Include snapshotId in the split info returned by IcebergSplit#getInfo() so that split logs or traces will clearly show which snapshot each split belongs to for easier debugging.
Add an integration test for point-in-time scans (using fromSnapshot/toSnapshot) to verify that $snapshot_id in query results matches the intended historical snapshot, not just the current one.
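For the second suggestion, a minimal sketch of a split-info map that includes the snapshot ID might look like this. `SketchSplitInfo` and its keys are hypothetical; the real `IcebergSplit#getInfo()` may expose different fields.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class SketchSplitInfo
{
    // Builds an ordered, read-only description of a split for logs and traces.
    static Map<String, Object> getInfo(String path, long start, long length, long snapshotId)
    {
        Map<String, Object> info = new LinkedHashMap<>();
        info.put("path", path);
        info.put("start", start);
        info.put("length", length);
        // Lets a trace show which snapshot the split was planned from.
        info.put("snapshotId", snapshotId);
        return Collections.unmodifiableMap(info);
    }
}
```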


tdcmeehan · 2025-10-01T20:24:26Z

In cases where it's unambiguous to do so, this should also push down into Iceberg via ConnectorMetadata#getTableLayoutForConstraint, essentially turning the table handle itself into a time-travel query.
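A minimal sketch of the idea, assuming a hypothetical simplified table handle (`PushdownTableHandle`) where an empty snapshot ID means "read the current snapshot". Only the unambiguous case, an equality predicate `$snapshot_id = X`, is rewritten into a time-travel read; how the single value is extracted from the constraint is left out and is an assumption here.

```java
import java.util.Optional;

// Hypothetical, simplified table handle: an empty snapshotId means "read the current snapshot".
final class PushdownTableHandle
{
    private final String tableName;
    private final Optional<Long> snapshotId;

    PushdownTableHandle(String tableName, Optional<Long> snapshotId)
    {
        this.tableName = tableName;
        this.snapshotId = snapshotId;
    }

    String getTableName()
    {
        return tableName;
    }

    Optional<Long> getSnapshotId()
    {
        return snapshotId;
    }

    PushdownTableHandle withSnapshotId(long newSnapshotId)
    {
        return new PushdownTableHandle(tableName, Optional.of(newSnapshotId));
    }
}

final class SnapshotIdPushdown
{
    // singleValue holds X when the $snapshot_id domain is exactly "= X". That case is
    // unambiguous, so the handle becomes a time-travel read of snapshot X; any other
    // domain leaves the handle unchanged in this sketch.
    static PushdownTableHandle apply(PushdownTableHandle handle, Optional<Long> singleValue)
    {
        if (singleValue.isPresent()) {
            return handle.withSnapshotId(singleValue.get());
        }
        return handle;
    }
}
```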

hantangwangd · 2025-10-02T02:02:57Z

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSplitSource.java

-                affinitySchedulingFileSectionSize);
+                affinitySchedulingFileSectionSize,
+                snapshotId);


I'm a bit concerned about the snapshotId selection here. It seems like we are using the table-level snapshotId taken when the entire table was scanned, but my understanding is that it should be the snapshotId calculated based on the corresponding data file and delete files, right?

@hantangwangd this is an important observation. Selecting this column would be more useful if it returned the snapshot ID of the data file, i.e. which snapshot ID created the file. However, this column is primarily intended for filtering, as a way of altering the table handle to force a time travel on the table without introducing a new SPI or connector optimizer. Given this column will be hidden and not intended for direct use, I am comfortable with this being the snapshot ID of the scan, as that fulfills the intended purpose.

@tdcmeehan thanks for the detailed explanation. Based on my understanding of issue #26164 and the comments here, the primary purpose of this $snapshot_id column is to enable predicate pushdown for a filter like WHERE $snapshot_id > xxx, which is used to query incremental data since a specified snapshot. Therefore, the $snapshot_id column should not be allowed to be specified directly in a query, and it should never appear in a filter node that cannot be completely pushed down to the Iceberg connector. Is this correct?

tdcmeehan · 2025-10-02T15:28:27Z

One additional thing we should probably do is fail if any predicate compares the snapshot ID to a non-constant value, or if any less-than predicate is supplied, as these are too dangerous. Only greater-than should be supported, since this will use the latest schema.
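A minimal sketch of the validation proposed here, assuming a hypothetical `SnapshotIdOperator` enum and a flag saying whether the other side of the comparison is a constant. It treats both `>` and `>=` as allowed lower-bound forms (an interpretation of "greater-than"); the later revision quoted in this thread also accepts `=`.

```java
// Hypothetical comparison operators a $snapshot_id predicate might use.
enum SnapshotIdOperator
{
    EQUAL,
    NOT_EQUAL,
    LESS_THAN,
    LESS_THAN_OR_EQUAL,
    GREATER_THAN,
    GREATER_THAN_OR_EQUAL
}

final class SnapshotIdPredicateValidator
{
    // Rejects any $snapshot_id predicate that is not a lower bound against a constant.
    // UnsupportedOperationException stands in for PrestoException(NOT_SUPPORTED, ...).
    static void validate(SnapshotIdOperator operator, boolean comparesToConstant)
    {
        if (!comparesToConstant) {
            throw new UnsupportedOperationException("$snapshot_id can only be compared to a constant");
        }
        switch (operator) {
            case GREATER_THAN:
            case GREATER_THAN_OR_EQUAL:
                return;
            default:
                throw new UnsupportedOperationException("Unsupported predicate for $snapshot_id: " + operator);
        }
    }
}
```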

agrawalreetika · 2025-10-07T08:01:59Z

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java

+                        // Only support >= X
+                        Optional<Long> lower = Optional.of(((Number) range.getLowBoundedValue()).longValue());
+                        handle = handle.withUpdatedIcebergTableName(
+                                new IcebergTableName(name.getTableName(), name.getTableType(), lower, name.getChangelogEndSnapshot()));


@tdcmeehan I wanted to confirm one point here.
Currently, I update IcebergTableHandle with the lower bound X (for $snapshot_id >= X), but shouldn't we instead use the latest available snapshot whose ID is >= X when the query predicate is $snapshot_id >= X? That would ensure we always read using the most recent snapshot's schema, which avoids issues that can occur if older snapshots have outdated or incompatible schemas.

PingLiuPing · 2025-10-21T20:55:57Z

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSplitManager.java

                    session.getRuntimeStats());

-            return new EqualityDeletesSplitSource(session, icebergTable, deleteFiles);
+            return new EqualityDeletesSplitSource(session, icebergTable, deleteFiles, table.getIcebergTableName().getSnapshotId().get());


getSnapshotId() returns an Optional.
We need to check isPresent() before calling get().
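One way to act on this, sketched with a hypothetical helper (`SnapshotIdResolution`): check `isPresent()` on the handle's snapshot ID, and otherwise fall back to the table's current snapshot. The fallback is an assumption about the desired behavior, not something the PR specifies; the helper fails with a clear message if neither value exists, for example on a table with no snapshots yet.

```java
import java.util.Optional;

final class SnapshotIdResolution
{
    // Prefers the snapshot pinned on the table handle; otherwise falls back to the
    // table's current snapshot (assumed behavior), failing clearly if neither exists.
    static long resolveSnapshotId(Optional<Long> handleSnapshotId, Optional<Long> currentSnapshotId)
    {
        if (handleSnapshotId.isPresent()) {
            return handleSnapshotId.get();
        }
        return currentSnapshotId.orElseThrow(
                () -> new IllegalStateException("No snapshot available for this table"));
    }
}
```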

PingLiuPing · 2025-10-21T20:57:57Z

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java

+                                new IcebergTableName(name.getTableName(), name.getTableType(), lower, name.getChangelogEndSnapshot()));
+                    }
+                    else {
+                        throw new PrestoException(NOT_SUPPORTED, "Unsupported predicate for $snapshot_id; only >= constant is allowed");


Should we change the message to mention both >= and =, since both are supported?

PingLiuPing · 2025-10-21T21:01:55Z

presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java

        });
    }

+    @Test


Could you also add a case where there are multiple snapshots?

PingLiuPing · 2025-10-21T21:29:13Z

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java

+
+                if (domain.isSingleValue()) {
+                    Optional<Long> snapshotId = Optional.of(((Number) domain.getSingleValue()).longValue());
+                    handle = handle.withUpdatedIcebergTableName(


What if we are querying a time-travel table? Here we always overwrite the snapshotId.
We should probably add a check that compares the snapshot in the predicate with the snapshot specified by time travel.
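A minimal sketch of such a check, using a hypothetical helper (`TimeTravelConflictCheck`): if the table handle already pins a snapshot through time travel and an equality predicate on `$snapshot_id` names a different one, the request is rejected instead of silently overwriting the pinned snapshot.

```java
import java.util.Optional;

final class TimeTravelConflictCheck
{
    // timeTravelSnapshotId is the snapshot already pinned on the handle by a time-travel query;
    // predicateSnapshotId comes from a "$snapshot_id = X" predicate.
    static long resolve(Optional<Long> timeTravelSnapshotId, long predicateSnapshotId)
    {
        if (timeTravelSnapshotId.isPresent() && timeTravelSnapshotId.get() != predicateSnapshotId) {
            throw new IllegalArgumentException(
                    "$snapshot_id = " + predicateSnapshotId
                            + " conflicts with time-travel snapshot " + timeTravelSnapshotId.get());
        }
        return predicateSnapshotId;
    }
}
```

An alternative design would treat the contradiction as an empty result rather than an error, since no row can satisfy both constraints; failing loudly is simply the more conservative choice in this sketch.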

steveburnett · 2025-10-22T14:38:47Z

Should $snapshot_id be added to the Iceberg doc about hidden columns?

agrawalreetika · 2025-10-22T21:48:24Z

@PingLiuPing Thanks for the review, but using $snapshot_id in a filter has an issue: $snapshot_id is not incremental, so when calculating the delta between two snapshots (with a query like WHERE $snapshot_id BETWEEN snap1 AND snap2), the query would return wrong results because the comparison is not meaningful.

Could you please review #26408 which is based on $snapshot_sequence_number?
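To see why ordering matters, the sketch below uses a made-up snapshot history. In Iceberg, snapshot IDs are unique identifiers with no ordering guarantee, while sequence numbers increase monotonically with each commit. The IDs here are invented purely to illustrate that point, and `SnapshotRangeDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SnapshotRangeDemo
{
    // One entry of a (made-up) snapshot history.
    static final class SnapshotEntry
    {
        final long snapshotId;
        final long sequenceNumber;

        SnapshotEntry(long snapshotId, long sequenceNumber)
        {
            this.snapshotId = snapshotId;
            this.sequenceNumber = sequenceNumber;
        }
    }

    // Emulates "WHERE $snapshot_id BETWEEN lowId AND highId"; only meaningful if IDs were ordered.
    static List<Long> idsBetweenIds(List<SnapshotEntry> history, long lowId, long highId)
    {
        List<Long> result = new ArrayList<>();
        for (SnapshotEntry entry : history) {
            if (entry.snapshotId >= lowId && entry.snapshotId <= highId) {
                result.add(entry.snapshotId);
            }
        }
        return result;
    }

    // Selects snapshots committed after fromSequence, up to and including toSequence.
    static List<Long> idsBetweenSequences(List<SnapshotEntry> history, long fromSequence, long toSequence)
    {
        List<Long> result = new ArrayList<>();
        for (SnapshotEntry entry : history) {
            if (entry.sequenceNumber > fromSequence && entry.sequenceNumber <= toSequence) {
                result.add(entry.snapshotId);
            }
        }
        return result;
    }
}
```

With a history of (ID 812, seq 1), (ID 105, seq 2), (ID 977, seq 3), (ID 431, seq 4), the ID range 812..977 yields [812, 977] and skips snapshot 105, which was committed between them, while the sequence range (1, 3] yields [105, 977].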

agrawalreetika requested a review from tdcmeehan September 30, 2025 05:59

agrawalreetika self-assigned this Sep 30, 2025

agrawalreetika requested review from a team, ZacBlanco and hantangwangd as code owners September 30, 2025 05:59

prestodb-ci added the from:IBM PR from IBM label Sep 30, 2025

prestodb-ci requested review from a team, pratyakshsharma and sh-shamsan and removed request for a team September 30, 2025 05:59

sourcery-ai bot reviewed Sep 30, 2025


agrawalreetika force-pushed the iceberg-snapshotId branch from 350af58 to 66a2beb September 30, 2025 12:16

agrawalreetika requested a review from a team as a code owner September 30, 2025 12:16

agrawalreetika force-pushed the iceberg-snapshotId branch 3 times, most recently from 6d42a74 to 17c11a0 October 1, 2025 03:39

hantangwangd reviewed Oct 2, 2025


Add $snapshot_id as hidden column in iceberg table

29cfe1a

agrawalreetika force-pushed the iceberg-snapshotId branch from 17c11a0 to 29cfe1a October 6, 2025 20:18

agrawalreetika changed the title ~~Add $snapshot_id as hidden column in iceberg table~~ feat(iceberg): Add $snapshot_id as hidden column in iceberg table Oct 6, 2025

agrawalreetika commented Oct 7, 2025


PingLiuPing requested changes Oct 21, 2025


PingLiuPing mentioned this pull request Oct 23, 2025

feat(plugin-iceberg): Add $snapshot_sequence_number as hidden column in iceberg table #26408 (Open, 7 tasks)

agrawalreetika marked this pull request as draft October 23, 2025 11:17

6 participants

sourcery-ai bot commented Sep 30, 2025 •

edited

Loading