[Iceberg] Fix querying data_sequence_number with equality deletes #25293

Merged
hantangwangd merged 1 commit into prestodb:master from xieandrew:fix-dsn-eq-deletes
Jun 15, 2025

Conversation

@xieandrew
Member

@xieandrew xieandrew commented Jun 11, 2025

Description

Fixes the error "Duplicate key $data_sequence_number" that occurs when querying "$data_sequence_number" on an Iceberg table that has equality deletes. IcebergEqualityDeleteAsJoin already adds a $data_sequence_number column implicitly, so when a query also selects this column explicitly, the column appears twice and the duplicate-key error is raised.
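To illustrate the failure mode, here is a minimal Java sketch. It does not reproduce Presto's actual IcebergEqualityDeleteAsJoin code: the class DataSequenceNumberColumns and its methods are hypothetical, and it assumes the implicit column is appended by name to the list of selected columns, which is then indexed with Collectors.toMap (this throws IllegalStateException on a repeated key). The fixed variant reuses an explicitly selected column instead of appending a second copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class DataSequenceNumberColumns {
    // Metadata column name taken from the error message; everything else is illustrative.
    static final String DATA_SEQUENCE_NUMBER = "$data_sequence_number";

    // Hypothetical buggy variant: always appends the implicit column,
    // even when the query already selected it explicitly.
    static Map<String, Integer> buildColumnIndexBuggy(List<String> requested) {
        List<String> columns = new ArrayList<>(requested);
        columns.add(DATA_SEQUENCE_NUMBER);
        return index(columns);
    }

    // Hypothetical fixed variant: appends the implicit column only if it is absent,
    // so an explicitly selected column is reused instead of duplicated.
    static Map<String, Integer> buildColumnIndexFixed(List<String> requested) {
        List<String> columns = new ArrayList<>(requested);
        if (!columns.contains(DATA_SEQUENCE_NUMBER)) {
            columns.add(DATA_SEQUENCE_NUMBER);
        }
        return index(columns);
    }

    // Maps column name -> position. The two-argument Collectors.toMap
    // throws IllegalStateException ("Duplicate key ...") on a repeated key.
    private static Map<String, Integer> index(List<String> columns) {
        return IntStream.range(0, columns.size())
                .boxed()
                .collect(Collectors.toMap(i -> columns.get(i), i -> i));
    }

    public static void main(String[] args) {
        List<String> selected = List.of(DATA_SEQUENCE_NUMBER, "id", "name");
        try {
            buildColumnIndexBuggy(selected);
        } catch (IllegalStateException e) {
            System.out.println("buggy: " + e.getMessage());
        }
        System.out.println("fixed: " + buildColumnIndexFixed(selected));
    }
}
```

Checking for the column before adding it keeps the implicit column when the user did not select it, while avoiding the duplicate when they did.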

Motivation and Context

Closes #24629.

Impact

Allows querying "$data_sequence_number" in tables with equality deletes.

Test Plan

Presto does not generate equality deletes (only position deletes), so use Apache Flink to create the table, following the steps outlined in the comments on the original issue. Then import the table into Presto with call iceberg.system.register_table('schemaName', 'tableName', 'metadataDir') and run the query select "$data_sequence_number", * from sample;

This query fails before this change and successfully returns rows after.

A unit test was added to IcebergDistributedTestBase.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Iceberg Connector Changes
* Fix error querying ``$data_sequence_number`` metadata column for table with equality deletes.

@prestodb-ci prestodb-ci added the from:IBM (PR from IBM) label on Jun 11, 2025
Member

@hantangwangd hantangwangd left a comment

Thanks for this fix; the main logic looks reasonable to me. Since the variables in the table scan node's assignments may also have suffixes (for example, when a single SQL statement involves $data_sequence_number columns from multiple tables), we need some adjustments to support such scenarios.
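The reviewer's point can be illustrated with another hypothetical Java sketch: a fix that looks up the column by variable name misses variables that have been renamed with a suffix. Here the assignments are modeled as a plain Map from variable name to column name, and the _1 suffix is illustrative only; Presto's real table scan assignments map variables to column handles. Matching on the assigned column rather than on the variable name finds the suffixed variable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class SuffixedAssignments {
    static final String DATA_SEQUENCE_NUMBER = "$data_sequence_number";

    // Fragile (hypothetical): looks for a variable literally named "$data_sequence_number".
    static Optional<String> findByVariableName(Map<String, String> assignments) {
        return assignments.containsKey(DATA_SEQUENCE_NUMBER)
                ? Optional.of(DATA_SEQUENCE_NUMBER)
                : Optional.empty();
    }

    // Robust (hypothetical): inspects what each variable is bound to,
    // so a suffixed variable name does not matter.
    static Optional<String> findByColumn(Map<String, String> assignments) {
        return assignments.entrySet().stream()
                .filter(e -> e.getValue().equals(DATA_SEQUENCE_NUMBER))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        // Illustrative assignments for the second table in a multi-table query:
        // variable name -> column name.
        Map<String, String> assignments = new LinkedHashMap<>();
        assignments.put("id_1", "id");
        assignments.put("$data_sequence_number_1", DATA_SEQUENCE_NUMBER);

        System.out.println(findByVariableName(assignments)); // Optional.empty
        System.out.println(findByColumn(assignments));       // Optional[$data_sequence_number_1]
    }
}
```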

@steveburnett
Contributor

Thanks for the release note! Formatting nit:

== RELEASE NOTES ==

Iceberg Connector Changes
* Fix error querying ``$data_sequence_number`` metadata column for table with equality deletes.

@xieandrew xieandrew force-pushed the fix-dsn-eq-deletes branch from 4a4bb15 to f4b5975 on June 12, 2025 17:12
@xieandrew
Member Author

Thanks, I've edited the release note with the correct format.

@xieandrew xieandrew marked this pull request as ready for review June 12, 2025 17:15
@xieandrew xieandrew requested review from a team and ZacBlanco as code owners June 12, 2025 17:15
@xieandrew xieandrew requested a review from hantangwangd June 12, 2025 17:15
@prestodb-ci prestodb-ci requested review from a team, ScrapCodes and imjalpreet and removed request for a team June 12, 2025 17:15
Member

@hantangwangd hantangwangd left a comment

Thanks for the fix; overall it looks good to me. Just one thing to discuss.

@xieandrew xieandrew force-pushed the fix-dsn-eq-deletes branch from f4b5975 to 96bf02e on June 13, 2025 18:32
@hantangwangd hantangwangd merged commit 3fd0609 into prestodb:master Jun 15, 2025
167 of 168 checks passed
@prestodb-ci prestodb-ci mentioned this pull request Jul 28, 2025

Labels

from:IBM PR from IBM

Development

Successfully merging this pull request may close these issues.

【Iceberg equality scan】fix duplicate key issue when querying metadata column "$data_sequence_number" from an iceberg table within equality deletes

4 participants