
Fix table_changes incorrect results when querying cow tables #27827

Closed
chenjian2664 wants to merge 2 commits into trinodb:master from chenjian2664:jack/table-changes-cow-fix

Conversation

@chenjian2664
Contributor

@chenjian2664 chenjian2664 commented Jan 2, 2026

Description

Iceberg table_changes may return duplicate (incorrect) rows when querying tables written using the copy-on-write (CoW) update model.

In a CoW write path, an engine may update only a subset of rows within a data file. During this process, the unchanged rows from the original file are rewritten into a new data file together with the updated rows, while the original file is removed. Iceberg represents the removed file using DeletedDataFileScanTask.

However, DeletedDataFileScanTask does not differentiate between rows that are actually deleted or updated and rows that are merely rewritten due to the copy-on-write process. As a result, table_changes can incorrectly interpret unchanged rows as deleted, leading to incorrect change semantics and, in some cases, duplicate results.
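A minimal sketch of the carry-over problem, using the example data from this PR (illustrative only, not the connector's actual data structures):

```python
# A CoW update of the row (5, 'a') -> (5, 'updated') removes the old
# data file and writes a new one that also carries the unchanged row.
deleted = [(5, "a"), (4, "b")]         # rows of the removed data file
inserted = [(5, "updated"), (4, "b")]  # rows of the newly added data file

# Naively emitting every removed row as a delete and every added row as
# an insert reports the unchanged row as a spurious delete/insert pair:
carryover = set(deleted) & set(inserted)
# carryover == {(4, "b")} -- a row that was never logically changed
```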

Example of the failure

spark> create table t (x int, y varchar);
// snapshot 7527505804807355682
spark> insert into t values (5, 'a'), (4, 'b');

spark> insert into t values (5, 'a'), (4, 'b');

spark> update t set y = 'updated' where x = 5;
// snapshot 1538643479657750339
trino> select * from TABLE(system.table_changes(schema_name => 'default', table_name => 't', start_snapshot_id => 7527505804807355682, end_snapshot_id => 1538643479657750339)) order by _change_ordinal;

returns:
 x |    y    | _change_type | _change_version_id  |      _change_timestamp      | _change_ordinal
---+---------+--------------+---------------------+-----------------------------+-----------------
 5 | a       | insert       | 1710304824426268447 | 2025-12-22 08:56:03.496 UTC |               0
 4 | b       | insert       | 1710304824426268447 | 2025-12-22 08:56:03.496 UTC |               0
 5 | a       | insert       | 1219451281551863158 | 2025-12-22 08:56:14.544 UTC |               1
 4 | b       | insert       | 1219451281551863158 | 2025-12-22 08:56:14.544 UTC |               1
 5 | a       | delete       | 1538643479657750339 | 2025-12-22 09:00:31.760 UTC |               2
 5 | a       | delete       | 1538643479657750339 | 2025-12-22 09:00:31.760 UTC |               2
 4 | b       | delete       | 1538643479657750339 | 2025-12-22 09:00:31.760 UTC |               2
 4 | b       | insert       | 1538643479657750339 | 2025-12-22 09:00:31.760 UTC |               2
 4 | b       | delete       | 1538643479657750339 | 2025-12-22 09:00:31.760 UTC |               2
 5 | updated | insert       | 1538643479657750339 | 2025-12-22 09:00:31.760 UTC |               2
 5 | updated | insert       | 1538643479657750339 | 2025-12-22 09:00:31.760 UTC |               2
 4 | b       | insert       | 1538643479657750339 | 2025-12-22 09:00:31.760 UTC |               2

Among the records with _change_ordinal = 2, the rows where x = 4 are duplicated, even though that row was only rewritten by the CoW update and never logically changed.

Workaround Approach

This PR introduces a minimal, planner-independent workaround to handle duplicates:
1. Splits are grouped by partition and change ordinal so that all changes from a single operation (update or delete) are captured together.
2. For CoW tables, the CopyOnWriteTableChangesFunctionProcessor counts row-level changes:
   * Insert → +1
   * Delete → -1
   The final count determines whether a row has an actual change; only rows with a non-zero net change are returned.
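The counting step above can be sketched as follows; the function name and data shapes are illustrative, not the actual Trino classes:

```python
from collections import Counter

def net_changes(rows):
    """Suppress carry-over rows by counting +1 per insert, -1 per delete.

    `rows` is an iterable of (change_type, row) pairs belonging to one
    operation, i.e. one partition / _change_ordinal group.
    """
    counts = Counter()
    for change_type, row in rows:
        counts[row] += 1 if change_type == "insert" else -1
    out = []
    for row, net in counts.items():
        if net > 0:                       # net insertions survive
            out.extend([("insert", row)] * net)
        elif net < 0:                     # net deletions survive
            out.extend([("delete", row)] * (-net))
        # net == 0: a carried-over row with no logical change; dropped
    return out

# The carried-over (4, 'b') nets to zero and is suppressed:
changes = net_changes([
    ("delete", (5, "a")), ("delete", (4, "b")),
    ("insert", (5, "updated")), ("insert", (4, "b")),
])
# changes == [("delete", (5, "a")), ("insert", (5, "updated"))]
```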

This approach is inspired by Spark's RemoveCarryoverIterator, which deduplicates changes by comparing adjacent rows. Unlike Spark, this implementation avoids post-scan sorting and does not require modifying the Trino planner.

Trade-offs:

  • Parallelism during scanning may be reduced, especially for unpartitioned tables.
  • Memory usage within the table function processor increases.

The changes are otherwise minimal, backward-compatible, and do not alter the behavior of the table_changes function for other tables.

This ensures that table_changes correctly reflects logical row-level changes for copy-on-write tables without introducing false deletions or duplicates.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Iceberg
* Fix incorrect `table_changes` results when querying copy-on-write tables. ({issue}`27827`)

@cla-bot cla-bot bot added the cla-signed label Jan 2, 2026
@chenjian2664 chenjian2664 marked this pull request as draft January 2, 2026 13:51
@github-actions github-actions bot added the iceberg Iceberg connector label Jan 2, 2026
@chenjian2664 chenjian2664 force-pushed the jack/table-changes-cow-fix branch 5 times, most recently from 6f4980a to cd85dfa Compare January 5, 2026 09:46
@chenjian2664 chenjian2664 marked this pull request as ready for review January 5, 2026 09:58
this.icebergTable = requireNonNull(icebergTable, "table is null");
this.tableScan = requireNonNull(tableScan, "tableScan is null");
this.targetSplitSize = tableScan.targetSplitSize();
this.delegate = switch (rowLevelOperationMode(icebergTable)) {
Contributor
I think we need to consider the row-level operation mode per snapshot, not only for the latest snapshot of the table.

Contributor Author

It doesn't seem possible to check whether the files in each snapshot were written in CoW or MOR mode :(

Contributor Author

But I think you are right. We may want to include the write information, specifically the merge_mode, in the snapshot metadata. Perhaps we should propose this to the Iceberg community. What do you think?

Contributor Author
@chenjian2664 chenjian2664 Jan 7, 2026

It appears we are facing the following situation: we rely on the MERGE_MODE property to infer how files were written. However, some engines may ignore this property, which can lead to incorrect results.

Previously, the behavior was:

  • If the files were written in copy-on-write mode, the results were incorrect.
  • If the files were written in merge-on-read mode, the results were correct.

With this PR:

  • If the files were written in COW mode (even in some snapshots) but the table property MERGE_MODE is set to merge-on-read, incorrect results are produced.
  • In all other cases, the results are correct.
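The inference discussed here can be sketched as follows. The property keys are standard Iceberg table properties (which default to copy-on-write when unset), but the function itself is illustrative, not this PR's actual code; as noted above, these properties describe the table's current configuration, not how any particular snapshot's files were actually written.

```python
COPY_ON_WRITE = "copy-on-write"
MERGE_ON_READ = "merge-on-read"

def row_level_operation_mode(table_properties: dict) -> str:
    """Infer the row-level operation mode from Iceberg table properties."""
    # Iceberg defaults all three row-level operation modes to copy-on-write.
    modes = {
        table_properties.get("write.update.mode", COPY_ON_WRITE),
        table_properties.get("write.delete.mode", COPY_ON_WRITE),
        table_properties.get("write.merge.mode", COPY_ON_WRITE),
    }
    # Treat the table as CoW if any row-level operation uses it. An engine
    # that ignores these properties can still have written CoW files, which
    # is exactly the failure mode discussed in this thread.
    return COPY_ON_WRITE if COPY_ON_WRITE in modes else MERGE_ON_READ
```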

@chenjian2664 chenjian2664 force-pushed the jack/table-changes-cow-fix branch 2 times, most recently from 7b8dcb4 to db0d026 Compare January 6, 2026 06:54
@chenjian2664 chenjian2664 force-pushed the jack/table-changes-cow-fix branch from db0d026 to 0ff3495 Compare January 7, 2026 08:21
@github-actions

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions bot added the stale label Jan 28, 2026
@chenjian2664
Contributor Author

Closing this now; the behavior for CoW tables isn't treated as a bug, though it is strange.

