Add copy-on-write support for Iceberg DELETE, UPDATE, and MERGE operations #27747

Closed
nookcreed wants to merge 1 commit into trinodb:master from nookcreed:master

Conversation

@nookcreed

Fixes: #26161
Addresses design considerations from: #17272

Description

This change implements configurable copy-on-write (CoW) mode for row-level operations in Iceberg tables, giving users control over the read vs write performance trade-off.

Motivation:
Iceberg supports two approaches for row-level modifications:

  • Merge-on-Read (MoR): Fast writes, slower reads (current Trino behavior)
  • Copy-on-Write (CoW): Slower writes, fast reads (previously unsupported)

Users with read-heavy workloads on frequently updated tables have been requesting CoW support to eliminate read-time overhead of merging delete files with data files. This is particularly valuable for:

  • Analytics workloads with frequent small updates
  • Tables with high read/write ratios
  • Use cases requiring predictable read performance

Implementation:
This change adds three new table properties to control write mode per operation type:

  • write_delete_mode: 'merge-on-read' (default) or 'copy-on-write'
  • write_update_mode: 'merge-on-read' (default) or 'copy-on-write'
  • write_merge_mode: 'merge-on-read' (default) or 'copy-on-write'
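As a minimal sketch of how these per-operation properties might be resolved, the Java below maps an operation kind to its property name and falls back to merge-on-read when the property is unset. The class name `WriteModeSketch`, the method `resolve`, and the use of a plain `Map<String, String>` for table properties are assumptions made for illustration; only the property names, their two values, and the enum names come from this description.

```java
import java.util.Map;

// Hypothetical sketch, not the code in this PR: resolves the write mode
// for one operation kind from a map of table properties.
final class WriteModeSketch {
    enum UpdateKind {
        DELETE("write_delete_mode"),
        UPDATE("write_update_mode"),
        MERGE("write_merge_mode");

        final String propertyName;

        UpdateKind(String propertyName) {
            this.propertyName = propertyName;
        }
    }

    enum UpdateMode {
        MERGE_ON_READ("merge-on-read"),
        COPY_ON_WRITE("copy-on-write");

        final String propertyValue;

        UpdateMode(String propertyValue) {
            this.propertyValue = propertyValue;
        }

        // Parses the textual property value; rejects anything unrecognized.
        static UpdateMode fromPropertyValue(String value) {
            for (UpdateMode mode : values()) {
                if (mode.propertyValue.equals(value)) {
                    return mode;
                }
            }
            throw new IllegalArgumentException("Unknown write mode: " + value);
        }
    }

    // Returns the configured mode, or MERGE_ON_READ (the stated default) when unset.
    static UpdateMode resolve(Map<String, String> tableProperties, UpdateKind kind) {
        String value = tableProperties.get(kind.propertyName);
        return value == null ? UpdateMode.MERGE_ON_READ : UpdateMode.fromPropertyValue(value);
    }

    private WriteModeSketch() {}
}
```

Under this sketch, a table that sets only `write_delete_mode` to `'copy-on-write'` would rewrite data files on DELETE while UPDATE and MERGE keep the merge-on-read default.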

Key changes:

  1. UpdateKind tracking: Added UpdateKind enum (DELETE/UPDATE/MERGE) to IcebergTableHandle to track operation type throughout query lifecycle

    • applyDelete() sets UpdateKind.DELETE
    • getUpdateLayout() sets UpdateKind.UPDATE
    • beginMerge() sets UpdateKind.MERGE
  2. UpdateMode resolution: Added UpdateMode enum (MERGE_ON_READ/COPY_ON_WRITE) to resolve write mode from table properties based on UpdateKind

  3. CoW DELETE implementation: Modified finishWrite() to detect CoW DELETE operations and delegate to new rewriteDataFilesForCowDelete() method

    • Reads manifests to locate DataFile objects for files to delete
    • For each affected data file:
      • Reads position deletes from delete files into Roaring64Bitmap
      • Opens original data file via IcebergPageSourceProvider
      • Filters deleted positions page-by-page using Block.copyPositions()
      • Writes filtered data to new file with proper metrics
      • Manages rollback lifecycle for cleanup on errors
    • Uses Iceberg's RewriteFiles API for atomic commit
  4. Resource management: Proper handling of rollback lifecycle

    • Rollback handle kept until file successfully added to transaction
    • Cleanup guaranteed via try-catch-finally blocks
    • Suppressed exceptions preserve original error context
  5. Optimizations:

    • Direct manifest reading with early termination when all files found
    • Efficient position delete filtering using Roaring64Bitmap
    • Reuses existing PositionDeleteFilter implementation for consistency
  6. Safety features:

    • Snapshot isolation via Iceberg's validation mechanisms
    • Null checks for empty tables
    • Missing file validation before commit
    • Format version check (requires v2 for row-level operations)
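The page-by-page filtering step in item 3 can be illustrated with a small, self-contained sketch. It assumes a page is identified by its starting row position in the data file and its row count, and it uses `java.util.BitSet` as a stand-in for `Roaring64Bitmap` so the example has no external dependencies (which also limits it to `int`-sized positions). The class and method names are hypothetical. The returned array holds the in-page indexes of surviving rows, which is the kind of input a call such as `Block.copyPositions()` would take.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch, not the code in this PR: selects the rows of one page
// that are not marked deleted by a position-delete bitmap.
final class PositionDeleteFilterSketch {
    // deletedFilePositions: file-level row positions marked as deleted
    // pageStartPosition:    file-level position of the page's first row
    // pageRowCount:         number of rows in the page
    static int[] retainedPositions(BitSet deletedFilePositions, long pageStartPosition, int pageRowCount) {
        int[] retained = new int[pageRowCount];
        int count = 0;
        for (int i = 0; i < pageRowCount; i++) {
            // BitSet is int-indexed; a 64-bit bitmap would not need this narrowing.
            int filePosition = Math.toIntExact(pageStartPosition + i);
            if (!deletedFilePositions.get(filePosition)) {
                retained[count++] = i;
            }
        }
        // Trim to the number of surviving rows.
        return Arrays.copyOf(retained, count);
    }

    private PositionDeleteFilterSketch() {}
}
```

A page whose retained array is empty can be skipped entirely, and a page whose retained array covers every row can be passed through without copying.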

Testing:

  • Added TestIcebergCopyOnWriteOperations with 17 comprehensive tests
    • Basic operations (DELETE, UPDATE, MERGE)
    • Partitioned tables
    • Large batch operations (1000-5000 rows)
    • Performance benchmarks with metrics collection
    • Error cases (empty tables, format v1, etc.)
  • Added TestIcebergCopyOnWriteDeleteOperations with 15 unit tests
    • File rewriting with position deletes
    • Error handling (IO exceptions, missing files, etc.)
    • Resource cleanup verification
    • Edge cases (empty deletes, equality deletes, etc.)
  • Added TestIcebergCopyOnWriteIntegration with 5 integration tests
    • Resource cleanup on failure
    • Concurrent operations
    • Snapshot isolation
    • Conflict resolution
  • Updated 8 test files to reflect new IcebergMetadata constructor signature

Documentation:

  • Added comprehensive COPY_ON_WRITE_README.md with:
    • Feature overview and motivation
    • Configuration examples
    • Usage patterns and recommendations
    • Performance characteristics (CoW vs MoR comparison)
    • Implementation deep dive with code examples
    • Troubleshooting guide
    • Known limitations and future enhancements

Limitations:

  • CoW DELETE currently only supports position deletes
  • Equality deletes documented as future enhancement
  • Requires Iceberg format version 2 or higher

Additional context and related issues

Documented in COPY_ON_WRITE_README.md.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot

cla-bot bot commented Dec 23, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added the iceberg Iceberg connector label Dec 23, 2025
@nookcreed nookcreed marked this pull request as draft December 24, 2025 05:19
@nookcreed nookcreed marked this pull request as ready for review December 30, 2025 05:25
@nookcreed nookcreed marked this pull request as draft December 30, 2025 05:25

…tions

Fix NullPointerException in BaseTrinoCatalogTest

This commit fixes NullPointerException errors occurring in BaseTrinoCatalogTest
by providing non-null OrcReaderOptions and ParquetReaderOptions objects
in the test methods testNonLowercaseNamespace and testSchemaWithInvalidProperties.
Instead of passing null values, we now use default instances.

Fix tests

Fix tests

Empty commit

@ebyhr
Member

ebyhr commented Jan 5, 2026

Duplicate of #27844

@ebyhr ebyhr marked this as a duplicate of #27844 Jan 5, 2026
@ebyhr ebyhr closed this Jan 5, 2026