[presto][iceberg] Wire dataSequenceNumber through protocol layer for equality delete conflict resolution #27392

Closed
apurva-meta wants to merge 2 commits into prestodb:master from apurva-meta:export-D97531547

@apurva-meta apurva-meta commented Mar 21, 2026

Summary:
Wire the dataSequenceNumber field from the Java Presto protocol to the
C++ Velox connector layer, enabling server-side sequence number conflict
resolution for equality delete files.

Changes:

  • Add dataSequenceNumber field to IcebergSplit protocol (Java + C++)
  • Parse dataSequenceNumber in IcebergPrestoToVeloxConnector and pass it
    through HiveIcebergSplit to IcebergSplitReader
  • Add const qualifiers to local variables for code clarity

Differential Revision: D97531547

@apurva-meta apurva-meta requested review from a team, ZacBlanco and hantangwangd as code owners March 21, 2026 06:43

sourcery-ai bot commented Mar 21, 2026

Reviewer's Guide

Wires the Iceberg delete file dataSequenceNumber from the Presto Java protocol into the C++ Velox connector. The PR extends the protocol serialization and propagates the field into HiveIcebergSplit so that Velox can perform server-side, sequence-number-based conflict resolution for equality deletes. It also adds the missing equality-delete FileContent mapping and makes minor const and enum style cleanups.

Sequence diagram for wiring dataSequenceNumber through Iceberg split handling

sequenceDiagram
  actor Coordinator
  participant PrestoJava as PrestoJava_IcebergConnector
  participant DeleteFileJava as DeleteFileJava
  participant ProtoDeleteFile as DeleteFileProtocol
  participant Connector as IcebergPrestoToVeloxConnector
  participant HiveSplit as HiveIcebergSplit
  participant Reader as IcebergSplitReader

  Coordinator->>PrestoJava: Plan query with Iceberg table
  PrestoJava->>DeleteFileJava: fromIceberg(org.apache.iceberg.DeleteFile)
  DeleteFileJava-->>PrestoJava: DeleteFileJava{dataSequenceNumber}
  PrestoJava->>ProtoDeleteFile: Serialize DeleteFileJava to JSON including dataSequenceNumber
  ProtoDeleteFile-->>Connector: Deserialize JSON to DeleteFileProtocol{dataSequenceNumber}

  Connector->>Connector: toVeloxFileContent(FileContent.EQUALITY_DELETES) -> kEqualityDeletes
  Connector->>Connector: Read icebergSplit.dataSequenceNumber into local dataSequenceNumber
  Connector->>HiveSplit: Construct HiveIcebergSplit(..., deletes{DeleteFileProtocol.dataSequenceNumber}, infoColumns{"$data_sequence_number"=dataSequenceNumber}, ..., dataSequenceNumber)
  HiveSplit-->>Reader: Provide dataSequenceNumber and deleteFiles for conflict resolution
  Reader->>Reader: Apply equality delete conflict rules using dataSequenceNumber

Updated class diagram for Iceberg delete file dataSequenceNumber propagation

classDiagram
  class DeleteFileJava {
    - FileContent content
    - String path
    - FileFormat format
    - int specId
    - long recordCount
    - long fileSizeInBytes
    - List~Integer~ equalityFieldIds
    - Map~Integer, byte[]~ lowerBounds
    - Map~Integer, byte[]~ upperBounds
    - long dataSequenceNumber
    + fromIceberg(deleteFile)
    + getContent() FileContent
    + getPath() String
    + getFormat() FileFormat
    + getSpecId() int
    + getRecordCount() long
    + getFileSizeInBytes() long
    + getEqualityFieldIds() List~Integer~
    + getLowerBounds() Map~Integer, byte[]~
    + getUpperBounds() Map~Integer, byte[]~
    + getDataSequenceNumber() long
    + toString() String
  }

  class FileContentEnum {
    <<enum>>
    DATA
    POSITION_DELETES
    EQUALITY_DELETES
  }

  class DeleteFileProtocol {
    + List~Integer~ equalityFieldIds
    + Map~Integer, String~ lowerBounds
    + Map~Integer, String~ upperBounds
    + int64_t dataSequenceNumber
    + to_json(j, p)
    + from_json(j, p)
  }

  class IcebergPrestoToVeloxConnector {
    + toVeloxFileContent(content) FileContent
    + toVeloxSplit(connectorSplit, splitContext) unique_ptr~HiveIcebergSplit~
  }

  class HiveIcebergSplit {
    + HiveIcebergSplit(
        ..., 
        vector~VeloxDeleteFile~ deleteFiles,
        unordered_map~string, string~ infoColumns,
        optional~string~ deletionVectorPath,
        int64_t dataSequenceNumber
      )
  }

  class VeloxDeleteFile {
    + VeloxDeleteFile(
        FileContent content,
        string path,
        FileFormat format,
        int32_t specId,
        int64_t recordCount,
        int64_t fileSizeInBytes,
        vector~int32_t~ equalityFieldIds,
        map~int32_t, string~ lowerBounds,
        map~int32_t, string~ upperBounds,
        int64_t dataSequenceNumber
      )
  }

  DeleteFileJava --> FileContentEnum : uses
  DeleteFileProtocol --> FileContentEnum : uses
  IcebergPrestoToVeloxConnector --> DeleteFileProtocol : reads delete metadata
  IcebergPrestoToVeloxConnector --> VeloxDeleteFile : constructs with dataSequenceNumber
  IcebergPrestoToVeloxConnector --> HiveIcebergSplit : constructs with dataSequenceNumber
  HiveIcebergSplit --> VeloxDeleteFile : aggregates deleteFiles

File-Level Changes

Change: Add dataSequenceNumber to the Java DeleteFile model so it can be serialized from the Iceberg library into the Presto protocol payloads.
Details:
  • Extend DeleteFile with a new dataSequenceNumber field captured from org.apache.iceberg.DeleteFile, defaulting nulls to 0L for legacy files.
  • Update the @JsonCreator constructor and JSON properties to include dataSequenceNumber in both deserialization and serialization.
  • Include dataSequenceNumber in DeleteFile.toString() for easier debugging/logging.
Files:
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/delete/DeleteFile.java

Change: Propagate delete-file dataSequenceNumber through the C++ Presto protocol structs and JSON bindings so splits carry this field end-to-end.
Details:
  • Extend the protocol::iceberg::DeleteFile struct to include an int64_t dataSequenceNumber member with a default value.
  • Update to_json/from_json helpers to read and write the dataSequenceNumber field with appropriate metadata.
  • Keep the FileContent enum but reformat it multi-line without changing semantics.
Files:
  • presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.h
  • presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.cpp

Change: Use the propagated dataSequenceNumber in the IcebergPrestoToVeloxConnector when constructing Velox Iceberg delete files and splits, and add mapping for equality delete content.
Details:
  • Add handling for protocol::iceberg::FileContent::EQUALITY_DELETES in toVeloxFileContent, mapping it to velox::connector::hive::iceberg::FileContent::kEqualityDeletes.
  • Read icebergSplit->dataSequenceNumber into a const local and use it consistently when populating infoColumns and constructing the HiveIcebergSplit.
  • Pass the new deleteFile.dataSequenceNumber from the protocol DeleteFile into the Velox IcebergDeleteFile constructor, extending its argument list accordingly.
  • Extend the HiveIcebergSplit construction call to pass the optional dataSequenceNumber parameter (std::nullopt plus dataSequenceNumber) to align with the updated split interface.
Files:
  • presto-native-execution/presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp
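
The three changes listed above can be pictured with a small, self-contained C++ sketch. It is not the PR's code: ProtocolFileContent, VeloxFileContent, SketchSplit, sequenceNumberOrZero, and makeSplit are hypothetical stand-ins for the real protocol and Velox types, and the non-negativity check is an assumption of this sketch. Only the EQUALITY_DELETES mapping, the default of 0 for a missing sequence number, and the "$data_sequence_number" info column come from the review text.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the protocol-side and Velox-side enums.
enum class ProtocolFileContent { DATA, POSITION_DELETES, EQUALITY_DELETES };
enum class VeloxFileContent { kData, kPositionalDeletes, kEqualityDeletes };

// Maps protocol content to the Velox-side value, including the
// equality-delete case that the review describes as newly added.
VeloxFileContent toVeloxFileContent(ProtocolFileContent content) {
  switch (content) {
    case ProtocolFileContent::DATA:
      return VeloxFileContent::kData;
    case ProtocolFileContent::POSITION_DELETES:
      return VeloxFileContent::kPositionalDeletes;
    case ProtocolFileContent::EQUALITY_DELETES:
      return VeloxFileContent::kEqualityDeletes;
  }
  throw std::invalid_argument("Unsupported file content");
}

// Mirrors the Java-side defaulting described above: a missing
// sequence number (legacy file) becomes 0.
int64_t sequenceNumberOrZero(const std::optional<int64_t>& value) {
  return value.value_or(0);
}

// Hypothetical, heavily reduced split; the real HiveIcebergSplit has more fields.
struct SketchSplit {
  std::unordered_map<std::string, std::string> infoColumns;
  int64_t dataSequenceNumber;
};

// Reads the sequence number once into a const local and reuses it for both
// the info column and the split field.
SketchSplit makeSplit(int64_t protocolDataSequenceNumber) {
  // Assumption of this sketch: sequence numbers are never negative.
  assert(protocolDataSequenceNumber >= 0);
  const int64_t dataSequenceNumber = protocolDataSequenceNumber;
  std::unordered_map<std::string, std::string> infoColumns;
  infoColumns["$data_sequence_number"] = std::to_string(dataSequenceNumber);
  return SketchSplit{std::move(infoColumns), dataSequenceNumber};
}
```

Reading the value into a single const local keeps the info column and the split field from drifting apart, which appears to be the intent of the const-qualifier cleanup mentioned in the PR summary.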

Possibly linked issues

  • #native(Iceberg): The PR introduces Iceberg V3 DV support and delete conflict resolution, directly implementing the issue’s requested V3 delete behavior.

@sourcery-ai sourcery-ai bot left a comment

Hey - I've reviewed your changes and they look great!


@meta-codesync meta-codesync bot changed the title from "feat:[presto][iceberg] Wire dataSequenceNumber through protocol layer for equality delete conflict resolution" to "feat:[presto][iceberg] Wire dataSequenceNumber through protocol layer for equality delete conflict resolution (#27392)" Mar 21, 2026
apurva-meta added a commit to apurva-meta/presto that referenced this pull request Mar 21, 2026
… for equality delete conflict resolution (prestodb#27392)

Summary:

Wire the dataSequenceNumber field from the Java Presto protocol to the
C++ Velox connector layer, enabling server-side sequence number conflict
resolution for equality delete files.

Changes:
- Add dataSequenceNumber field to IcebergSplit protocol (Java + C++)
- Parse dataSequenceNumber in IcebergPrestoToVeloxConnector and pass it
  through HiveIcebergSplit to IcebergSplitReader
- Add const qualifiers to local variables for code clarity

== RELEASE NOTES ==
General Changes
* Upgrade Apache Iceberg library from 1.10.0 to 1.10.1.
Hive Connector Changes
* Add Iceberg V3 deletion vector (DV) support using Puffin-encoded roaring bitmaps, including a DV reader, writer, page sink, and compaction procedure.
* Add Iceberg equality delete file reader with sequence number conflict resolution per the Iceberg V2+ spec: equality deletes skip when deleteFileSeqNum <= dataFileSeqNum; positional deletes and DVs skip when deleteFileSeqNum < dataFileSeqNum; sequence number 0 (V1 legacy) never skips.
* Wire dataSequenceNumber through the Presto protocol layer (Java → C++) to enable server-side sequence number conflict resolution for all delete file types.
* Add PUFFIN file format support for deletion vector discovery, enabling the coordinator to locate DV files during split creation.
* Add Iceberg V3 deletion vector write path with DV page sink and rewrite_delete_files compaction procedure for DV maintenance.
* Add nanosecond timestamp (TIMESTAMP_NANO) type support for Iceberg V3 tables.
* Add Variant type support for Iceberg V3, enabling semi-structured data columns in Iceberg tables.
* Eagerly collect delete files during split creation with improved logging for easier debugging of Iceberg delete file resolution.
* Improve IcebergSplitReader error handling and fix test file handle leaks.
* Add end-to-end integration tests for Iceberg V3 covering snapshot lifecycle (INSERT, DELETE with equality/positional/DV deletes, UPDATE, MERGE, time-travel) and all 99 TPC-DS queries.

Differential Revision: D97531547
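
The skip rules stated in the release notes above can be written down as a single predicate. The following C++ is a minimal sketch under stated assumptions, not the IcebergSplitReader implementation: DeleteKind and canSkipDeleteFile are hypothetical names. The notes do not say which sequence number the "0 never skips" rule refers to, so this sketch assumes that a 0 on either side disables skipping.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of delete files for this sketch.
enum class DeleteKind { kPositional, kEquality, kDeletionVector };

// Returns true when the delete file does not need to be applied to the
// data file, following the thresholds quoted in the release notes.
bool canSkipDeleteFile(
    DeleteKind kind,
    int64_t deleteFileSeqNum,
    int64_t dataFileSeqNum) {
  // Assumption of this sketch: sequence numbers are never negative.
  assert(deleteFileSeqNum >= 0 && dataFileSeqNum >= 0);

  // Sequence number 0 (V1 legacy) never skips. Treating a 0 on either
  // side as "legacy" is an assumption; the source does not specify.
  if (deleteFileSeqNum == 0 || dataFileSeqNum == 0) {
    return false;
  }

  // Equality deletes only affect data files with a strictly smaller
  // sequence number, so an equal sequence number can be skipped.
  if (kind == DeleteKind::kEquality) {
    return deleteFileSeqNum <= dataFileSeqNum;
  }

  // Positional deletes and deletion vectors still apply when the
  // sequence numbers are equal.
  return deleteFileSeqNum < dataFileSeqNum;
}
```

For example, with both sequence numbers equal to 5, an equality delete is skipped while a positional delete or DV is still applied. This difference in thresholds is why the reader needs the data file's sequence number on the split in addition to the one on each delete file.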
@apurva-meta apurva-meta force-pushed the export-D97531547 branch 2 times, most recently from cdd01c2 to bc2c773 March 21, 2026
apurva-meta added a commit to apurva-meta/presto that referenced this pull request Mar 21, 2026
… for equality delete conflict resolution (prestodb#27392)

apurva-meta added a commit to apurva-meta/presto that referenced this pull request Mar 21, 2026
… for equality delete conflict resolution (prestodb#27392)

…tensibility

Summary:
- Reformat FileContent enum in presto_protocol_iceberg.h from single-line
  to multi-line for better readability and future extension.
- Add blank line for visual separation before infoColumns initialization.

Protocol files are auto-generated from Java sources via chevron. The manual
edits here mirror what the generator would produce once the Java changes
are landed and the protocol is regenerated.

Differential Revision: D97531548
apurva-meta added a commit to apurva-meta/presto that referenced this pull request Mar 27, 2026
… for equality delete conflict resolution (prestodb#27392)

apurva-meta added a commit to apurva-meta/presto that referenced this pull request Mar 27, 2026
… for equality delete conflict resolution (prestodb#27392)

apurva-meta added a commit to apurva-meta/presto that referenced this pull request Mar 27, 2026
… for equality delete conflict resolution (prestodb#27392)

apurva-meta added a commit to apurva-meta/presto that referenced this pull request Mar 27, 2026
… for equality delete conflict resolution (prestodb#27392)

…equality delete conflict resolution

Summary:
Wire the dataSequenceNumber field from the Java Presto protocol to the
C++ Velox connector layer, enabling server-side sequence number conflict
resolution for equality delete files.

Changes:
- Add dataSequenceNumber field to IcebergSplit protocol (Java + C++)
- Parse dataSequenceNumber in IcebergPrestoToVeloxConnector and pass it
  through HiveIcebergSplit to IcebergSplitReader
- Add const qualifiers to local variables for code clarity

Differential Revision: D97531547
@meta-codesync meta-codesync bot changed the title from "feat:[presto][iceberg] Wire dataSequenceNumber through protocol layer for equality delete conflict resolution (#27392)" to "[presto][iceberg] Wire dataSequenceNumber through protocol layer for equality delete conflict resolution" Mar 27, 2026
@linux-foundation-easycla

CLA Missing ID CLA Not Signed
