
Make partitionValues serializable for IcebergSplit #28988

Merged
chenjian2664 merged 4 commits into trinodb:master from chenjian2664:jack/replace-partition-values-iceberg-split
Apr 10, 2026


Conversation

@chenjian2664
Contributor

Description

ConnectorSplit is the data carrier between the coordinator and the workers, so it should be fully serializable by design. Making partitionValues serializable is the first step toward making IcebergSplit meet that requirement.
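To make "fully serializable" concrete: whatever the coordinator puts into a split should come back equal after an encode/decode round trip on a worker. The sketch below is a hypothetical, stdlib-only illustration; `SplitRoundTripSketch`, `DemoSplit`, `encode`, and `decode` are invented names, and it uses `java.io` serialization only because it ships with the JDK. Trino splits are normally encoded through Jackson-annotated properties rather than Java serialization, so this shows the property, not Trino's wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

final class SplitRoundTripSketch {
    // Hypothetical split: it holds only plain values (no coordinator-only objects),
    // so it can be encoded on one node and rebuilt as an equal copy on another.
    record DemoSplit(String path, long start, long length, List<Long> partitionValues)
            implements Serializable {
        DemoSplit {
            Objects.requireNonNull(path, "path is null");
            // Defensive, serializable copy; ArrayList (unlike List.of) allows null partition values.
            partitionValues = Collections.unmodifiableList(new ArrayList<>(
                    Objects.requireNonNull(partitionValues, "partitionValues is null")));
        }
    }

    // "Coordinator" side: turn the split into bytes.
    static byte[] encode(DemoSplit split) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(split);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // "Worker" side: rebuild the split from bytes.
    static DemoSplit decode(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (DemoSplit) in.readObject();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A null partition value stays an explicit `null` in the list after the round trip, so the worker can tell "no value" apart from a missing field.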

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Apr 3, 2026
@github-actions github-actions bot added the iceberg Iceberg connector label Apr 3, 2026
Comment thread plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplit.java Outdated
@chenjian2664 chenjian2664 force-pushed the jack/replace-partition-values-iceberg-split branch 2 times, most recently from 35d3c5f to 2dba594 on April 6, 2026 10:45
@starburstdata-automation

starburstdata-automation commented Apr 6, 2026

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: success
Comparing results to the static baseline values; see the workflow link above for more details and logs.
Status message: NO Regression found.
Benchmark Comparison to the closest run from Master: Report

@chenjian2664 chenjian2664 force-pushed the jack/replace-partition-values-iceberg-split branch from 2dba594 to 380cef3 on April 6, 2026 12:39
@chenjian2664
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Apr 6, 2026

✅ Actions performed

Full review triggered.

@chenjian2664 chenjian2664 changed the title from "Make partitionValues serializable" to "Make partitionValues serializable for IcebergSplit" on Apr 6, 2026
@chenjian2664
Contributor Author

@raunaqmorarka PTAL

@chenjian2664 chenjian2664 requested review from ebyhr and findinpath April 6, 2026 12:44
@coderabbitai

coderabbitai bot commented Apr 6, 2026

📝 Walkthrough

This pull request refactors partition value representation in the Iceberg plugin, replacing JSON-based serialization with direct Trino Block objects. Changes include replacing Optional<List<Object>> partitionValues and String partitionDataJson in IcebergSplit with a single List<Block> partitionValues field. The partitionStructFields metadata is moved from IcebergTablePartitioning to IcebergPartitioningHandle. A new factory method PartitionData.fromBlocks() is introduced to construct partition data directly from blocks, and partition value handling is updated across IcebergSplitSource, IcebergBucketFunction, and IcebergPageSourceProvider to work with the new block-based representation.
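The shift described in the walkthrough, from an untyped `List<Object>` plus a JSON string to a `List<Block>`, can be sketched without Trino on the classpath, assuming one single-position block per partition field in partition-spec order. Everything here is hypothetical: `SingleValueBlock` stands in for a one-position `io.trino.spi.block.Block`, and `toBlocks`/`fromBlocks` only loosely echo the idea behind `PartitionData.fromBlocks()`, whose real signature is not shown on this page. The sketch models the shape of the data (typed, ordered, nulls explicit), not how Trino serializes blocks.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionBlocksSketch {
    // Hypothetical stand-in for a one-position Block: a typed value that may be null.
    record SingleValueBlock(Class<?> javaType, Object value) {
        SingleValueBlock {
            if (javaType == null) {
                throw new IllegalArgumentException("javaType is null");
            }
            // Reject values that do not match the declared partition field type.
            if (value != null && !javaType.isInstance(value)) {
                throw new IllegalArgumentException(value + " is not a " + javaType.getSimpleName());
            }
        }

        boolean isNull() {
            return value == null;
        }
    }

    // Coordinator side (assumed): wrap each partition field's value in its own typed block.
    static List<SingleValueBlock> toBlocks(List<Class<?>> types, List<Object> values) {
        if (types.size() != values.size()) {
            throw new IllegalArgumentException("expected one value per partition field");
        }
        List<SingleValueBlock> blocks = new ArrayList<>(types.size());
        for (int i = 0; i < types.size(); i++) {
            blocks.add(new SingleValueBlock(types.get(i), values.get(i)));
        }
        return List.copyOf(blocks);
    }

    // Worker side: a rough analogue of a fromBlocks-style factory that unwraps each block.
    static Object[] fromBlocks(List<SingleValueBlock> blocks) {
        Object[] values = new Object[blocks.size()];
        for (int i = 0; i < blocks.size(); i++) {
            SingleValueBlock block = blocks.get(i);
            values[i] = block.isNull() ? null : block.value();
        }
        return values;
    }
}
```

Keeping the type next to each value means the worker never has to re-parse a JSON string to learn what a partition value is.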

Possibly related PRs

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.42.0)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplit.java (1)

210-217: ⚠️ Potential issue | 🟡 Minor

Use estimatedSizeOf for partitionValues here.

Line 216 sums only the block sizes, so getRetainedSizeInBytes() misses the partitionValues list shell. This under-reports split memory, even though the same method uses SizeOf helpers for its other containers.

🧮 Proposed fix
-                + partitionValues.stream().map(Block::getRetainedSizeInBytes).mapToInt(Long::intValue).sum()
+                + estimatedSizeOf(partitionValues, Block::getRetainedSizeInBytes)
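The gap between the two expressions can be shown with plain numbers. This is a stdlib-only sketch: `RetainedSizeSketch`, `elementsOnlyAsInt`, and `estimatedSizeOfSketch` are hypothetical names, and the 16-byte array header and 4-byte reference size are assumed values for a 64-bit JVM with compressed references. Real code would call the `estimatedSizeOf` helper the review points to, whose exact accounting may differ. Besides omitting the list's backing array, the original expression narrows each `long` to `int` through `mapToInt(Long::intValue)` and sums in `int`, so very large sizes wrap around.

```java
import java.util.List;
import java.util.function.ToLongFunction;

final class RetainedSizeSketch {
    // Assumed layout constants (64-bit JVM, compressed references); illustrative only.
    static final long ARRAY_HEADER_BYTES = 16;
    static final long REFERENCE_BYTES = 4;

    // Pattern flagged in review: element sizes only, each narrowed to int, summed as int.
    static long elementsOnlyAsInt(List<Long> elementSizes) {
        return elementSizes.stream().mapToInt(Long::intValue).sum();
    }

    // Rough analogue of estimatedSizeOf(list, valueSize): backing reference array plus
    // each element's own size, accumulated as long so large values do not wrap.
    static <T> long estimatedSizeOfSketch(List<T> list, ToLongFunction<T> valueSize) {
        if (list == null) {
            return 0;
        }
        long size = ARRAY_HEADER_BYTES + REFERENCE_BYTES * list.size();
        for (T value : list) {
            size += valueSize.applyAsLong(value);
        }
        return size;
    }
}
```

With element sizes `100` and `200`, `elementsOnlyAsInt` returns `300`, while `estimatedSizeOfSketch(sizes, Long::longValue)` returns `324` (16 + 2 × 4 + 300). A single element of `3_000_000_000L` comes back from `elementsOnlyAsInt` as `-1294967296`.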
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplit.java`
around lines 210-217: the getRetainedSizeInBytes method underreports memory
because it sums only the Block sizes and omits the partitionValues list
overhead; in IcebergSplit.getRetainedSizeInBytes replace the current
partitionValues.stream().map(Block::getRetainedSizeInBytes).mapToInt(Long::intValue).sum()
expression with estimatedSizeOf(partitionValues, Block::getRetainedSizeInBytes)
so the list shell plus element sizes are included (use the existing
estimatedSizeOf helper and the partitionValues symbol to locate the change).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fdb49aef-a257-46fb-9e17-d42d4682935a

📥 Commits

Reviewing files that changed from the base of the PR and between 11bf3c0 and 380cef3.

📒 Files selected for processing (11)
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergBucketFunction.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPartitioningHandle.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplit.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTablePartitioning.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionData.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesFunctionProcessor.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergNodeLocalDynamicSplitPruning.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergPageSourceProvider.java
💤 Files with no reviewable changes (2)
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesFunctionProcessor.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergPageSourceProvider.java

@chenjian2664 chenjian2664 force-pushed the jack/replace-partition-values-iceberg-split branch from 380cef3 to ba9a047 on April 6, 2026 13:17
Member

@raunaqmorarka raunaqmorarka left a comment

Review comments on partition value serialization changes.

Member

@raunaqmorarka raunaqmorarka left a comment

lgtm % minor comment

@chenjian2664 chenjian2664 force-pushed the jack/replace-partition-values-iceberg-split branch from 9105dee to 7a8f919 on April 10, 2026 09:28
@chenjian2664 chenjian2664 force-pushed the jack/replace-partition-values-iceberg-split branch from 7a8f919 to fec29cb on April 10, 2026 10:38
@chenjian2664 chenjian2664 merged commit 8300848 into trinodb:master Apr 10, 2026
48 checks passed

Labels

cla-signed iceberg Iceberg connector

4 participants