Make partitionValues serializable for IcebergSplit#28988
Make partitionValues serializable for IcebergSplit#28988chenjian2664 merged 4 commits intotrinodb:masterfrom
partitionValues serializable for IcebergSplit#28988Conversation
35d3c5f to
2dba594
Compare
|
Started benchmark workflow for this PR with test type =
|
2dba594 to
380cef3
Compare
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
partitionValues serializable for IcebergSplit
|
@raunaqmorarka PTAL |
📝 WalkthroughWalkthroughThis pull request refactors partition value representation in the Iceberg plugin, replacing JSON-based serialization with direct Trino Possibly related PRs
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.42.0)plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.javaThanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplit.java (1)
210-217:⚠️ Potential issue | 🟡 MinorUse
estimatedSizeOfforpartitionValueshere.Line 216 only sums the blocks themselves, so
getRetainedSizeInBytes()misses thepartitionValueslist shell. That under-reports split memory in the same method that otherwise usesSizeOfhelpers for containers.🧮 Proposed fix
- + partitionValues.stream().map(Block::getRetainedSizeInBytes).mapToInt(Long::intValue).sum() + + estimatedSizeOf(partitionValues, Block::getRetainedSizeInBytes)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplit.java` around lines 210 - 217, The getRetainedSizeInBytes method underreports memory because it sums only the Block sizes and omits the partitionValues list overhead; in IcebergSplit.getRetainedSizeInBytes replace the current partitionValues.stream().map(Block::getRetainedSizeInBytes).mapToInt(Long::intValue).sum() expression with estimatedSizeOf(partitionValues, Block::getRetainedSizeInBytes) so the list shell plus element sizes are included (use the existing estimatedSizeOf helper and the partitionValues symbol to locate the change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplit.java`:
- Around line 210-217: The getRetainedSizeInBytes method underreports memory
because it sums only the Block sizes and omits the partitionValues list
overhead; in IcebergSplit.getRetainedSizeInBytes replace the current
partitionValues.stream().map(Block::getRetainedSizeInBytes).mapToInt(Long::intValue).sum()
expression with estimatedSizeOf(partitionValues, Block::getRetainedSizeInBytes)
so the list shell plus element sizes are included (use the existing
estimatedSizeOf helper and the partitionValues symbol to locate the change).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fdb49aef-a257-46fb-9e17-d42d4682935a
📒 Files selected for processing (11)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergBucketFunction.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPartitioningHandle.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplit.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTablePartitioning.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionData.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesFunctionProcessor.javaplugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergNodeLocalDynamicSplitPruning.javaplugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergPageSourceProvider.java
💤 Files with no reviewable changes (2)
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesFunctionProcessor.java
- plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergPageSourceProvider.java
380cef3 to
ba9a047
Compare
raunaqmorarka
left a comment
There was a problem hiding this comment.
Review comments on partition value serialization changes.
9105dee to
7a8f919
Compare
Also removed the `partitionDataJson` from `IcebergSplit` since it is can be replaced by the new added `partitionValues`.
7a8f919 to
fec29cb
Compare
Description
The
ConnectorSplitserves as the data carrier between the coordinator and workers, and therefore should be fully serializable by design. MakingpartitionValuesserializable is the first step toward correcting the current implementation.Additional context and related issues
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: