-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Fix incorrect results when reading migrated Iceberg Avro files #26866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideThis PR fixes incorrect results when reading migrated Iceberg Avro files by propagating partitionKeys into the Avro page source and updating partition column processing to deserialize correct values, and it extends the migration procedure tests to cover Avro format. Sequence diagram for Avro page source creation with partitionKeyssequenceDiagram
participant Provider as IcebergPageSourceProvider
participant PageSource as ReaderPageSourceWithRowPositions
participant Column as IcebergColumnHandle
Provider->>PageSource: createAvroPageSource(..., columns, partitionKeys)
loop For each column in columns
PageSource->>Column: getId()
PageSource->>partitionKeys: containsKey(column.getId())
alt partitionKeys contains column id
PageSource->>Column: getType(), getName()
PageSource->>PageSource: deserializePartitionValue(...)
PageSource->>PageSource: nativeValueToBlock(...)
PageSource->>PageSource: transforms.constantValue(...)
else column.isPartitionColumn()
PageSource->>PageSource: nativeValueToBlock(...)
PageSource->>PageSource: transforms.constantValue(...)
else column.isPathColumn()
PageSource->>PageSource: transforms.constantValue(...)
end
end
Class diagram for updated Avro page source creationclassDiagram
class IcebergPageSourceProvider {
+createDataPageSource(...)
}
class ReaderPageSourceWithRowPositions
class InputFile
class NameMapping
class IcebergColumnHandle {
+getId()
+getType()
+getName()
+isPartitionColumn()
+isPathColumn()
}
IcebergPageSourceProvider --> ReaderPageSourceWithRowPositions : creates
ReaderPageSourceWithRowPositions --> InputFile : uses
ReaderPageSourceWithRowPositions --> NameMapping : uses
ReaderPageSourceWithRowPositions --> IcebergColumnHandle : uses
ReaderPageSourceWithRowPositions : -Map<Integer, Optional<String>> partitionKeys
ReaderPageSourceWithRowPositions : +createAvroPageSource(..., List<IcebergColumnHandle> columns, Map<Integer, Optional<String>> partitionKeys)
IcebergColumnHandle --> "partitionKeys : Map<Integer, Optional<String>>"
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/procedure/TestIcebergMigrateProcedure.java:306-307` </location>
<code_context>
String icebergTableName = "iceberg.tpch." + tableName;
- assertUpdate("CREATE TABLE " + hiveTableName + " WITH (partitioned_by = ARRAY['part_col']) AS SELECT 1 id, 'part1' part_col", 1);
+ assertUpdate("CREATE TABLE " + hiveTableName + " WITH (format = '" + format + "', partitioned_by = ARRAY['part_col']) AS SELECT 1 id, 'part1' part_col", 1);
assertQueryFails("SELECT * FROM " + icebergTableName, "Not an Iceberg table: .*");
</code_context>
<issue_to_address>
**suggestion (testing):** Suggestion to add assertions for correctness of migrated data, especially for partition columns.
Consider adding assertions to verify that partition column values are correctly migrated, ensuring the fix is fully validated.
```suggestion
assertUpdate("CREATE TABLE " + hiveTableName + " WITH (format = '" + format + "', partitioned_by = ARRAY['part_col']) AS SELECT 1 id, 'part1' part_col", 1);
assertQueryFails("SELECT * FROM " + icebergTableName, "Not an Iceberg table: .*");
// Migrate the Hive table to Iceberg
assertUpdate("CALL iceberg.system.migrate('" + hiveTableName + "')");
// Assert that the migrated Iceberg table contains the correct partition column value
assertQuery("SELECT part_col FROM " + icebergTableName, "VALUES 'part1'");
// Optionally, assert the full row
assertQuery("SELECT id, part_col FROM " + icebergTableName, "VALUES (1, 'part1')");
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Description
Fixes #26863
Release notes
Summary by Sourcery
Fix partition key handling in Avro page source for migrated Iceberg files and extend migration tests to cover Avro and ORC formats
Bug Fixes:
Tests: