fix: [iceberg] more fixes for Iceberg integration APIs. #2078
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2078      +/-   ##
============================================
+ Coverage     56.12%   58.35%   +2.22%
- Complexity      976     1252     +276
============================================
  Files           119      140      +21
  Lines         11743    13322    +1579
  Branches       2251     2370     +119
============================================
+ Hits           6591     7774    +1183
- Misses         4012     4317     +305
- Partials       1140     1231      +91
☔ View full report in Codecov by Sentry.
Reverted some of the changes to allow CI to pass.
andygrove
left a comment
LGTM. Thanks @parthchandra
/**
 * Wraps an InputStream that possibly implements the methods of a Parquet SeekableInputStream (but
 * is not a Parquet SeekableInputStream). Such an InputStream exists, fir instance, in Iceberg's
(nit) fir -> for.
fixed
/**
 * A Parquet {@link InputFile} implementation that's similar to {@link
 * org.apache.parquet.hadoop.util.HadoopInputFile}, but with optimizations introduced in Hadoop 3.x,
Is the Hadoop 3.x part still valid, since this class is not a CometInputFile?
Oops. Updated the comment.
hsiang-c
left a comment
LGTM
No, but the input to The
}

return new ParquetColumnSpec(
    1, // ToDo: pass in the correct id
ColumnDescriptor doesn't expose fieldId, but we need a placeholder here. I guess put -1 for unknown id?
Apparently one can, but the id may not be set. I've changed this to set the id from the descriptor if it is present, else to -1.
@huaxingao if this change looks good to you, I will merge this PR. Please take a look.
LGTM. Thanks for the fix!
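For readers following the field-id thread above, here is a minimal sketch of the fallback that was agreed on: use the id when the descriptor carries one, otherwise -1. The `Descriptor` class, the `UNKNOWN_FIELD_ID` constant, and the `fieldIdOrUnknown` helper are hypothetical stand-ins for illustration only; they are not the Comet or Parquet classes touched by this PR.

```java
final class FieldIdFallback {
    /** Hypothetical stand-in for a column descriptor whose field id may be unset (null). */
    static final class Descriptor {
        final String name;
        final Integer fieldId; // null means "no id was set"

        Descriptor(String name, Integer fieldId) {
            this.name = name;
            this.fieldId = fieldId;
        }
    }

    /** Placeholder used when the descriptor has no id (assumed value, per the thread: -1). */
    static final int UNKNOWN_FIELD_ID = -1;

    /** Returns the descriptor's field id if present, otherwise UNKNOWN_FIELD_ID. */
    static int fieldIdOrUnknown(Descriptor d) {
        return d.fieldId != null ? d.fieldId : UNKNOWN_FIELD_ID;
    }

    public static void main(String[] args) {
        System.out.println(fieldIdOrUnknown(new Descriptor("a", 7)));
        System.out.println(fieldIdOrUnknown(new Descriptor("b", null)));
    }
}
```

Using an explicit sentinel rather than a hard-coded 1 keeps an unknown id from colliding with a real one.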
Merged. Thank you for the review @andygrove @huaxingao @hsiang-c
* fix: [iceberg] more fixes for Iceberg integration APIs.
Which issue does this PR close?
When running Iceberg unit and integration tests with Comet, we still encounter some issues due to Parquet shading. This PR addresses those issues.
The main changes are:
* For all classes in Comet that are used in Iceberg integration, make all constructors and methods package private if they have a Parquet class in the signature. Provide equivalent methods that use Parquet encapsulation classes.
* Refactor the FileReader API to use an InputStream (instead of a file path) to allow implementations of Iceberg's InputFile. The refactoring introduces some more encapsulation classes that use reflection to call back into Iceberg (to avoid adding a circular dependency on Iceberg).

This PR may break the complementary (draft) PR in Iceberg: apache/iceberg#13378
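
The reflection-based encapsulation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `ReflectiveSeekableStream` and `DemoStream` are hypothetical names, and the sketch assumes the wrapped stream exposes public `seek(long)` and `getPos()` methods without implementing any Parquet interface.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical wrapper: forwards seek/getPos to the delegate via reflection. */
final class ReflectiveSeekableStream extends InputStream {
    private final InputStream delegate;
    private final Method seek;   // void seek(long); null if the delegate lacks it
    private final Method getPos; // long getPos();   null if the delegate lacks it

    ReflectiveSeekableStream(InputStream delegate) {
        this.delegate = delegate;
        this.seek = find(delegate.getClass(), "seek", long.class);
        this.getPos = find(delegate.getClass(), "getPos");
    }

    /** Looks up a public method by name; returns null if absent or inaccessible. */
    private static Method find(Class<?> cls, String name, Class<?>... params) {
        try {
            Method m = cls.getMethod(name, params);
            return m.trySetAccessible() ? m : null;
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    @Override
    public int read() throws IOException {
        return delegate.read();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    void seek(long pos) throws IOException {
        invoke(seek, "seek", pos);
    }

    long getPos() throws IOException {
        return (Long) invoke(getPos, "getPos");
    }

    private Object invoke(Method m, String name, Object... args) throws IOException {
        if (m == null) {
            throw new UnsupportedOperationException(
                delegate.getClass().getName() + " does not provide " + name);
        }
        try {
            return m.invoke(delegate, args);
        } catch (InvocationTargetException e) {
            // Surface the delegate's own failure rather than the reflection wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof IOException) throw (IOException) cause;
            throw new IOException(cause);
        } catch (IllegalAccessException e) {
            throw new IOException(e);
        }
    }

    /** Hypothetical stream with seek/getPos but no Parquet types (demo only). */
    static final class DemoStream extends ByteArrayInputStream {
        DemoStream(byte[] data) { super(data); }
        public void seek(long newPos) { this.pos = (int) newPos; }
        public long getPos() { return this.pos; }
    }

    public static void main(String[] args) throws IOException {
        ReflectiveSeekableStream in =
            new ReflectiveSeekableStream(new DemoStream(new byte[] {10, 20, 30, 40}));
        in.seek(2);
        System.out.println(in.read());   // byte at offset 2
        System.out.println(in.getPos()); // position after the read
    }
}
```

Because the methods are resolved by name at runtime, the wrapper compiles without a dependency on the library that supplies the stream, which is the point of avoiding the circular dependency mentioned above. The cost is that a missing or renamed method only shows up at runtime.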