-
Notifications
You must be signed in to change notification settings - Fork 268
fix: clean up [iceberg] integration APIs #2032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2032 +/- ##
============================================
+ Coverage 56.12% 58.72% +2.59%
- Complexity 976 1253 +277
============================================
Files 119 136 +17
Lines 11743 13147 +1404
Branches 2251 2390 +139
============================================
+ Hits 6591 7720 +1129
- Misses 4012 4195 +183
- Partials 1140 1232 +92 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| public BatchReader(AbstractColumnReader[] columnReaders) { | ||
| // Todo: set useDecimal128 and useLazyMaterialization | ||
| int numColumns = columnReaders.length; | ||
| public BatchReader(int numColumns) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are the column readers created and how are they passed in to the BatchReader? (Or does init have to be called?)
(Also, this constructor sets isInitialized to true which is probably not the case any more.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parthchandra Thanks for taking a look at this PR!
The column readers are created here and passed to Batch Reader at here
isInitialized is still true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passed to Batch Reader at here
This does not sound correct. BatchReader has no API to set a column reader other than the constructor changed by this PR. The only other way to set column readers is by calling init which will then create the appropriate column readers.
Also, I notice that the current version of the constructor ignores the column readers passed in.
Any BatchReader created with this constructor (either this PR or the current version) will have an array of nulls as the column readers.
How is such a BatchReader usable (or useful)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @huaxingao for pointing out that the column readers are being set here -
https://github.com/apache/iceberg/blob/main/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/CometColumnarBatchReader.java#L90
I would recommend an init API that takes the column readers as parameter and sets the initialization flag to true.
Essentially, there will be an init method for this variant of the constructor to mirror the init method used by with Spark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parthchandra Updated. Could you please check one more time?
kazuyukitanimura
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
I will add [iceberg] to run iceberg CI
|
Iceberg CI failed. We need to change the corresponding iceberg side to make Iceberg CI pass. The required changes are in this draft PR. |
Maybe reintroduce the methods you have removed so that CI can pass and mark them deprecated. We can remove them after the Iceberg PR is merged. |
| public AbstractColumnReader[] getColumnReaders() { | ||
| return columnReaders; | ||
| // Used by Iceberg only. | ||
| public void setSparkSchema(StructType schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be. part of the new constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| public void setSparkSchema(StructType schema) { | ||
| this.sparkSchema = schema; | ||
| // Used by Iceberg only. | ||
| public void initByIceberg(AbstractColumnReader[] columnReaders) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just init?
It would be clearer if we simply created a new class extending BatchReader that the Iceberg specific methods are in. Leaves no room for confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks
|
@parthchandra I have put back the methods and marked them deprecated. Could you please take one more look? |
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @huaxingao
|
|
||
| /** The TaskContext object for executing this task. */ | ||
| private final TaskContext taskContext; | ||
| protected TaskContext taskContext; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not need to be protected. It's not used by the derived class (and it is specifically for testing).
(It can also be initialized in the default constructor if one really needs it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed back to private
parthchandra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more comment, otherwise, lgtm.
|
Merged, thanks @huaxingao @comphead @parthchandra |
|
Thank you all! |
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
How are these changes tested?