Conversation

@huaxingao
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@codecov-commenter

codecov-commenter commented Jul 16, 2025

Codecov Report

Attention: Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.

Project coverage is 58.72%. Comparing base (f09f8af) to head (af0725d).
Report is 331 commits behind head on main.

Files with missing lines                                Patch %   Lines
.../apache/comet/parquet/IcebergCometBatchReader.java   0.00%     10 Missing ⚠️
...ain/java/org/apache/comet/parquet/BatchReader.java   0.00%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2032      +/-   ##
============================================
+ Coverage     56.12%   58.72%   +2.59%     
- Complexity      976     1253     +277     
============================================
  Files           119      136      +17     
  Lines         11743    13147    +1404     
  Branches       2251     2390     +139     
============================================
+ Hits           6591     7720    +1129     
- Misses         4012     4195     +183     
- Partials       1140     1232      +92     


- public BatchReader(AbstractColumnReader[] columnReaders) {
-   // Todo: set useDecimal128 and useLazyMaterialization
-   int numColumns = columnReaders.length;
+ public BatchReader(int numColumns) {
Contributor

Where are the column readers created and how are they passed in to the BatchReader? (Or does init have to be called?)
(Also, this constructor sets isInitialized to true which is probably not the case any more.)

Contributor Author

@parthchandra Thanks for taking a look at this PR!
The column readers are created here and passed to the BatchReader here.
isInitialized is still set to true.

Contributor

> passed to the BatchReader here

This does not sound correct. BatchReader has no API to set a column reader other than the constructor changed by this PR. The only other way to set column readers is by calling init, which will then create the appropriate column readers.
Also, I notice that the current version of the constructor ignores the column readers passed in.
Any BatchReader created with this constructor (either in this PR or the current version) will have an array of nulls as its column readers.

How is such a BatchReader usable (or useful)?

Contributor

Thanks @huaxingao for pointing out that the column readers are being set here -
https://github.com/apache/iceberg/blob/main/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/CometColumnarBatchReader.java#L90
I would recommend an init API that takes the column readers as a parameter and sets the initialization flag to true.
Essentially, this variant of the constructor would get its own init method, mirroring the init method used with Spark.
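The recommended API might look like the following sketch. This is a hypothetical, simplified rendering — `AbstractColumnReader` is reduced to an empty stub, and field names are stand-ins for the actual Comet classes in `org.apache.comet.parquet`:

```java
// Stub stand-in for org.apache.comet.parquet.AbstractColumnReader.
abstract class AbstractColumnReader {}

class BatchReader {
  protected AbstractColumnReader[] columnReaders;
  protected boolean isInitialized;

  // The constructor only allocates space; readers arrive later via init().
  public BatchReader(int numColumns) {
    this.columnReaders = new AbstractColumnReader[numColumns];
  }

  // Iceberg builds the column readers itself and hands them over here,
  // which also flips the initialization flag — mirroring the Spark-side init().
  public void init(AbstractColumnReader[] readers) {
    this.columnReaders = readers;
    this.isInitialized = true;
  }
}
```

With this shape, a `BatchReader` constructed with only a column count is explicitly unusable until `init` is called, which resolves the "array of nulls" concern raised above.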

Contributor Author

@parthchandra Updated. Could you please check one more time?

@kazuyukitanimura (Contributor) left a comment

Looks good
I will add [iceberg] to run iceberg CI

@kazuyukitanimura kazuyukitanimura changed the title fix: clean up iceberg integration APIs fix: clean up [iceberg] integration APIs Jul 21, 2025
@huaxingao huaxingao changed the title fix: clean up [iceberg] integration APIs [Iceberg] fix: clean up [iceberg] integration APIs Jul 21, 2025
@huaxingao huaxingao changed the title [Iceberg] fix: clean up [iceberg] integration APIs fix: clean up [iceberg] integration APIs Jul 21, 2025
@huaxingao
Contributor Author

Iceberg CI failed. We need to change the corresponding Iceberg-side code to make Iceberg CI pass. The required changes are in this draft PR.

@parthchandra
Contributor

> Iceberg CI failed. We need to change the corresponding Iceberg-side code to make Iceberg CI pass. The required changes are in this draft PR.

Maybe reintroduce the methods you removed, marked as deprecated, so that CI can pass. We can remove them after the Iceberg PR is merged.
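A minimal sketch of that compatibility shim, with stub types standing in for the real Comet and Spark classes (the method bodies and javadoc wording here are illustrative, not the PR's actual code):

```java
class StructType {}  // stand-in for org.apache.spark.sql.types.StructType
abstract class AbstractColumnReader {}

class BatchReader {
  private StructType sparkSchema;
  private AbstractColumnReader[] columnReaders;

  /** @deprecated kept only so the Iceberg integration keeps compiling; use the new init API. */
  @Deprecated
  public void setSparkSchema(StructType schema) {
    this.sparkSchema = schema;
  }

  /** @deprecated kept for Iceberg compatibility; superseded by the new init API. */
  @Deprecated
  public AbstractColumnReader[] getColumnReaders() {
    return columnReaders;
  }
}
```

Marking the reintroduced methods `@Deprecated` lets the apache/iceberg build compile unchanged while signalling that callers should migrate before the methods are removed for good.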

public AbstractColumnReader[] getColumnReaders() {
  return columnReaders;

// Used by Iceberg only.
public void setSparkSchema(StructType schema) {
Contributor

This can be part of the new constructor?

Contributor Author

Fixed
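Folding the schema into the constructor, as suggested, could look like this sketch (stub types and a hypothetical accessor; not the PR's actual signature):

```java
class StructType {}  // stand-in for org.apache.spark.sql.types.StructType
abstract class AbstractColumnReader {}

class BatchReader {
  private final StructType sparkSchema;
  private final AbstractColumnReader[] columnReaders;

  // Schema arrives at construction time, so no separate setter is needed
  // and both fields can be final.
  public BatchReader(StructType schema, int numColumns) {
    this.sparkSchema = schema;
    this.columnReaders = new AbstractColumnReader[numColumns];
  }

  StructType sparkSchema() { return sparkSchema; }
}
```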

public void setSparkSchema(StructType schema) {
  this.sparkSchema = schema;

// Used by Iceberg only.
public void initByIceberg(AbstractColumnReader[] columnReaders) {
Contributor

Just init?
It would be clearer if we simply created a new class extending BatchReader that holds the Iceberg-specific methods. That leaves no room for confusion.
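The suggested alternative design could be sketched as follows. `IcebergCometBatchReader` is the class name that appears in the Codecov report above; everything else here is a simplified stand-in, not the real implementation:

```java
abstract class AbstractColumnReader {}

class BatchReader {
  protected AbstractColumnReader[] columnReaders;
  protected boolean isInitialized;

  public BatchReader(int numColumns) {
    this.columnReaders = new AbstractColumnReader[numColumns];
  }
}

// All Iceberg-only surface lives in the subclass, so BatchReader itself
// stays focused on the Spark path and the method can simply be named init.
class IcebergCometBatchReader extends BatchReader {
  public IcebergCometBatchReader(int numColumns) {
    super(numColumns);
  }

  public void init(AbstractColumnReader[] readers) {
    this.columnReaders = readers;
    this.isInitialized = true;
  }
}
```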

Contributor Author

Fixed. Thanks

@huaxingao
Contributor Author

@parthchandra I have put back the methods and marked them deprecated. Could you please take one more look?

@comphead (Contributor) left a comment

lgtm thanks @huaxingao


  /** The TaskContext object for executing this task. */
- private final TaskContext taskContext;
+ protected TaskContext taskContext;
Contributor

This does not need to be protected. It's not used by the derived class (and it is specifically for testing).
(It can also be initialized in the default constructor if one really needs it)
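The shape being suggested, sketched with a stubbed `TaskContext` (the real one is `org.apache.spark.TaskContext`, whose `get()` returns the context of the currently running task):

```java
// Stub stand-in for org.apache.spark.TaskContext.
class TaskContext {
  static TaskContext get() { return new TaskContext(); }
}

class BatchReader {
  /** The TaskContext object for executing this task. */
  private final TaskContext taskContext;  // private, not protected

  // Initialized in the default constructor instead of widening visibility
  // for a subclass that never actually uses the field.
  public BatchReader() {
    this.taskContext = TaskContext.get();
  }

  TaskContext taskContext() { return taskContext; }
}
```

Keeping the field `private final` preserves the original encapsulation; the subclass never touches it, so nothing is lost by reverting the visibility change.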

Contributor Author

Changed back to private

@parthchandra (Contributor) left a comment

One more comment, otherwise, lgtm.

@kazuyukitanimura kazuyukitanimura merged commit 320ce55 into apache:main Jul 24, 2025
92 checks passed
@kazuyukitanimura
Contributor

Merged, thanks @huaxingao @comphead @parthchandra

@huaxingao
Contributor Author

Thank you all!

@huaxingao huaxingao deleted the fix branch July 24, 2025 16:55
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025