fix: clean up [iceberg] integration APIs #2032

huaxingao · 2025-07-16T17:28:48Z

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

codecov-commenter · 2025-07-16T21:02:59Z

Codecov Report

Attention: Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.

Project coverage is 58.72%. Comparing base (f09f8af) to head (af0725d).
Report is 331 commits behind head on main.

Files with missing lines	Patch %	Lines
.../apache/comet/parquet/IcebergCometBatchReader.java	0.00%	10 Missing ⚠️
...ain/java/org/apache/comet/parquet/BatchReader.java	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2032      +/-   ##
============================================
+ Coverage     56.12%   58.72%   +2.59%     
- Complexity      976     1253     +277     
============================================
  Files           119      136      +17     
  Lines         11743    13147    +1404     
  Branches       2251     2390     +139     
============================================
+ Hits           6591     7720    +1129     
- Misses         4012     4195     +183     
- Partials       1140     1232      +92


parthchandra · 2025-07-17T00:22:17Z

common/src/main/java/org/apache/comet/parquet/BatchReader.java

-  public BatchReader(AbstractColumnReader[] columnReaders) {
-    // Todo: set useDecimal128 and useLazyMaterialization
-    int numColumns = columnReaders.length;
+  public BatchReader(int numColumns) {


Where are the column readers created and how are they passed in to the BatchReader? (Or does init have to be called?)
(Also, this constructor sets isInitialized to true which is probably not the case any more.)

@parthchandra Thanks for taking a look at this PR!
The column readers are created here and passed to Batch Reader at here
isInitialized is still true.

passed to Batch Reader at here

This does not sound correct. BatchReader has no API to set a column reader other than the constructor changed by this PR. The only other way to set column readers is by calling init which will then create the appropriate column readers.
Also, I notice that the current version of the constructor ignores the column readers passed in.
Any BatchReader created with this constructor (either this PR or the current version) will have an array of nulls as the column readers.

How is such a BatchReader usable (or useful)?

Thanks @huaxingao for pointing out that the column readers are being set here -
https://github.com/apache/iceberg/blob/main/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/CometColumnarBatchReader.java#L90
I would recommend an init API that takes the column readers as parameter and sets the initialization flag to true.
Essentially, there will be an init method for this variant of the constructor, mirroring the init method used with Spark.

@parthchandra Updated. Could you please check one more time?
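
The init API recommended in this thread could look roughly like the sketch below. It is a minimal, self-contained illustration and not the Comet code: `ColumnReaderStub` and `BatchReaderSketch` are hypothetical stand-ins for `AbstractColumnReader` and `BatchReader`, and the validation inside `init` is an assumption about what such a method might reasonably check.

```java
// Hypothetical stand-in for AbstractColumnReader; not the real Comet class.
class ColumnReaderStub {
  final String name;

  ColumnReaderStub(String name) {
    this.name = name;
  }
}

// Hypothetical stand-in for BatchReader, reduced to the state discussed in this thread.
class BatchReaderSketch {
  private ColumnReaderStub[] columnReaders;
  private boolean isInitialized;

  // Only sizes the array: every slot is still null, so the flag stays false.
  BatchReaderSketch(int numColumns) {
    this.columnReaders = new ColumnReaderStub[numColumns];
    this.isInitialized = false;
  }

  // Assumed shape of the proposed init: take readers created by the caller,
  // validate them, and only then mark the reader as initialized.
  void init(ColumnReaderStub[] readers) {
    if (readers == null || readers.length != columnReaders.length) {
      throw new IllegalArgumentException(
          "expected " + columnReaders.length + " column readers");
    }
    for (ColumnReaderStub reader : readers) {
      if (reader == null) {
        throw new IllegalArgumentException("column readers must not be null");
      }
    }
    this.columnReaders = readers.clone();
    this.isInitialized = true;
  }

  boolean isInitialized() {
    return isInitialized;
  }

  ColumnReaderStub columnReader(int index) {
    return columnReaders[index];
  }
}

class InitSketchDemo {
  public static void main(String[] args) {
    BatchReaderSketch reader = new BatchReaderSketch(2);
    System.out.println(reader.isInitialized()); // false
    reader.init(
        new ColumnReaderStub[] {new ColumnReaderStub("id"), new ColumnReaderStub("name")});
    System.out.println(reader.isInitialized()); // true
  }
}
```

Keeping the flag false until `init` runs addresses the concern raised earlier in the thread that a reader built from the constructor alone holds an array of nulls.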

kazuyukitanimura

Looks good
I will add [iceberg] to run iceberg CI

huaxingao · 2025-07-22T00:43:05Z

Iceberg CI failed. We need to change the corresponding iceberg side to make Iceberg CI pass. The required changes are in this draft PR.

parthchandra · 2025-07-22T01:11:03Z

Iceberg CI failed. We need to change the corresponding iceberg side to make Iceberg CI pass. The required changes are in this draft PR.

Maybe reintroduce the methods you have removed so that CI can pass and mark them deprecated. We can remove them after the Iceberg PR is merged.
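
One way to read this suggestion, as a sketch only: keep the old entry points so that existing Iceberg callers still compile, annotate them with `@Deprecated`, and let them delegate to the new API. `LegacyFriendlyReader`, the use of `String[]` in place of `AbstractColumnReader[]`, and the method bodies are hypothetical simplifications; they are not the methods restored in this PR.

```java
// Hypothetical, simplified reader; String[] stands in for AbstractColumnReader[].
class LegacyFriendlyReader {
  private String[] columnReaders = new String[0];
  private boolean isInitialized;

  // New-style entry point (assumed name, following the review suggestion).
  void init(String[] readers) {
    this.columnReaders = readers.clone();
    this.isInitialized = true;
  }

  /** @deprecated kept only until callers migrate to {@link #init(String[])}. */
  @Deprecated
  void setColumnReaders(String[] readers) {
    init(readers); // delegate so both paths behave identically
  }

  /** @deprecated kept only so existing callers continue to compile. */
  @Deprecated
  String[] getColumnReaders() {
    return columnReaders.clone();
  }

  boolean isInitialized() {
    return isInitialized;
  }
}

class DeprecationSketchDemo {
  @SuppressWarnings("deprecation")
  public static void main(String[] args) {
    LegacyFriendlyReader reader = new LegacyFriendlyReader();
    reader.setColumnReaders(new String[] {"id", "name"}); // old call site still works
    System.out.println(reader.getColumnReaders().length); // 2
  }
}
```

Delegating from the deprecated methods means the old and new paths cannot drift apart while both exist, and the deprecated methods can be deleted without touching the rest of the class once the downstream change lands.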

parthchandra · 2025-07-22T01:12:00Z

common/src/main/java/org/apache/comet/parquet/BatchReader.java

-  public AbstractColumnReader[] getColumnReaders() {
-    return columnReaders;
+  // Used by Iceberg only.
+  public void setSparkSchema(StructType schema) {


This can be part of the new constructor?

parthchandra · 2025-07-22T01:15:00Z

common/src/main/java/org/apache/comet/parquet/BatchReader.java

-  public void setSparkSchema(StructType schema) {
-    this.sparkSchema = schema;
+  // Used by Iceberg only.
+  public void initByIceberg(AbstractColumnReader[] columnReaders) {


Just init?
It would be clearer if we simply created a new class extending BatchReader that the Iceberg specific methods are in. Leaves no room for confusion.

Fixed. Thanks
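
The subclass approach suggested in this exchange might be structured as in the sketch below. The class names, the `String` placeholders for `StructType` and `AbstractColumnReader`, and the choice to pass the schema through the constructor (as the earlier comment on `setSparkSchema` proposed) are all assumptions made for illustration. The coverage report lists an `IcebergCometBatchReader.java` file, but this sketch does not attempt to reproduce it.

```java
// Hypothetical stand-in for the shared base reader.
class BaseReaderSketch {
  protected String[] columnReaders; // String stands in for AbstractColumnReader
  protected boolean isInitialized;

  boolean isInitialized() {
    return isInitialized;
  }

  int numColumns() {
    return columnReaders == null ? 0 : columnReaders.length;
  }
}

// Hypothetical Iceberg-only subclass: everything Iceberg-specific lives here,
// so the base class exposes no "used by Iceberg only" methods.
class IcebergReaderSketch extends BaseReaderSketch {
  private final String sparkSchema; // String stands in for StructType

  // The schema arrives through the constructor instead of a separate setter.
  IcebergReaderSketch(String sparkSchema) {
    this.sparkSchema = sparkSchema;
  }

  // Plain init, as suggested, rather than an Iceberg-specific method name.
  void init(String[] readers) {
    this.columnReaders = readers.clone();
    this.isInitialized = true;
  }

  String sparkSchema() {
    return sparkSchema;
  }
}

class SubclassSketchDemo {
  public static void main(String[] args) {
    IcebergReaderSketch reader = new IcebergReaderSketch("struct<id:int>");
    reader.init(new String[] {"id"});
    System.out.println(reader.numColumns()); // 1
  }
}
```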

huaxingao · 2025-07-22T16:47:32Z

@parthchandra I have put back the methods and marked them deprecated. Could you please take one more look?

comphead

lgtm thanks @huaxingao

parthchandra · 2025-07-23T16:42:42Z

common/src/main/java/org/apache/comet/parquet/BatchReader.java


  /** The TaskContext object for executing this task. */
-  private final TaskContext taskContext;
+  protected TaskContext taskContext;


This does not need to be protected. It's not used by the derived class (and it is specifically for testing).
(It can also be initialized in the default constructor if one really needs it)

Changed back to private

parthchandra

One more comment, otherwise, lgtm.

kazuyukitanimura · 2025-07-24T16:18:11Z

Merged, thanks @huaxingao @comphead @parthchandra

huaxingao · 2025-07-24T16:55:06Z

Thank you all!

huaxingao added 2 commits July 16, 2025 10:26

fix: clean up iceberg integrtion APIs

8cdda16

formatting

97cb04b

parthchandra reviewed Jul 17, 2025

kazuyukitanimura approved these changes Jul 21, 2025

kazuyukitanimura changed the title ~~fix: clean up iceberg integration APIs~~ fix: clean up [iceberg] integration APIs Jul 21, 2025

add setColumnReaders

37642e2

huaxingao changed the title ~~fix: clean up [iceberg] integration APIs~~ [Iceberg] fix: clean up [iceberg] integration APIs Jul 21, 2025

huaxingao changed the title ~~[Iceberg] fix: clean up [iceberg] integration APIs~~ fix: clean up [iceberg] integration APIs Jul 21, 2025

parthchandra reviewed Jul 22, 2025

address comments

5e6552c

huaxingao force-pushed the fix branch from 871d088 to 5e6552c Compare July 22, 2025 06:44

huaxingao closed this Jul 22, 2025

huaxingao reopened this Jul 22, 2025

comphead approved these changes Jul 22, 2025

parthchandra reviewed Jul 23, 2025

parthchandra approved these changes Jul 23, 2025

address comments

af0725d

kazuyukitanimura merged commit 320ce55 into apache:main Jul 24, 2025
92 checks passed

huaxingao deleted the fix branch July 24, 2025 16:55

coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025

fix: clean up [iceberg] integration APIs (apache#2032)

98eb8db

5 participants
