Member

@viirya viirya commented Nov 15, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

viirya changed the title from "Native scan" to "Hook DataFusion Parquet native scan with Comet execution" Nov 15, 2024
viirya changed the title from "Hook DataFusion Parquet native scan with Comet execution" to "feat: Hook DataFusion Parquet native scan with Comet execution" Nov 15, 2024
Member Author

viirya commented Nov 15, 2024

Basic queries work now.

However, this does not yet work for all queries: there are still a few test failures in CometExecSuite. I'm looking into them.

|""".stripMargin
}

lazy val inputRDD: RDD[InternalRow] = {
Contributor

Shouldn't we return an RDD here (perhaps CometExecRDD)? CometNativeExec will call executeColumnar, which uses this.

Member Author

@viirya viirya Nov 16, 2024

No, we no longer call executeColumnar on CometNativeScanExec. If you do call it, it runs the original JVM frontend instead of the native scan. This is similar to the other native operators.

  let num_rows = output_batch.num_rows();

- if results.len() != num_cols {
+ if results.len() < num_cols {
Contributor

Why can we now get more output columns (i.e., results.len() > num_cols) than before?

Member Author

One of the failing queries has no output columns on the scan, i.e., the query asks for an empty projection. However, the native scan still outputs all columns. It would be more reasonable to output no columns. Let me see whether this can be fixed in the native scan.

Member Author

I changed this back after pushing the empty projection down to the native scan.
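For readers following this thread, here is a minimal sketch of the idea in plain Rust. It is not the code from this PR or from DataFusion: `Batch`, `project`, and `check_column_count` are hypothetical names, and a `Vec<i64>` stands in for an Arrow array. The assumption it illustrates is that once the scan honors an empty projection, the number of result columns matches the requested number again, so a strict `!=` check is sufficient.

```rust
// Hypothetical stand-in for a columnar batch; a Vec<i64> plays the role of an Arrow array.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<(String, Vec<i64>)>,
    num_rows: usize,
}

// Keep only the requested column indices. The row count is carried separately,
// so an empty projection still reports how many rows were scanned.
fn project(batch: &Batch, projection: &[usize]) -> Result<Batch, String> {
    let mut columns = Vec::with_capacity(projection.len());
    for &idx in projection {
        let column = batch
            .columns
            .get(idx)
            .ok_or_else(|| format!("column index {} out of range", idx))?;
        columns.push(column.clone());
    }
    Ok(Batch { columns, num_rows: batch.num_rows })
}

// Strict check in the spirit of `results.len() != num_cols` (hypothetical helper).
fn check_column_count(results_len: usize, num_cols: usize) -> Result<(), String> {
    if results_len != num_cols {
        return Err(format!(
            "expected {} output columns, got {}",
            num_cols, results_len
        ));
    }
    Ok(())
}
```

In this sketch, calling `project(&batch, &[])` yields a batch with zero columns that still carries the row count, so `check_column_count(0, 0)` passes. Without the projection, the scan's full column list would fail the same strict check.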

Member Author

viirya commented Nov 19, 2024

A few of the test failures in CometExecSuite occur because partition values are not handled yet. Handling them requires more changes to this branch; I will work on that in another PR.

Member Author

viirya commented Nov 19, 2024

cc @andygrove

@viirya viirya merged commit 1cca8d6 into apache:comet-parquet-exec Nov 19, 2024
21 of 74 checks passed
Member Author

viirya commented Nov 19, 2024

Thanks @mbutrovich

3 participants