-
Notifications
You must be signed in to change notification settings - Fork 265
feat: Hook DataFusion Parquet native scan with Comet execution #1094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Basic query is okay now. But this doesn't completely work for all queries right now. It still gets a few test failures in CometExecSuite. I'm looking into that. |
| |""".stripMargin | ||
| } | ||
|
|
||
| lazy val inputRDD: RDD[InternalRow] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we return an rdd here (perhaps CometExecRDD) because in CometNativeExec we will call executeColumnar which will use this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we won't call executeColumnar on CometNativeScanExec anymore. If you run it, it runs the original JVM frontend instead of native scan. It is similar to other native operators.
native/core/src/execution/jni_api.rs
Outdated
| let num_rows = output_batch.num_rows(); | ||
|
|
||
| if results.len() != num_cols { | ||
| if results.len() < num_cols { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can we now get more output columns (i.e., results.len() > num_cols) than before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A failed query has no output on the scan. I.e., the query asks for empty column. But the native scan still outputs all columns. It should be more reasonable to output no column, actually. Let me see if it could be fixed at the native scan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this back after pushing empty projection to the native scan.
|
A few of test failures in CometExecSuite are due to partition values are not handled for now. It requires some more changes to this branch. I will go to work on it in other PR. |
|
cc @andygrove |
|
Thanks @mbutrovich |
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
How are these changes tested?