[SPARK-28878][SQL] Remove extra project for DSv2 reads with columnar batches#25586
[SPARK-28878][SQL] Remove extra project for DSv2 reads with columnar batches#25586rdblue wants to merge 1 commit intoapache:masterfrom
Conversation
|
Test build #109757 has finished for PR 25586 at commit
|
|
Retest this please. |
|
Test build #109828 has finished for PR 25586 at commit
|
|
@rdblue This looks very useful and pertinent to Vectorization support use-case in DSV2 and should help in further improving performance. Thanks for working on this! |
|
@prodeezy, I just needed to verify that there are tests for the columnar path and there are: |
|
@brkyvz, can you take a look at this? |
|
thanks, merging to master! |
|
|
||
| // always add the projection, which will produce unsafe rows required by some operators | ||
| ProjectExec(project, withFilter) :: Nil | ||
| val withProjection = if (withFilter.output != project || !batchExec.supportsColumnar) { |
There was a problem hiding this comment.
BTW, shall we do the same for streaming scan?
There was a problem hiding this comment.
I'm not as familiar with the streaming side. If we support columnar reads there, we probably can.
|
Thanks for merging this, @cloud-fan! |
…scan ### What changes were proposed in this pull request? Remove the project node if the streaming scan is columnar ### Why are the changes needed? This is a followup of #25586. Batch and streaming share the same DS v2 read API so both can support columnar reads. We should apply #25586 to streaming scan as well. ### Does this PR introduce any user-facing change? no ### How was this patch tested? existing tests Closes #25727 from cloud-fan/follow. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…batches ### What changes were proposed in this pull request? Remove unnecessary physical projection added to ensure rows are `UnsafeRow` when the DSv2 scan is columnar. This is not needed because conversions are automatically added to convert from columnar operators to `UnsafeRow` when the next operator does not support columnar execution. ### Why are the changes needed? Removes an extra projection and copy. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing tests. Closes apache#25586 from rdblue/SPARK-28878-remove-dsv2-project-with-columnar. Authored-by: Ryan Blue <blue@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…scan ### What changes were proposed in this pull request? Remove the project node if the streaming scan is columnar ### Why are the changes needed? This is a followup of apache#25586. Batch and streaming share the same DS v2 read API so both can support columnar reads. We should apply apache#25586 to streaming scan as well. ### Does this PR introduce any user-facing change? no ### How was this patch tested? existing tests Closes apache#25727 from cloud-fan/follow. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Remove unnecessary physical projection added to ensure rows are
UnsafeRowwhen the DSv2 scan is columnar. This is not needed because conversions are automatically added to convert from columnar operators toUnsafeRowwhen the next operator does not support columnar execution.Why are the changes needed?
Removes an extra projection and copy.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.