[SPARK-41005][COLLECT][FOLLOWUP] Remove JSON code path and use RDD.collect in Arrow code path
#38706
Let's also remove this and the protobuf definition.
4c79820 to ed59e6c
Merged to master.
Sorry for the late comment, but just one question: does this implementation always send at least one partition to the client even if the result is empty?
It will send at least one batch.
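To make the "at least one batch" behavior concrete, here is a minimal sketch in plain Python. It is not the Spark Connect implementation: the function `iter_batches`, the `Row` alias, and the `max_rows` parameter are hypothetical names chosen for illustration, and rows are plain tuples rather than Arrow record batches. It only shows the guarantee discussed above, namely that an empty result still produces exactly one (empty) batch.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical row type for this sketch; Spark uses Arrow batches instead.
Row = Tuple[object, ...]


def iter_batches(
    partitions: Iterable[Sequence[Row]], max_rows: int = 2
) -> Iterator[List[Row]]:
    """Yield row batches partition by partition.

    If no partition contains any rows, yield a single empty batch so that
    the consumer always receives at least one batch.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")

    sent_any = False
    for part in partitions:
        # Split each partition into chunks of at most max_rows rows.
        for start in range(0, len(part), max_rows):
            sent_any = True
            yield list(part[start:start + max_rows])

    if not sent_any:
        # Empty result: still emit one batch, which carries no rows.
        yield []


empty_result = list(iter_batches([[], []]))
non_empty_result = list(iter_batches([[(1,), (2,), (3,)], [(4,)]]))
```

With this sketch, a consumer can always read the first batch without special-casing an empty result.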
@zhengruifeng thanks for this work! LGTM
late LGTM
…ollect` in Arrow code path

### What changes were proposed in this pull request?
1. Remove the JSON code path;
2. Use RDD.collect in the Arrow code path, since existing tests were already broken in the Arrow code path;
3. Re-enable `test_fill_na`.

### Why are the changes needed?
The existing Arrow code path is still problematic: it fails and falls back to the JSON code path, which changes the output data types of `test_fill_na`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Re-enabled the test and added a UT.

Closes apache#38706 from zhengruifeng/collect_disable_json.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?
1. Remove the JSON code path;
2. Use RDD.collect in the Arrow code path, since existing tests were already broken in the Arrow code path;
3. Re-enable `test_fill_na`.
Why are the changes needed?
The existing Arrow code path is still problematic: it fails and falls back to the JSON code path, which changes the output data types of `test_fill_na`.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Re-enabled the test and added a UT.
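As a generic illustration of why a fallback to JSON can change output data types, the sketch below round-trips a row through Python's standard `json` module. It is not the Spark code path and does not reproduce the actual `test_fill_na` failure; the helper `json_round_trip`, the sample row, and the choice of `default=str` for non-JSON-native values are all assumptions made for this example.

```python
import json
from datetime import date
from decimal import Decimal


def json_round_trip(row: dict) -> dict:
    """Serialize a row to JSON and parse it back.

    default=str is an assumption of this sketch: values that JSON cannot
    represent natively (Decimal, date) are written as strings.
    """
    return json.loads(json.dumps(row, default=str))


# Arbitrary sample row, not taken from the Spark test suite.
row = {"id": 1, "price": Decimal("9.50"), "day": date(2020, 1, 2), "score": None}
restored = json_round_trip(row)

# Compare the Python type of each value before and after the round trip.
types_before = {key: type(value).__name__ for key, value in row.items()}
types_after = {key: type(value).__name__ for key, value in restored.items()}
```

Here the integer and the null survive, while the decimal and the date come back as strings. A typed columnar format such as Arrow carries a schema alongside the data, which avoids this class of drift.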