Skip to content

Conversation

@zhengruifeng
Copy link
Contributor

@zhengruifeng zhengruifeng commented Nov 18, 2022

What changes were proposed in this pull request?

1, Remove JSON code path;
2, use RDD.collect in Arrow code path, since existing tests were already broken in Arrow code path;
3, reenable test_fill_na

Why are the changes needed?

existing Arrow code path is still problematic and it fails and fallback to JSON code path, which change the output datatypes of test_fill_na

Does this PR introduce any user-facing change?

No

How was this patch tested?

reenabled test and added UT

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also remove this and protobuf definition

@zhengruifeng zhengruifeng changed the title [TEST ONLY] Come back to collect.foreach(send) [SPARK-41005][COLLECT][FOLLOWUP] Come back to collect.foreach(send) Nov 21, 2022
@zhengruifeng zhengruifeng changed the title [SPARK-41005][COLLECT][FOLLOWUP] Come back to collect.foreach(send) [SPARK-41005][COLLECT][FOLLOWUP] Remove JSON code path and reuse RDD.collect in Arrow code path Nov 21, 2022
@zhengruifeng zhengruifeng changed the title [SPARK-41005][COLLECT][FOLLOWUP] Remove JSON code path and reuse RDD.collect in Arrow code path [SPARK-41005][COLLECT][FOLLOWUP] Remove JSON code path and use RDD.collect in Arrow code path Nov 21, 2022
@zhengruifeng zhengruifeng marked this pull request as ready for review November 21, 2022 02:41
@HyukjinKwon
Copy link
Member

Merged to master.

@zhengruifeng zhengruifeng deleted the collect_disable_json branch November 22, 2022 00:45
@amaliujia
Copy link
Contributor

Sorry for late comment but just one question:

does this implementation always send at least one partition to client even if there is empty result?

@zhengruifeng
Copy link
Contributor Author

Sorry for late comment but just one question:

does this implementation always send at least one partition to client even if there is empty result?

will send at least one batch.

@amaliujia
Copy link
Contributor

@zhengruifeng thanks for this work! LGTM

@cloud-fan
Copy link
Contributor

late LGTM

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…ollect` in Arrow code path

### What changes were proposed in this pull request?
1, Remove JSON code path;
2, use RDD.collect in Arrow code path, since existing tests were already broken in Arrow code path;
3, reenable `test_fill_na`

### Why are the changes needed?
existing Arrow code path is still problematic and it fails and fallback to JSON code path, which change the output datatypes of `test_fill_na`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
reenabled test and added UT

Closes apache#38706 from zhengruifeng/collect_disable_json.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
…ollect` in Arrow code path

### What changes were proposed in this pull request?
1, Remove JSON code path;
2, use RDD.collect in Arrow code path, since existing tests were already broken in Arrow code path;
3, reenable `test_fill_na`

### Why are the changes needed?
existing Arrow code path is still problematic and it fails and fallback to JSON code path, which change the output datatypes of `test_fill_na`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
reenabled test and added UT

Closes apache#38706 from zhengruifeng/collect_disable_json.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…ollect` in Arrow code path

### What changes were proposed in this pull request?
1, Remove JSON code path;
2, use RDD.collect in Arrow code path, since existing tests were already broken in Arrow code path;
3, reenable `test_fill_na`

### Why are the changes needed?
existing Arrow code path is still problematic and it fails and fallback to JSON code path, which change the output datatypes of `test_fill_na`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
reenabled test and added UT

Closes apache#38706 from zhengruifeng/collect_disable_json.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants