[SPARK-29317][SQL][PYTHON] Avoid inheritance hierarchy in pandas CoGroup arrow runner and its plan #25989
What changes were proposed in this pull request?
This PR proposes to avoid the abstract classes introduced in #24965 and to use a trait and an object instead.
abstract class BaseArrowPythonRunner -> trait PythonArrowOutput, to allow mix-in
Before:
After:
abstract class BasePandasGroupExec -> object PandasGroupUtils, to decouple
Before:
After:
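As a rough illustration of the two refactorings, here is a minimal sketch. All names in it (Runner, UpperCaseOutput, SingleInputRunner, PairInputRunner, GroupUtils) are hypothetical stand-ins, not the actual Spark classes, and the bodies are toy logic. It only shows the shape of the change: output handling becomes a trait that any runner can mix in, and shared grouping helpers move into a standalone object rather than living in an abstract base class.

```scala
// Hypothetical, simplified names; these are NOT the real Spark classes.

// Stand-in for an existing runner base class shared by all runners.
abstract class Runner[IN, OUT] {
  def writeInput(in: IN): String
  def readOutput(raw: String): OUT
  def run(in: IN): OUT = readOutput(writeInput(in))
}

// Output handling as a trait: a runner mixes it in instead of
// inheriting from a dedicated abstract "base output" class.
trait UpperCaseOutput {
  def readOutput(raw: String): String = raw.toUpperCase
}

// Two runners with different input handling reuse the same output trait.
class SingleInputRunner extends Runner[Int, String] with UpperCaseOutput {
  def writeInput(in: Int): String = s"row-$in"
}

class PairInputRunner extends Runner[(Int, Int), String] with UpperCaseOutput {
  def writeInput(in: (Int, Int)): String = s"left-${in._1}/right-${in._2}"
}

// Shared helpers as a plain object: callers invoke them directly,
// so no plan node has to extend a common abstract class to reuse them.
object GroupUtils {
  // Collect the values of each key, keeping their original order.
  def groupByKey(rows: Seq[(String, Int)]): Map[String, Seq[Int]] =
    rows.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
}
```

With this shape, a new runner only has to supply its own input side, and the grouping helper can be called from any exec node without changing that node's class hierarchy.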
Why are the changes needed?
The problem is that the R code path is being matched with the Python side:
Python:
R:
I would like to keep the hierarchies matched and decouple the rest for now, if possible. Ideally, we should deduplicate both code paths. The internal implementations are also intentionally similar.
The BasePandasGroupExec case is similar. R (with Arrow optimization, in particular) has some code duplicated with Pandas UDFs:
FlatMapGroupsInRWithArrowExec <> FlatMapGroupsInPandasExec
MapPartitionsInRWithArrowExec <> ArrowEvalPythonExec
To prepare for deduplication here as well, it might be better to avoid changing the hierarchy on the Python side alone.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Ran the existing tests locally. Jenkins tests should verify this too.