[SPARK-42726][CONNECT][PYTHON] Implement `DataFrame.mapInArrow` #40350

xinrong-meng · 2023-03-09T09:04:40Z

What changes were proposed in this pull request?

Implement `DataFrame.mapInArrow` in the Spark Connect Python client.

Why are the changes needed?

To reach parity with vanilla PySpark, which already provides `DataFrame.mapInArrow`.

Does this PR introduce any user-facing change?

Yes. `DataFrame.mapInArrow` is now supported, as shown below.

>>> import pyarrow
>>> df = spark.createDataFrame([(1, 21), (2, 30)], ("id", "age"))
>>> def filter_func(iterator):
...     for batch in iterator:
...         pdf = batch.to_pandas()
...         yield pyarrow.RecordBatch.from_pandas(pdf[pdf.id == 1])
...
>>> df.mapInArrow(filter_func, df.schema).show()
+---+---+
| id|age|
+---+---+
|  1| 21|
+---+---+

How was this patch tested?

Unit tests.

SPARK-41661

HyukjinKwon · 2023-03-10T01:08:30Z

The test failure seems unrelated.

Merged to master and branch-3.4.

### What changes were proposed in this pull request?

Implement `DataFrame.mapInArrow`.

### Why are the changes needed?

Parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?

Yes. `DataFrame.mapInArrow` is supported as shown below.

```
>>> import pyarrow
>>> df = spark.createDataFrame([(1, 21), (2, 30)], ("id", "age"))
>>> def filter_func(iterator):
...     for batch in iterator:
...         pdf = batch.to_pandas()
...         yield pyarrow.RecordBatch.from_pandas(pdf[pdf.id == 1])
...
>>> df.mapInArrow(filter_func, df.schema).show()
+---+---+
| id|age|
+---+---+
|  1| 21|
+---+---+
```

### How was this patch tested?

Unit tests.

Closes #40350 from xinrong-meng/mapInArrowImpl.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit f35c2cb)
Signed-off-by: Hyukjin Kwon <[email protected]>

xinrong-meng · 2023-03-10T03:16:05Z

Thanks @HyukjinKwon !

xinrong-meng added 2 commits March 9, 2023 16:54

init

91a5556

test

ccf86e3

github-actions bot added BUILD CONNECT CORE PYTHON SQL labels Mar 9, 2023

xinrong-meng changed the title ~~[SPARK-42710][CONNECT][PYTHON] Implement DataFrame.mapInArrow~~ [SPARK-42726][CONNECT][PYTHON] Implement DataFrame.mapInArrow Mar 9, 2023

HyukjinKwon approved these changes Mar 9, 2023

HyukjinKwon closed this in f35c2cb Mar 10, 2023