[SPARK-40907][PS][SQL] `PandasMode` should copy keys before inserting into Map #38385

zhengruifeng · 2022-10-25T07:13:25Z

What changes were proposed in this pull request?

Make `PandasMode` copy keys before inserting them into the Map.
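To illustrate the class of bug this guards against, here is a minimal sketch in plain Python. It is an analogy, not Spark's implementation: `ReusedKey` and `count_values` are hypothetical names, and the sketch assumes the keys arrive in a mutable buffer that is overwritten for each row. Storing the buffer itself as a map key lets later rows rewrite earlier keys, while storing a copy keeps each key stable.

```python
import copy


class ReusedKey:
    """Hypothetical mutable key buffer, hashed and compared by its current value."""

    def __init__(self, value=""):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, ReusedKey) and self.value == other.value


def count_values(values, copy_keys):
    """Count occurrences while reading every value through one reused buffer."""
    counts = {}
    buf = ReusedKey()  # a single object, overwritten for every row
    for v in values:
        buf.value = v
        if buf in counts:
            counts[buf] += 1
        else:
            # The fix in this sketch: snapshot the key before storing it.
            key = copy.copy(buf) if copy_keys else buf
            counts[key] = 1
    # Read the keys back by their current value, as a later consumer would.
    return {k.value: n for k, n in counts.items()}


# Three partitions of '0'..'4' followed by one partition dominated by '3'.
values = ['0', '1', '2', '3', '4'] * 3 + ['3', '3', '3', '3', '4']

safe = count_values(values, copy_keys=True)
unsafe = count_values(values, copy_keys=False)

print(max(safe, key=safe.get))  # most frequent value with copied keys
print(sorted(unsafe))           # keys left after the buffer was reused
```

With copied keys the sketch reports `'3'` as the most frequent value. Without copying, every stored key is the same object, so the map collapses to the last value written into the buffer (`'4'`) and the counts are no longer meaningful.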

Why are the changes needed?

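This fixes a correctness issue similar to #38383. It is a separate PR because it is specific to the Pandas API on Spark. In the session below, `psdf.mode()` returns `4`, while pandas returns `3` for the same data: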

In [24]: def f(index, iterator): return ['3', '3', '3', '3', '4'] if index == 3 else ['0', '1', '2', '3', '4']

In [25]: rdd = sc.parallelize([1, ], 4).mapPartitionsWithIndex(f)

In [26]: df = spark.createDataFrame(rdd, schema='string')

In [27]: psdf = df.pandas_api()

In [28]: psdf.mode()
Out[28]: 
  value
0     4

In [29]: psdf._to_pandas().mode()
Out[29]: 
  value
0     3
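As a cross-check that `3` is the expected answer, the partition function can be evaluated without Spark. This is a sketch that only emulates `mapPartitionsWithIndex` with a plain loop over four partition indices; it ignores the partition contents, just as `f` does.

```python
from collections import Counter


def f(index, iterator):
    # Same partition function as in the session above.
    return ['3', '3', '3', '3', '4'] if index == 3 else ['0', '1', '2', '3', '4']


# Emulate mapPartitionsWithIndex over 4 partitions (indices 0..3).
values = [v for index in range(4) for v in f(index, iter([]))]

counts = Counter(values)
expected_mode = counts.most_common(1)[0][0]

print(dict(counts))
print(expected_mode)
```

Here `'3'` occurs 7 times (once in each of the first three partitions and four times in the last), `'4'` occurs 4 times, and every other value occurs 3 times, so the mode is `'3'`.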

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test.

itholic

I don't feel strongly about my nit comment. LGTM.

itholic · 2022-10-25T09:31:26Z

python/pyspark/pandas/tests/test_dataframe.py

+        rdd = self.spark.sparkContext.parallelize(
+            [
+                1,
+            ],
+            4,


nit: can we just make this one or two lines, something like:

rdd = self.spark.sparkContext.parallelize([1], 4).mapPartitionsWithIndex(f)

I suspect it may have been adjusted by the black script, though 😂

It was just reformatted by the script 😅

HyukjinKwon · 2022-10-25T14:55:52Z

Merged to master.

…into Map

Closes apache#38385 from zhengruifeng/ps_mode_fix.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>

fix

d6c4f4b

github-actions bot added the CORE, PANDAS API ON SPARK, PYTHON, and SQL labels on Oct 25, 2022

HyukjinKwon approved these changes Oct 25, 2022


itholic approved these changes Oct 25, 2022


HyukjinKwon closed this in e4d0412 Oct 25, 2022

zhengruifeng deleted the ps_mode_fix branch October 26, 2022 02:00
