Skip to content

Conversation

@itholic
Copy link
Contributor

@itholic itholic commented Jan 3, 2024

What changes were proposed in this pull request?

This PR proposes to fix Series.astype to work properly with missing value.

Why are the changes needed?

To follow the behavior of latest Pandas.

Does this PR introduce any user-facing change?

Yes, the bug is fixed to follow the behavior of Pandas:

Before

>>> psser = ps.Series([decimal.Decimal(1), decimal.Decimal(2), decimal.Decimal(np.nan)])
>>> psser.astype(bool)
0    True
1    True
2    False
dtype: bool

After

>>> psser = ps.Series([decimal.Decimal(1), decimal.Decimal(2), decimal.Decimal(np.nan)])
>>> psser.astype(bool)
0    True
1    True
2    True
dtype: bool

How was this patch tested?

Enable the existing UTs.

Was this patch authored or co-authored using generative AI tooling?

No.

lambda: psser.astype(int_type),
)

# TODO(SPARK-37039): the np.nan series.astype(bool) should be True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably add this into migration guide though.

dongjoon-hyun
dongjoon-hyun previously approved these changes Jan 3, 2024
@dongjoon-hyun
Copy link
Member

Oh, it seems the the failures are relevant, @itholic .

pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_PANDAS_SERIES] Series are not almost equal:
Left:
bool
Right:
bool

@HyukjinKwon
Copy link
Member

@itholic can you retrigger and/or fix the tests?

@itholic
Copy link
Contributor Author

itholic commented Jan 4, 2024

Oh.. seems like we should separately handle the boolean data type. I got some personal errands right now, so let me take a look tomorrow. Thanks!

@itholic itholic changed the title [SPARK-37039][PS] Fix Series.astype to work properly with missing value [WIP][SPARK-37039][PS] Fix Series.astype to work properly with missing value Jan 5, 2024
@itholic itholic marked this pull request as draft January 5, 2024 00:32
@itholic itholic marked this pull request as ready for review January 9, 2024 08:12
@itholic
Copy link
Contributor Author

itholic commented Jan 9, 2024

CI passed. @dongjoon-hyun @HyukjinKwon FYI

@itholic itholic changed the title [WIP][SPARK-37039][PS] Fix Series.astype to work properly with missing value [SPARK-37039][PS] Fix Series.astype to work properly with missing value Jan 9, 2024
@HyukjinKwon
Copy link
Member

Merged to master.

yaooqinn pushed a commit that referenced this pull request Jan 11, 2024
### What changes were proposed in this pull request?

This PR followup for #44570 to add migration guide for behavior change.

### Why are the changes needed?

We should notice user about any behavior change

### Does this PR introduce _any_ user-facing change?

No API change.

### How was this patch tested?

The existing CI should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44684 from itholic/SPARK-37039-foolowup.

Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants