BUG: df.replace with numeric values and str to_replace #36093

jbrockmendel · 2020-09-03T15:57:21Z

closes BUG: Replace raises TypeError if to_replace is Dict with numeric DataFrame and key of Dict is String #34789
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

We also avoid copies by not calling self.as_array and instead moving the mask-finding to the block level.
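Why does moving mask-finding to the block level avoid copies? Here is a simplified NumPy sketch, not pandas internals. The dict of 2-D arrays stands in for a frame stored as several blocks, which is a hypothetical model chosen for illustration. Stacking all blocks into one array allocates new memory and may upcast. Comparing each block where it already lives allocates only the boolean masks.

```python
import numpy as np

# Hypothetical stand-in for a frame stored as two separate 2-D blocks
# (an int64 block and a float64 block); this is NOT pandas' BlockManager.
blocks = {
    "ints": np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64),
    "floats": np.array([[1.5, 2.0, 3.5]], dtype=np.float64),
}


def masks_via_single_array(blocks, target):
    """Stack every block into one 2-D array, then compare once.

    np.vstack always allocates a new array (and here also upcasts the
    int64 rows to float64), so the data is copied before any comparison.
    """
    combined = np.vstack(list(blocks.values()))
    return combined, combined == target


def masks_per_block(blocks, target):
    """Compare each block where it already lives.

    Only the boolean result arrays are allocated; block data is not copied.
    """
    return {name: values == target for name, values in blocks.items()}


combined, combined_mask = masks_via_single_array(blocks, 2)
per_block = masks_per_block(blocks, 2)
print(combined.dtype)                              # float64
print(np.shares_memory(combined, blocks["ints"]))  # False
```

The per-block version also keeps each block's original dtype. The combined array silently became float64.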


WillAyd · 2020-09-03T16:24:41Z

pandas/core/internals/blocks.py

+    if isinstance(mask, BooleanArray):
+        mask = mask.to_numpy(dtype=bool, na_value=False)
+    elif isinstance(mask, ExtensionArray):
        # We could have BooleanArray, Sparse[bool], ...


I think we need to update this comment now, though. Is there no way to keep this in the same branch as the ExtensionArray check? It would be nice to stay as generic as possible.

I'll see if we can use to_numpy in the general case.
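The "general case" idea could look roughly like the sketch below. It makes one ExtensionArray branch that calls to_numpy(dtype=bool, na_value=False) instead of special-casing BooleanArray. The helper name mask_to_ndarray is hypothetical, and the sketch is only exercised here on a nullable BooleanArray. In that case, missing values (pd.NA) are treated as "no match".

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


def mask_to_ndarray(mask):
    """Hypothetical helper: coerce a boolean-like mask to a NumPy bool array.

    Missing entries are filled with False ("no match") before casting.
    """
    if isinstance(mask, ExtensionArray):
        # One generic branch instead of an isinstance(mask, BooleanArray) check.
        return mask.to_numpy(dtype=bool, na_value=False)
    return np.asarray(mask, dtype=bool)


nullable_mask = pd.array([True, None, False], dtype="boolean")
print(mask_to_ndarray(nullable_mask))  # [ True False False]
```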

WillAyd · 2020-09-03T16:26:12Z

pandas/core/array_algos/replace.py

+        return np.zeros(a.shape, dtype=bool)
+
+    elif is_datetimelike_v_numeric(a, b) or is_numeric_v_string_like(a, b):
+        # GH#29553 avoid deprecation warnings from numpy


Related to this PR?

Yes, though there is also a mistake here: the second condition has been refactored to a few lines up, so this line should just be elif is_datetimelike_v_numeric(a, b):

In master, this is where we incorrectly raise instead of simply treating string == numeric as not-equal.
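The intended semantics can be sketched as a small comparison helper. A string can never equal an element of a numeric or bool array, so the mask is all False rather than an error. The sketch returns the mask up front, much like the np.zeros(a.shape, dtype=bool) return visible in the diff, without asking NumPy to make the comparison (the source of the warnings mentioned in GH#29553). The name compare_or_all_false is hypothetical and is not the function in pandas/core/array_algos/replace.py.

```python
import numpy as np


def compare_or_all_false(values: np.ndarray, target) -> np.ndarray:
    """Hypothetical sketch: element-wise values == target, except that a
    str target against a numeric/bool array is "never equal" rather than
    raising or warning."""
    if isinstance(target, str) and values.dtype.kind in "biufc":
        # bool, int, uint, float, complex: no element can equal a string,
        # so skip the NumPy comparison entirely and return an all-False mask.
        return np.zeros(values.shape, dtype=bool)
    return np.asarray(values == target, dtype=bool)


print(compare_or_all_false(np.array([1, 2, 3]), "a string"))  # [False False False]
print(compare_or_all_false(np.array([1, 2, 3]), 2))           # [False  True False]
```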


jbrockmendel · 2020-09-03T21:13:06Z

Looks like we have both a doctest and prose in missing_data.rst saying the current behavior (which this PR calls a bug) is intentional:

        Note that when replacing multiple ``bool`` or ``datetime64`` objects,
        the data types in the `to_replace` parameter must match the data
        type of the value being replaced:

        >>> df = pd.DataFrame({'A': [True, False, True], 'B': [False, True, False]})
        >>> df.replace({'a string': 'new value', True: False})  # raises
        Traceback (most recent call last):
            ...
        TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'

        This raises a ``TypeError`` because one of the ``dict`` keys is not of
        the correct type for replacement.

Under this PR, the example in the doctest returns

       A      B
0  False  False
1  False  False
2  False  False

which strikes me as better behavior.
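The doctest can be rerun as a plain script. This assumes pandas 1.2 or later, the milestone this change was merged into. The sketch only relies on the behavior described above: the unmatched string key no longer raises, and True is replaced with False everywhere.

```python
import pandas as pd

df = pd.DataFrame({"A": [True, False, True], "B": [False, True, False]})

# Previously documented to raise TypeError; after this change the
# 'a string' key simply matches nothing and True -> False is applied.
result = df.replace({"a string": "new value", True: False})
print(result)
```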

TomAugspurger · 2020-09-04T13:59:59Z

I don't think the documented behavior is desirable, and I read it as more of a "hey this is a limitation of the replace implementation".

So I'm OK with changing behavior here as a "bugfix with behavior changing implications".

jreback · 2020-09-05T03:22:12Z

Very nice, @jbrockmendel. +1 on adding array_algos and using them from the blocks.


jbrockmendel added 2 commits September 3, 2020 08:53

BUG: df.replace with numeric values and string to_replace

8c8b36f

Merge branch 'master' of https://github.com/pandas-dev/pandas into ref-replace_list-copies

e44ff49

WillAyd reviewed Sep 3, 2020


WillAyd added the Algos label (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) Sep 3, 2020

jbrockmendel added 3 commits September 3, 2020 10:19

Merge branch 'master' of https://github.com/pandas-dev/pandas into ref-replace_list-copies

b3e3ad1

missing import

839b757

avoid specific BooleanArray special casing

b683707

update docs

30be372

jreback added this to the 1.2 milestone Sep 5, 2020

jreback merged commit 3967131 into pandas-dev:master Sep 5, 2020

jbrockmendel deleted the ref-replace_list-copies branch September 5, 2020 03:41

simonjayhawkins mentioned this pull request Sep 5, 2020

DataFrame.replace: TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'unicode' #16784

Closed

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Sep 8, 2020

BUG: df.replace with numeric values and str to_replace (pandas-dev#36093)

97ed706

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

BUG: df.replace with numeric values and str to_replace (pandas-dev#36093)

0dba256

simonjayhawkins mentioned this pull request Dec 14, 2020

BUG: Inconsistent behavior of .replace() in Int64 series with <NA>. #38267

Closed

4 participants