Skip to content

Conversation

@jbrockmendel
Copy link
Member

We also avoid copies by not calling self.as_array and instead moving the mask-finding to the block level.

if isinstance(mask, BooleanArray):
mask = mask.to_numpy(dtype=bool, na_value=False)
elif isinstance(mask, ExtensionArray):
# We could have BooleanArray, Sparse[bool], ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think need to update this comment now though - so is there no way to keep this in the same branch as the ExtensionArray check? Would be nice to stay as generic as possible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if we can use to_numpy in the general case

return np.zeros(a.shape, dtype=bool)

elif is_datetimelike_v_numeric(a, b) or is_numeric_v_string_like(a, b):
# GH#29553 avoid deprecation warnings from numpy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though there is also a mistake here (the second condition has been refactored to a few lines up, so this line should just be elif is_datetimelike_v_numeric(a, b):

In master this is where we incorrectly raise instead of just consider string==numeric not-equal

@WillAyd WillAyd added the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Sep 3, 2020
@jbrockmendel
Copy link
Member Author

Looks like we have both a doctest and prose in missing_data.rst saying the current behavior (which this PR calls a bug) is intentional:

        Note that when replacing multiple ``bool`` or ``datetime64`` objects,
        the data types in the `to_replace` parameter must match the data
        type of the value being replaced:

        >>> df = pd.DataFrame({'A': [True, False, True], 'B': [False, True, False]})
        >>> df.replace({'a string': 'new value', True: False})  # raises
        Traceback (most recent call last):
            ...
        TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'

        This raises a ``TypeError`` because one of the ``dict`` keys is not of
        the correct type for replacement.

Under this PR, the example in the doctest returns

       A      B
0  False  False
1  False  False
2  False  False

which strike me as better behavior.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 4, 2020

I don't think the documented behavior is desirable, and I read it as more of a "hey this is a limitation of the replace implementation".

So I'm OK with changing behavior here as a "bugfix with behavior changing implications".

@jreback jreback added this to the 1.2 milestone Sep 5, 2020
@jreback jreback merged commit 3967131 into pandas-dev:master Sep 5, 2020
@jreback
Copy link
Contributor

jreback commented Sep 5, 2020

very nice @jbrockmendel +1 on adding array_algos and using them from the blocks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Replace raises TypeError if to_replace is Dict with numeric DataFrame and key of Dict is String

4 participants