Skip to content

Conversation

@ryankarlos
Copy link
Contributor

@ryankarlos ryankarlos commented Jan 2, 2020

@ryankarlos ryankarlos changed the title deprecate is_copy argument of take and remove is_copy=False usage DEPR: dis_copy argument of take Jan 2, 2020
@ryankarlos ryankarlos changed the title DEPR: dis_copy argument of take DEPR: is_copy arg of take Jan 2, 2020
@ryankarlos ryankarlos force-pushed the DEPR/is_copy_arg_of_take branch from 6f4c0fc to 4368cd3 Compare January 2, 2020 05:06
@ryankarlos
Copy link
Contributor Author

Not sure why my changes are making these other tests break ?
Failed: DID NOT RAISE <class 'pandas.core.common.SettingWithCopyError'>

@jreback jreback added the Deprecate Functionality to remove in pandas label Jan 3, 2020
@jreback
Copy link
Contributor

jreback commented Jan 3, 2020

looks good, needs a rebase; ping when green.

@pep8speaks
Copy link

pep8speaks commented Jan 3, 2020

Hello @ryankarlos! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-06 02:08:13 UTC

@ryankarlos ryankarlos force-pushed the DEPR/is_copy_arg_of_take branch 2 times, most recently from c9d5f26 to 4ed4028 Compare January 3, 2020 01:44
@jreback jreback added this to the 1.0 milestone Jan 3, 2020
@ryankarlos ryankarlos force-pushed the DEPR/is_copy_arg_of_take branch from aea421f to 2fd97da Compare January 3, 2020 03:17
@ryankarlos
Copy link
Contributor Author

ryankarlos commented Jan 3, 2020

@jreback do you know why the depr warning is making the test fail specifically for Linuxpy37 ?

pandas/tests/frame/methods/test_asof.py:27: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/core/generic.py:7009: in asof
    data = self.take(locs, is_copy=False)

and if i set this to data = self.take(locs) , it raises a SettingWithCopyError for all the builds due to this write on copy data.loc[missing] = np.nan

        if value == "raise":
>           raise com.SettingWithCopyError(t)
E           pandas.core.common.SettingWithCopyError: 
E           A value is trying to be set on a copy of a slice from a DataFrame.
E           Try using .loc[row_indexer,col_indexer] = value instead
E           
E           See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

pandas/core/generic.py:3684: SettingWithCopyError

@TomAugspurger
Copy link
Contributor

The numpydev py37 build is the only one where we elevate warnings to errors which fail the build.

Most likely you'll need to look at that test and what is_copy is doing. If the test looks correct, you made need to replicate some of the is_copy=False behavior in the places that previously passed is_copy=False.

@ryankarlos ryankarlos force-pushed the DEPR/is_copy_arg_of_take branch from 2fd97da to dcb8fd2 Compare January 4, 2020 05:17
@ryankarlos
Copy link
Contributor Author

ryankarlos commented Jan 4, 2020

thanks. @TomAugspurger @jreback found a workaround - all green now - let me know if any more changes needed.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated routines, df.asof should not be showing a warning

with warnings.catch_warnings():
warnings.simplefilter("ignore", FutureWarning)

result = df.asof(dates)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this showing a warning? you need to fix internally so this is not the case

Copy link
Contributor Author

@ryankarlos ryankarlos Jan 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the self.take(locs, is_copy=False) in def asof will now generate a warning after the deprecation change.
I could change it to default None (which sets is_copy=True) - but then that generates a SettingWithCopyError due to data.loc[missing] = np.nan in the block below

missing = locs == -1
data = self.take(locs)
data.index = where
data.loc[missing] = np.nan

I suppose I could temporarily suppress it

with pd.option_context('mode.chained_assignment', None):
            data.loc[missing] = np.nan

or create a deep copy after the take operation but not sure if thats the best approach ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you simply step theu the code and out a copy where needed

expected = df.asof(dates)
tm.assert_frame_equal(result, expected)
with warnings.catch_warnings():
warnings.simplefilter("ignore", FutureWarning)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@ryankarlos ryankarlos force-pushed the DEPR/is_copy_arg_of_take branch from dcb8fd2 to 6c1d495 Compare January 5, 2020 03:15
@jreback jreback merged commit 7796be6 into pandas-dev:master Jan 6, 2020
@jreback
Copy link
Contributor

jreback commented Jan 6, 2020

thanks @ryankarlos

@jorisvandenbossche
Copy link
Member

This might have caused a regression in asof: https://pandas.pydata.org/speed/pandas/#timeseries.AsOf.time_asof?p-constructor='DataFrame'&commits=2a3d840d-c99dfea3 (note there are a bunch of other commits in the time span of the regression according to asv, but this is the most obvious one that touched asof)

missing = locs == -1
data = self.take(locs, is_copy=False)
d = self.take(locs)
data = d.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were there test failures if you didn't do this d.copy() step?

Copy link
Contributor Author

@ryankarlos ryankarlos Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was breaking builds with SettingwithCopyError see #30615 (comment) and lower bit of #30615 (comment)

@jorisvandenbossche
Copy link
Member

Yes, it was breaking builds with SettingwithCopyError #30615 (comment)

I think this points to a problem in the current deprecation.

IMO, the goal of removing is_copy is to make this keyword unnecessary, in the idea that take should always return a copy. So that means you should never need to manually copy after doing a take.

WIth the current deprecation however, the user seems to loose this ability of not needing to take a copy. So I think, when deprecating/removing the is_copy arg, it should rather be the behaviour of is_copy=False that we should keep as the default.

@jreback
Copy link
Contributor

jreback commented Jan 8, 2020

the purpose of is_copy was to force a copy
take never had the always copy behavior of numpy - but it was mostly an internal method anyhow (eg you can use iloc)

so prob ok just to leave as is (maybe add a doc string)

@ryankarlos ryankarlos deleted the DEPR/is_copy_arg_of_take branch January 8, 2020 03:24
@jorisvandenbossche
Copy link
Member

the purpose of is_copy was to force a copy

I don't think this is correct, as is_copy=True/False did not influence whether a copy was made or not. What is_copy=True did, is setting self._is_copy to indicate to pandas it is referencing another object. So actually is_copy=True made it "less" a copy than is_copy=False.

With is_copy=False, you could indicate to pandas "this is really a copy already, don't give me annoying SettingWithCopyWarnings" without the need to do another explicit copy.

(the name of is_copy and the explanation in the docstring is just very confusing / wrong)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Deprecate Functionality to remove in pandas

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DEPR: .take(..., is_copy=True); rename is_copy -> copy

6 participants