ARROW-6556: [Python] Handle future removal of pandas SparseDataFrame #5377

jorisvandenbossche
Member

https://issues.apache.org/jira/browse/ARROW-6556

The plan in pandas is to remove SparseDataFrame and SparseSeries in pandas 1.0. By making sure our code works without them, we can ensure that a new pandas release does not break whichever pyarrow release is the latest stable one at that moment.
(This also makes it easier for me to develop against the master branches of both projects together.)

The removal isn't merged in pandas yet, so we could also wait until it is before merging this PR.
I was just trying out some things in pandas and noticed that the pyarrow Feather tests fail once those classes are removed.
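A minimal sketch of how code could decide whether the sparse classes exist without touching the `pd.SparseDataFrame` attribute itself. It assumes, as the plan above says, that the removal ships in pandas 1.0; `pandas_has_sparse_classes` is a hypothetical helper, not pyarrow's actual shim. A plain release-number cutoff would misclassify development builds that already lack the classes but still carry a pre-1.0 version number.

```python
import re


def pandas_has_sparse_classes(pandas_version: str) -> bool:
    """Return True if a pandas with this version should still ship
    SparseDataFrame/SparseSeries.

    Hypothetical helper: assumes the classes are removed in pandas 1.0,
    so only versions below 1.0 are treated as having them.
    """
    # Only the leading "major.minor" part matters for this cutoff, so
    # suffixes such as "rc0" or ".dev0+123" are ignored.
    match = re.match(r"(\d+)\.(\d+)", pandas_version)
    if match is None:
        raise ValueError(f"unrecognized pandas version: {pandas_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuples compare lexicographically: (0, 25) < (1, 0) but (1, 0) is not.
    return (major, minor) < (1, 0)
```

In real code the argument would be `pandas.__version__`; passing the string in explicitly keeps the check testable without pandas installed.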

@codecov-io

Codecov Report

Merging #5377 into master will decrease coverage by 3.79%.
The diff coverage is 75%.

@@            Coverage Diff             @@
##           master    #5377      +/-   ##
==========================================
- Coverage   69.35%   65.56%    -3.8%     
==========================================
  Files         745      497     -248     
  Lines       87819    68976   -18843     
  Branches     1437        0    -1437     
==========================================
- Hits        60905    45222   -15683     
+ Misses      26552    23754    -2798     
+ Partials      362        0     -362

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/pyarrow/serialization.py | 84.76% <100%> (ø) | ⬆️ |
| python/pyarrow/feather.py | 89.01% <100%> (ø) | ⬆️ |
| python/pyarrow/pandas-shim.pxi | 64.15% <60%> (-0.21%) | ⬇️ |
| python/pyarrow/tests/test_feather.py | 95.9% <75%> (-0.26%) | ⬇️ |
| cpp/src/plasma/thirdparty/ae/ae.c | 70.75% <0%> (-1.89%) | ⬇️ |
| cpp/src/plasma/store.cc | 78.97% <0%> (-0.33%) | ⬇️ |
| cpp/src/arrow/compare.cc | 53.47% <0%> (-0.15%) | ⬇️ |
| python/pyarrow/tests/test_parquet.py | 96.09% <0%> (-0.06%) | ⬇️ |
| r/src/recordbatch.cpp | | |
| go/arrow/math/uint64_amd64.go | | |

... and 246 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5bc0fda...705f0a7.

@TomAugspurger
Contributor

Thanks Joris. I was just going to file a JIRA :)

Does this change your opinion on removing SparseDataFrame in 1.0? The check in `write` in feather.py means you won't be able to write a Feather file unless your pyarrow includes this patch (0.15?).
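The concern here is a check in `write` that references `pd.SparseDataFrame` unconditionally, which fails once the class is gone (looking up a missing module attribute raises `AttributeError`). A minimal sketch of a guarded version, assuming a precomputed `has_sparse` flag so the attribute is only looked up when the class is known to exist; `densify_if_sparse` is a hypothetical name and this is not the literal code in pyarrow's feather.py.

```python
from types import ModuleType
from typing import Any


def densify_if_sparse(df: Any, pandas_module: ModuleType, has_sparse: bool) -> Any:
    """Convert a SparseDataFrame to a dense frame before writing it.

    Hypothetical helper: `has_sparse` is assumed to be computed once
    (for example from the pandas version), so `pandas_module.SparseDataFrame`
    is never looked up on a pandas that no longer provides it.
    """
    # `and` short-circuits: when has_sparse is False, the attribute access
    # on the right-hand side is never evaluated.
    if has_sparse and isinstance(df, pandas_module.SparseDataFrame):
        return df.to_dense()
    return df
```

Computing the flag once and passing it in keeps the version logic in a single place instead of repeating attribute probes at every call site.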

Member

wesm left a comment

+1

@wesm
Member

wesm commented Sep 14, 2019

@TomAugspurger I don't think we should be making accommodations for users to stay on old versions of pyarrow at this point.

wesm closed this in 3e6f8d1 on Sep 14, 2019
pprudhvi pushed a commit to pprudhvi/arrow that referenced this pull request Sep 16, 2019
Closes apache#5377 from jorisvandenbossche/ARROW-6556-pandas-sparse and squashes the following commits:

705f0a7 <Joris Van den Bossche> ARROW-6556:  Handle future removal of pandas SparseDataFrame

Authored-by: Joris Van den Bossche
Signed-off-by: Wes McKinney
jorisvandenbossche deleted the ARROW-6556-pandas-sparse branch on September 19, 2019 at 10:10
wesm pushed a commit that referenced this pull request Sep 19, 2019
Follow-up on https://issues.apache.org/jira/browse/ARROW-6556 / #5377.
We were a bit fast with merging the other PR: pandas changed the removal slightly. It added back a dummy SparseDataFrame object that emits a warning on access, which means that `hasattr(pd, 'SparseDataFrame')` or a try/except around the import is no longer a reliable way to tell whether the actual class still exists.

Using pytest's warnings filter, I now ensure that no such warning is raised in our test suite.

Closes #5438 from jorisvandenbossche/ARROW-6556-fix-warning and squashes the following commits:

06a0738 <Joris Van den Bossche> ARROW-6556:  Fix warning for pandas SparseDataFrame removal

Authored-by: Joris Van den Bossche
Signed-off-by: Wes McKinney
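The pitfall described in the follow-up commit can be reproduced without pandas. A minimal sketch, assuming a stand-in module whose module-level `__getattr__` (PEP 562) returns a dummy class and emits a `FutureWarning`, roughly the behaviour the commit describes. `fake_pandas`, `warnings_from_hasattr` and `raises_under_strict_filter` are hypothetical names, and the warning text is illustrative rather than pandas' exact message. The second helper turns that warning into an error with the standard library's `warnings.filterwarnings`, a stdlib analogue of the pytest warnings filter the commit mentions.

```python
import types
import warnings

# Hypothetical stand-in for the pandas top-level namespace after the removal.
fake_pandas = types.ModuleType("fake_pandas")


def _removed_class_getattr(name):
    # A module-level __getattr__ is only consulted for names not found
    # normally; here it hands back a placeholder for the removed classes.
    if name in {"SparseDataFrame", "SparseSeries"}:
        warnings.warn(f"The {name} class is removed from pandas.",
                      FutureWarning, stacklevel=2)
        return type(name, (), {})  # dummy class, not the real one
    raise AttributeError(f"module 'fake_pandas' has no attribute {name!r}")


fake_pandas.__getattr__ = _removed_class_getattr


def warnings_from_hasattr(module, name):
    """Return (hasattr result, warnings emitted while checking)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        found = hasattr(module, name)
    return found, caught


def raises_under_strict_filter(module, name):
    """Return True if accessing `name` fails once its FutureWarning is an error."""
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings("error", message="The .* class is removed",
                                category=FutureWarning)
        try:
            getattr(module, name)
        except FutureWarning:
            return True
        except AttributeError:
            return False  # attribute missing entirely: no warning involved
    return False
```

`hasattr(fake_pandas, "SparseDataFrame")` returns `True` here even though only a placeholder exists, which is why a check that never touches the attribute (for example, one based on the pandas version) is the sturdier signal.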