Skip to content

Conversation

@dwhswenson
Copy link
Owner

@dwhswenson dwhswenson commented Feb 8, 2020

This gets the tests running under pandas 1.0, but they fail with an assertion error. I think (still need to confirm) that this is the same as pandas-dev/pandas#29814. If so, then it is a bug in pandas.

Maybe release 0.5.0 with a pin to pandas<1? It doesn't look like it is easy to generate sparse matrices directly from dict-of-keys in pandas; pandas seems to prefer to take a dense matrix and make it sparse. We don't want to create the dense matrix in the first place.

For now, there seems to be a bug in pandas; so we may need to
wait on that. I think (still need to confirm) that this is the
source of the test failures.
@sroet
Copy link
Collaborator

sroet commented Feb 10, 2020

I probably will not have time to look at this in detail this week.
However, a quick google did give me a way of migrating with Scipy Sparse matrices

@dwhswenson
Copy link
Owner Author

@sroet : That's the problem. We currently go from scipy to pandas. I've gone through the steps in that guide (that's why the code runs now with an error in correctness, not an error in running). The bug in pandas-dev/pandas#29814 is that pandas isn't reading correctly from scipy.sparse (or possibly, just that pandas.sparse has a fundamental error in it). In either case, I think it's a problem that should be fixed on the pandas side; I'm also not currently at a point where I have time to make a PR to fix it there.

Still haven't verified that this is definitely the same problem as that, but it seems pretty likely, since it is an unresolved pandas bug with making sparse DFs from scipy matrices.

Main question then is, do we release 0.5.0 pinned to an older version of pandas? (i.e., without this PR). Since it seems like we're both pretty busy, I lean toward pinning pandas and releasing 0.5.0 to avoid more users having problems like #67.

@sroet
Copy link
Collaborator

sroet commented Feb 10, 2020

Main question then is, do we release 0.5.0 pinned to an older version of pandas? (i.e., without this PR). Since it seems like we're both pretty busy, I lean toward pinning pandas and releasing 0.5.0 to avoid more users having problems like #67.

+1 on this

@dwhswenson dwhswenson mentioned this pull request Feb 10, 2020
@dwhswenson
Copy link
Owner Author

I've got a monkey patch working for this, but I want to see if it can get fixed quickly in pandas instead.

In any case, supporting py27 is now definitely too cumbersome. We'll release this as a 0.5.1 (hopefully without needing to monkeypatch pandas), but then let's move master to 0.6.0.dev0, dropping py27. I also vote for dropping win_32 testing from AppVeyor, instead using highest/lowest Python versions (giving 2 tests instead of 4; free AppVeyor doesn't run in parallel). Our AppVeyor testing is based on MDTraj, and probably MDTraj of 7 years ago.

@sroet
Copy link
Collaborator

sroet commented Feb 14, 2020

LGTM, feel free to merge at your discretion (if pandas fix takes to long)

In any case, supporting py27 is now definitely too cumbersome. We'll release this as a 0.5.1 (hopefully without needing to monkeypatch pandas), but then let's move master to 0.6.0.dev0, dropping py27. I also vote for dropping win_32 testing from AppVeyor, instead using highest/lowest Python versions (giving 2 tests instead of 4; free AppVeyor doesn't run in parallel).

+1 for dropping py27 and win_32

@dwhswenson
Copy link
Owner Author

It looks like this problem may have been resolved by pandas-dev/pandas#32825 (although I can't seem to get a working dev build of pandas going to verify it.) That's a complete rewrite of DataFrame.sparse.from_spmatrix that bypasses the problematic code in SparseArray.from_spmatrix. I think I'll merge this for now, and sometime after pandas 1.1 is released (with that fix) we can require pandas>=1.1 and remove the patch. The good thing is that the patch won't even be called once pandas 1.1 is out, since the buggy code is bypassed in the new implementation.

I will also release 0.5.1, which will mean I stop getting emails about our stable branch failing on Travis because MDTraj-dev no longer supports Py2 (see #75).

@dwhswenson dwhswenson changed the title [WIP] Fixes for sparse matrix in Pandas 1.0 Fixes for sparse matrix in Pandas 1.0 Jul 7, 2020
@dwhswenson dwhswenson added misc PR upstream_changed PR in response to changes in dependencies labels Jul 7, 2020
@dwhswenson dwhswenson merged commit fad2450 into master Jul 7, 2020
@dwhswenson dwhswenson deleted the pandas_sparse branch July 7, 2020 13:15
@dwhswenson dwhswenson mentioned this pull request Jul 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

misc PR upstream_changed PR in response to changes in dependencies

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants