Fix cache roundtrips truncating dataframes #208

JFPerkins · 2018-07-30T23:58:51Z

Resolves #207 (and #179). It turns out that Cache writes CSV files without an index column, but from_csv by default treats the first column as the index, so it was consuming a data column as the index. from_csv is deprecated anyway, so I've replaced all uses of it with read_csv.
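For readers who want to see the failure mode in isolation, here is a minimal sketch. It assumes only what the description states: the writer omits the index and the old reader treated column 0 as the index. Because from_csv is no longer available in current pandas, `pd.read_csv(..., index_col=0)` stands in for its default behaviour. The frame and its column names are hypothetical, not taken from the SDK.

```python
import io

import pandas as pd

# Hypothetical stand-in for a cached table; column names are illustrative.
original = pd.DataFrame({"id": [10, 20, 30], "vrest": [-70.1, -65.3, -68.9]})

# Write the way the description says Cache does: no index column.
csv_text = original.to_csv(index=False)

# Read the way from_csv did by default: treat column 0 as the index.
misread = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Read with read_csv defaults: no column is consumed as the index.
roundtrip = pd.read_csv(io.StringIO(csv_text))

print(list(misread.columns))    # the "id" data column has moved into the index
print(list(roundtrip.columns))  # both data columns survive
```

The point of the sketch is only that the reader and writer must agree on whether an index column exists; it does not reproduce the Cache code itself.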

Added a test to actually confirm that get_ephys_features contains the data that matches read_csv. In MouseConnectivityCache.get_experiment_structure_unionizes I use index_col=0 for read_csv since the writer method for that one is DataFrame.to_csv with default indexing (which will add an index column).
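The opposite pairing, used for get_experiment_structure_unionizes, can be sketched the same way: a default `DataFrame.to_csv` call adds an unnamed index column, and `index_col=0` on the reader removes it again. The record fields below are hypothetical placeholders, and the sketch leaves out the `parse_dates=True` argument that the actual change also passes.

```python
import io

import pandas as pd

# Hypothetical unionize-like records; field names are illustrative only.
records = [
    {"structure_id": 315, "projection_density": 0.25},
    {"structure_id": 997, "projection_density": 0.5},
]

# Column order is pinned explicitly so the example does not depend on the
# pandas version's handling of dict key order.
frame = pd.DataFrame(records, columns=["structure_id", "projection_density"])

# Default to_csv writes the row index as an unnamed first column.
csv_text = frame.to_csv()
header = csv_text.splitlines()[0]

# index_col=0 consumes that extra column, leaving only the data columns.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(header)
print(list(restored.columns))
```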

Looks like a fair bit of this overlaps with PR #180 but that was the result of drilling down fixing all the tests.

codecov-io · 2018-07-31T00:14:27Z

Codecov Report

Merging #208 into master will not change coverage.
The diff coverage is 0%.

@@           Coverage Diff           @@
##           master     #208   +/-   ##
=======================================
  Coverage   44.57%   44.57%           
=======================================
  Files          92       92           
  Lines       11801    11801           
=======================================
  Hits         5260     5260           
  Misses       6541     6541

| Impacted Files | Coverage Δ | |
|---|---|---|
| allensdk/core/mouse_connectivity_cache.py | `92.44% <ø> (ø)` | ⬆️ |
| allensdk/api/cache.py | `80.7% <0%> (ø)` | ⬆️ |
| allensdk/brain_observatory/locally_sparse_noise.py | `31.43% <0%> (ø)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8a8f579...deca125.

JFPerkins · 2018-07-31T01:27:16Z

Looks like pytest just released 3.7.0, which broke our test discovery, so I've capped the version for now.

dyf

No changes needed, assuming you checked the column order here.

dyf · 2018-07-31T02:50:34Z

allensdk/core/mouse_connectivity_cache.py

                                                post=filter_fn,
                                                writer=lambda p, x : pd.DataFrame(x).to_csv(p),
-                                                reader=pd.DataFrame.from_csv)
+                                                reader=lambda x: pd.read_csv(x, index_col=0, parse_dates=True))


Did you check to make sure that the file saved here puts the correct column first when written?

cc @NileGraddis

JFPerkins · 2018-07-31

I'll add a test that covers it. I was going by the documentation for to_csv (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html), which indicates that by default it writes the index as the first column.

In general I'm planning to add more tests like the one I added for get_ephys_features that actually round trip the files, as I noticed all the tests we have seem to mock out the file writing.
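A round-trip test of the kind described here might look like the sketch below. It is an assumption about shape only, not the test added in this PR: the `roundtrip` helper, the test names, and the sample frame are all hypothetical. What it illustrates is writing to a real temporary file instead of mocking the writer, then comparing the frame that comes back.

```python
import os
import tempfile

import pandas as pd
from pandas.testing import assert_frame_equal


def roundtrip(frame, writer, reader):
    """Write frame to a real temporary file, then read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "cached.csv")
        writer(path, frame)
        return reader(path)


def test_roundtrip_without_index():
    # Writer omits the index, so the reader must not consume a column.
    frame = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
    result = roundtrip(
        frame,
        writer=lambda p, x: x.to_csv(p, index=False),
        reader=pd.read_csv,
    )
    assert_frame_equal(result, frame)


def test_roundtrip_with_index():
    # Writer keeps the default index column, so the reader uses index_col=0.
    frame = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
    result = roundtrip(
        frame,
        writer=lambda p, x: x.to_csv(p),
        reader=lambda p: pd.read_csv(p, index_col=0),
    )
    assert_frame_equal(result, frame)
```

Because the helper takes the writer and reader as arguments, a mismatched pair (for example `index=False` on write with `index_col=0` on read) fails the comparison, which is the regression from #207.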

jknox13

LGTM, as long as column 0 is only wanted as the index in cases where index_col=0 is explicitly passed to pd.read_csv()

JFPerkins added 4 commits July 30, 2018 16:01

Replace deprecated from_csv calls with read_csv

a9b3d64

Remove unused from_csv references in tests

f0fba88

Add test to cover actual use of cache that prompted #207

05f7ac8

Remove reference to from_csv from documentation

23b8dc6

JFPerkins requested review from dyf and jknox13 July 30, 2018 23:59

restrict pytest version pending investigation into why test discovery fails in >= 3.7.0

8ad1afb

dyf approved these changes Jul 31, 2018

jknox13 approved these changes Jul 31, 2018

JFPerkins added 4 commits October 12, 2018 13:34

Update 207 from master

de20bed

Don't set max version for pytest

60ed302

Mouse connectivity round tripping cacheing test

f54438e

Fix syntax error in example

deca125

JFPerkins requested a review from NileGraddis October 17, 2018 21:52

NileGraddis approved these changes Oct 17, 2018

NileGraddis merged commit cf7dc1f into master Oct 17, 2018

NileGraddis mentioned this pull request Oct 17, 2018

Fixes #179 #180

Closed

JFPerkins mentioned this pull request Oct 17, 2018

pandas: from_csv is deprecated #179

Closed

NileGraddis deleted the 207 branch July 30, 2019 21:23
