Skip to content

Conversation

JFPerkins
Copy link
Contributor

@JFPerkins JFPerkins commented Jul 30, 2018

Resolves #207 (and #179). It turns out that Cache writes to CSV without an index, but from_csv by default assumes the first column is an index frame, so it was assuming a data column was the index. from_csv is deprecated anyways so I've replaced all instances of its use.

Added a test to actually confirm that get_ephys_features contains the data that matches read_csv. In MouseConnectivityCache.get_experiment_structure_unionizes I use index_col=0 for read_csv since the writer method for that one is DataFrame.to_csv with default indexing (which will add an index column).

Looks like a fair bit of this overlaps with PR #180 but that was the result of drilling down fixing all the tests.

@JFPerkins JFPerkins requested review from dyf and jknox13 July 30, 2018 23:59
@codecov-io
Copy link

codecov-io commented Jul 31, 2018

Codecov Report

Merging #208 into master will not change coverage.
The diff coverage is 0%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #208   +/-   ##
=======================================
  Coverage   44.57%   44.57%           
=======================================
  Files          92       92           
  Lines       11801    11801           
=======================================
  Hits         5260     5260           
  Misses       6541     6541
Impacted Files Coverage Δ
allensdk/core/mouse_connectivity_cache.py 92.44% <ø> (ø) ⬆️
allensdk/api/cache.py 80.7% <0%> (ø) ⬆️
allensdk/brain_observatory/locally_sparse_noise.py 31.43% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a8f579...deca125. Read the comment docs.

@JFPerkins
Copy link
Contributor Author

Looks like pytest just pushed an update to 3.7.0 that broke our test discovery so I've capped it for now.

Copy link
Contributor

@dyf dyf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes needed, assuming you checked the column order here.

post=filter_fn,
writer=lambda p, x : pd.DataFrame(x).to_csv(p),
reader=pd.DataFrame.from_csv)
reader=lambda x: pd.read_csv(x, index_col=0, parse_dates=True))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check to make sure that the file saved here puts the correct column first when written?

cc @NileGraddis

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a test that covers it. I was going by the documentation for to_csv (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html) which indicates that by default it is writing an index label column.

In general I'm planning to add more tests like the one I added for get_ephys_features that actually round trip the files, as I noticed all the tests we have seem to mock out the file writing.

Copy link
Contributor

@jknox13 jknox13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, as long as column 0 is only wanted as index in cases where index=0 is explicitly passed to pd.read_csv()

@JFPerkins JFPerkins requested a review from NileGraddis October 17, 2018 21:52
@NileGraddis NileGraddis merged commit cf7dc1f into master Oct 17, 2018
@NileGraddis NileGraddis mentioned this pull request Oct 17, 2018
@NileGraddis NileGraddis deleted the 207 branch July 30, 2019 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

two celltypes features missing in 0.14.5

5 participants