Fix cache roundtrips truncating dataframes #208
Codecov Report
@@ Coverage Diff @@
## master #208 +/- ##
=======================================
Coverage 44.57% 44.57%
=======================================
Files 92 92
Lines 11801 11801
=======================================
Hits 5260 5260
Misses 6541 6541
Continue to review full report at Codecov.
… fails in >= 3.7.0
Looks like pytest just released 3.7.0, which broke our test discovery, so I've capped the version for now.
No changes needed, assuming you checked the column order here.
 post=filter_fn,
 writer=lambda p, x : pd.DataFrame(x).to_csv(p),
-reader=pd.DataFrame.from_csv)
+reader=lambda x: pd.read_csv(x, index_col=0, parse_dates=True))
Did you check to make sure that the file saved here puts the correct column first when written?
cc @NileGraddis
I'll add a test that covers it. I was going by the documentation for to_csv (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html), which indicates that it writes an index label column by default.
In general I'm planning to add more tests like the one I added for get_ephys_features that actually round trip the files, as I noticed all the tests we have seem to mock out the file writing.
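A round-trip test of this kind might look like the sketch below. It is only an illustration: the function name, column names, and values are hypothetical, and it exercises plain pandas calls rather than the project's actual Cache code. The point is that the file is really written and read back instead of being mocked.

```python
import os
import tempfile

import pandas as pd
from pandas.testing import assert_frame_equal


def check_csv_roundtrip(df):
    """Write df to a real CSV file, read it back, and compare the two frames."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "features.csv")  # hypothetical file name
        # Writer and reader agree: no index column is written or expected.
        df.to_csv(path, index=False)
        loaded = pd.read_csv(path)
    # Raises AssertionError if any column, value, or dtype differs.
    assert_frame_equal(loaded, df)
    return loaded


# Hypothetical data standing in for a cached feature table.
expected = pd.DataFrame({"specimen_id": [101, 102, 103], "sweep_count": [4, 7, 5]})
loaded = check_csv_roundtrip(expected)
```

Because nothing is mocked, a mismatch between the writer's and reader's index handling would surface as a failed comparison.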
LGTM, as long as column 0 is only used as the index in cases where index_col=0 is explicitly passed to pd.read_csv().
Resolves #207 (and #179). It turns out that Cache writes to CSV without an index, but from_csv by default treats the first column as the index, so a data column was being read as the index. from_csv is deprecated anyways, so I've replaced all instances of its use.
Added a test to actually confirm that get_ephys_features contains the data that matches read_csv. In MouseConnectivityCache.get_experiment_structure_unionizes I use index_col=0 for read_csv since the writer method for that one is DataFrame.to_csv with default indexing (which will add an index column).
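The mismatch described above can be reproduced with a small sketch. The column names and values are hypothetical, and `read_csv(index_col=0)` stands in for the old `from_csv` default rather than calling `from_csv` itself. The sketch uses an in-memory buffer, not the project's Cache class.

```python
import io

import pandas as pd


def roundtrip(df, write_index, index_col):
    """Write df to an in-memory CSV and read it back with the given settings."""
    buf = io.StringIO()
    df.to_csv(buf, index=write_index)
    buf.seek(0)
    return pd.read_csv(buf, index_col=index_col)


# Hypothetical data standing in for a cached table.
df = pd.DataFrame({"specimen_id": [101, 102, 103], "sweep_count": [4, 7, 5]})

# Written without an index, read as if column 0 were the index:
# specimen_id is consumed as the index and disappears from the columns.
mismatched = roundtrip(df, write_index=False, index_col=0)

# Writer and reader agree (no index column on either side).
matched_no_index = roundtrip(df, write_index=False, index_col=None)

# Writer and reader agree (index column written, then read back as the index).
matched_with_index = roundtrip(df, write_index=True, index_col=0)

print(list(mismatched.columns))          # ['sweep_count']
print(list(matched_no_index.columns))    # ['specimen_id', 'sweep_count']
print(list(matched_with_index.columns))  # ['specimen_id', 'sweep_count']
```

The reader's `index_col` setting only gives a faithful round trip when it mirrors what the writer did with `index`.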
Looks like a fair bit of this overlaps with PR #180 but that was the result of drilling down fixing all the tests.