Fixes #179 #180
Looks to be a plug and play change between `pd.DataFrame.from_csv` and `pd.read_csv` since I don't see where anything more than the path argument has been supplied to the function call.
Codecov Report

Coverage Diff

| Metric | master | #180 | +/- |
|----------|--------|--------|-----|
| Coverage | 43.93% | 43.93% | |
| Files | 91 | 91 | |
| Lines | 11666 | 11666 | |
| Hits | 5125 | 5125 | |
| Misses | 6541 | 6541 | |
It might be worth adding a threshold argument to codecov.yml to prevent a 0% change in coverage from failing the build (e.g. scikit-learn's).
@jknox13 I have read that I'm a little concerned that blanket replacement will change the format of the files we're caching. Would you mind checking that some of the methods you changed don't adversely affect anything? For example: CellTypesCache.get_ephys_features uses
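One way the caching concern can show up is sketched below: a frame written with `to_csv` stores its index as the first column, and reading it back with the `read_csv` defaults turns that index into an ordinary column. The frame contents and variable names are hypothetical and are not taken from any cache file in this repository.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a cached table: one value column, labelled index.
frame = pd.DataFrame({"value": [1.5, 2.5]}, index=["a", "b"])

# to_csv writes the index as the first column by default.
buffer = StringIO()
frame.to_csv(buffer)
text = buffer.getvalue()

# read_csv defaults: the old index comes back as a regular column.
plain = pd.read_csv(StringIO(text))

# Passing index_col=0 restores the original layout.
indexed = pd.read_csv(StringIO(text), index_col=0)

print(list(plain.columns))
print(list(indexed.columns))
```

Any code that expects the original layout would see an extra column in the first case, so each changed call site is worth checking against the file it actually reads.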
Good catch! Here is the full difference between the default kwargs for the two:
The docs for If a fully compatible calls to

    def compatible_read_csv(path, **kwargs):
        # Restore the two defaults that DataFrame.from_csv applied.
        if 'index_col' not in kwargs:
            kwargs['index_col'] = 0
        if 'parse_dates' not in kwargs:
            kwargs['parse_dates'] = True
        return pd.read_csv(path, **kwargs)

A brief test:

    # StringIO moved between Python 2 and Python 3.
    try:
        from StringIO import StringIO
    except ImportError:
        from io import StringIO

    import pandas as pd

    # Integer first column, letter second column.
    testcsv_index = """
    0,A
    1,B
    2,C
    3,D
    4,E
    """

    # No numeric column at all.
    testcsv_noindex = """
    A,apple
    B,banana
    C,carrot
    D,dog
    E,elephant
    """

    # Integer first column that does not start at zero.
    testcsv_offset = """
    2,C
    3,D
    4,E
    5,F
    6,G
    """

    # Letter first column, integer second column.
    testcsv_alpha = """
    A,0
    B,1
    C,2
    D,3
    E,4
    """

    # Explicit header row.
    testcsv_header = """
    letter, fuit-animal
    A,apple
    B,banana
    C,carrot
    D,dog
    E,elephant
    """

    # Floating-point values only.
    testcsv_allnum = """
    0.5, 0.2
    0.4, 0.3
    0.1, 0.2
    0.4, 0.0
    0.9, 0.3
    """

    tests = (testcsv_index, testcsv_noindex, testcsv_offset, testcsv_alpha,
             testcsv_header, testcsv_allnum,)


    def compatible_read_csv(path, **kwargs):
        # Restore the two defaults that DataFrame.from_csv applied.
        if 'index_col' not in kwargs:
            kwargs['index_col'] = 0
        if 'parse_dates' not in kwargs:
            kwargs['parse_dates'] = True
        return pd.read_csv(path, **kwargs)


    def test_read_from_csv(s):
        # NOTE: DataFrame.from_csv was deprecated in pandas 0.21 and removed
        # in 1.0, so this comparison only runs on older pandas versions.
        _from = pd.DataFrame.from_csv(StringIO(s))
        _read = compatible_read_csv(StringIO(s))
        pd.testing.assert_frame_equal(_from, _read)


    if __name__ == '__main__':
        for s in tests:
            test_read_from_csv(s)
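
To make the effect of the two defaults concrete, here is a minimal sketch that uses only `pd.read_csv`, so it also runs on pandas versions where `from_csv` no longer exists. The CSV text and variable names are made up for illustration, and the date column is an assumption chosen so that `parse_dates=True` has something to parse.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: an ISO date column followed by one value column.
CSV_TEXT = "date,value\n2018-01-01,1.5\n2018-01-02,2.5\n2018-01-03,3.5\n"

# read_csv defaults: no index column is chosen and dates are not parsed.
plain = pd.read_csv(StringIO(CSV_TEXT))

# The defaults from_csv applied: first column as index, dates parsed.
compat = pd.read_csv(StringIO(CSV_TEXT), index_col=0, parse_dates=True)

print(type(plain.index).__name__, list(plain.columns))
print(type(compat.index).__name__, list(compat.columns))
```

The same file therefore yields frames with different shapes and index types depending on whether those two keyword arguments are passed.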
@dyf What do you think: add a utility function
I think I prefer the latter. Do you have time to make a pass through, @jknox13?