Fixes #179 #180

jknox13 · 2018-06-28T16:57:30Z

Looks to be a plug and play change between pd.DataFrame.from_csv and
pd.read_csv since I don't see where anything more than the path
argument has been supplied to the function call.

Looks to be a plug and play change between `pd.DataFrame.from_csv` and `pd.read_csv` since I don't see where anything more than the path argument has been supplied to the function call.

codecov-io · 2018-06-28T17:23:27Z

Codecov Report

Merging #180 into master will not change coverage.
The diff coverage is 0%.

@@           Coverage Diff           @@
##           master     #180   +/-   ##
=======================================
  Coverage   43.93%   43.93%           
=======================================
  Files          91       91           
  Lines       11666    11666           
=======================================
  Hits         5125     5125           
  Misses       6541     6541

Impacted Files	Coverage Δ
allensdk/core/mouse_connectivity_cache.py	`92.94% <ø> (ø)`	⬆️
allensdk/api/cache.py	`80.45% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2d0bac...6744bd4. Read the comment docs.

jknox13 · 2018-06-28T17:27:17Z

It might be worth adding a threshold argument to codecov.yml to prevent a 0% change in coverage to fail the build (e.g. scikit-learn's)

dyf · 2018-07-02T16:47:00Z

@jknox13 I have read that pd.read_csv and pd.from_csv have different defaults related to the use of the index column: https://stackoverflow.com/a/26495839.

I'm a little concerned that blanket replacement will change the format of the files we're caching. Would you mind checking that some of the methods you changed don't adversely affect anything? For example: CellTypesCache.get_ephys_features uses cache_csv_json.

jknox13 · 2018-07-02T17:46:17Z

Good catch! Here is the full difference between the default kwargs for the two:

kwarg	pd.DataFrame.from_csv()	pd.read_csv
header	0	'infer'
sep	','	','
index_col	0	None
parse_dates	True	False
infer_datetime_format	False	False

The docs for pd.read_csv state that the default value for header (infer) is identical to header=0 if no names are passed. So truly the difference between default behavior is handling the index and parsing dates.

If a fully compatible calls to pd.read_csv are wanted, we could include the utility function

def compatible_read_csv(path, **kwargs):
    if 'index_col' not in kwargs:
        kwargs['index_col'] = 0
    if 'parse_dates' not in kwargs:
        kwargs['parse_dates'] = True

    return pd.read_csv(path, **kwargs)

A brief testing:

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
import pandas as pd


testcsv_index = """
0,A
1,B
2,C
3,D
4,E
"""

testcsv_noindex = """
A,apple
B,banana
C,carrot
D,dog
E,elephant
"""

testcsv_offset = """
2,C
3,D
4,E
5,F
6,G
"""

testcsv_alpha = """
A,0
B,1
C,2
D,3
E,4
"""

testcsv_header = """
letter, fuit-animal
A,apple
B,banana
C,carrot
D,dog
E,elephant
"""

testcsv_allnum = """
0.5, 0.2
0.4, 0.3
0.1, 0.2
0.4, 0.0
0.9, 0.3
"""
tests = (testcsv_index, testcsv_noindex, testcsv_offset, testcsv_alpha,
         testcsv_header, testcsv_allnum,)

def compatible_read_csv(path, **kwargs):
    if 'index_col' not in kwargs:
        kwargs['index_col'] = 0
    if 'parse_dates' not in kwargs:
        kwargs['parse_dates'] = True

    return pd.read_csv(path, **kwargs)

def test_read_from_csv(s):
    _from = pd.DataFrame.from_csv(StringIO(s))
    _read = compatible_read_csv(StringIO(s))

    pd.testing.assert_frame_equal(_from, _read)


if __name__ == '__main__':
    for s in tests:
        test_read_from_csv(s)

jknox13 · 2018-07-02T17:53:55Z

@dyf What do you think: add a utility function read_csv in the sdk similar to allensdk.core.json_utilities.read(), or supply index=0 to each call of pd.read_csv and parse_dates=True where needed?

dyf · 2018-07-25T21:34:05Z

I think I prefer the latter. Do you have time to make a pass through @jknox13 ?

NileGraddis · 2018-10-17T21:56:57Z

Closing as #208 covered the same territory. Thanks @jknox13 !

Fixes AllenInstitute#179

6744bd4

Looks to be a plug and play change between `pd.DataFrame.from_csv` and `pd.read_csv` since I don't see where anything more than the path argument has been supplied to the function call.

JFPerkins mentioned this pull request Jul 31, 2018

Fix cache roundtrips truncating dataframes #208

Merged

NileGraddis closed this Oct 17, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fixes #179 #180

Fixes #179 #180

Uh oh!

jknox13 commented Jun 28, 2018

Uh oh!

codecov-io commented Jun 28, 2018 •

edited

Loading

Uh oh!

jknox13 commented Jun 28, 2018

Uh oh!

dyf commented Jul 2, 2018 •

edited

Loading

Uh oh!

jknox13 commented Jul 2, 2018 •

edited

Loading

Uh oh!

jknox13 commented Jul 2, 2018 •

edited

Loading

Uh oh!

dyf commented Jul 25, 2018

Uh oh!

NileGraddis commented Oct 17, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Fixes #179 #180

Fixes #179 #180

Uh oh!

Conversation

jknox13 commented Jun 28, 2018

Uh oh!

codecov-io commented Jun 28, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jknox13 commented Jun 28, 2018

Uh oh!

dyf commented Jul 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jknox13 commented Jul 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jknox13 commented Jul 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dyf commented Jul 25, 2018

Uh oh!

NileGraddis commented Oct 17, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov-io commented Jun 28, 2018 •

edited

Loading

dyf commented Jul 2, 2018 •

edited

Loading

jknox13 commented Jul 2, 2018 •

edited

Loading

jknox13 commented Jul 2, 2018 •

edited

Loading