-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG: Respect usecols even with empty data #12506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tests are passing. Should be good to merge. |
pandas/io/tests/test_parsers.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NEVER use numpy.testing imports!!!!!
ALWAYS use pandas testing functions. These are much more comprehensive and fully test all behavior. It is also confusing to a reader.
Fully test vs an expected frame and use assert_frame_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got it with the first exclamation point. ;) - fixed.
e58ddd0 to
f25f3c6
Compare
|
If some could cancel this build #18023 (it's an old build), that would be great. Thanks! |
4fb3b0e to
84d7cb3
Compare
|
@jreback : Tests are passing once more. Should be good to merge AFAICT. |
pandas/io/parsers.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you adding all of this code and making this very complicated?
the idea is to make things simpler. pls see if you can find a general soln.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How complicated is this code? It's essentially boilerplate and logic from the C engine code. I'm not writing anything new here. I don't understand why people would have a hard time understanding this code but yet the C engine code is considered comprehensible.
|
@gfyoung you had to change a whole bunch of things to get your code to work (e.g. changing long established behavior in excel/html parsing), the |
3304d2e to
5addc69
Compare
|
@jreback : Simplified things a little by changing the error from |
|
at a glance looks ok. still working on 0.18.0, so will get to next week. |
|
👍 will ping on all of them next week then. |
efc8523 to
c5bdde3
Compare
45d5876 to
e4bc4ec
Compare
|
@jorisvandenbossche : Any update on this PR? I think @jreback is waiting for your feedback before he merges this in. |
|
Sorry, slowly getting back on track :-) I will take a look now. Just a quick question, my previous comment was about HTML and Excel parser. The whatsnew note now does not mention those anymore. There are no longer changes to to |
|
@jorisvandenbossche : No worries. 😄 There are in fact changes to |
pandas/io/tests/test_parsers.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait...why did flake8 not catch that? No, I don't think so AFAICT.
|
@gfyoung But they only catch it under the hood? (I mean, it doesn't bubble up to the user?) |
doc/source/whatsnew/v0.18.1.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it actually shown with the full name? (just a question, didn't test it, didn't fetch the PR, but I would just show the same as in an actual console)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is what I did. 😄 - FYI you can observe this full out name thing if you trigger any current CParserError.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, perfect! (I just wondered if it was the case)
|
@jorisvandenbossche : Not AFAICT (it's an explicit |
doc/source/whatsnew/v0.18.1.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sub-section should be in API changes.
add a direction above for referncding this: .._whatsnew.......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say Change inread_csvexceptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
@gfyoung ok just a couple of minor doc comments. Pls also have a look thru |
In Python, when reading an empty file, it used to throw a StopIteration error with no error message. This PR helps to differentiate the case when no columns are inferable, which now leads to an EmptyDataError for both the C and Python engines. [ci skip]
|
|
||
| In addition to this error change, several others have been made as well: | ||
|
|
||
| - ``CParserError`` is now a ``ValueError`` instead of just an ``Exception`` (:issue:`12551`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this whatsnew just not put in before? (the PR was already merged)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I moved it into this section because it's related
|
@gfyoung thanks! |
closes #12493
in which the
usecolsargument was not being respected for empty data. This is because no filtering was applied when the first (and only) chunk was being read.