Changes to text parser to handle decode errors #3301 #3302
Codecov Report
@@ Coverage Diff @@
## master #3302 +/- ##
=======================================
Coverage 85.94% 85.95%
=======================================
Files 376 376
Lines 32359 32372 +13
=======================================
+ Hits 27812 27824 +12
- Misses 4547 4548 +1
    self._current_offset + exception.start))

escaped = '\\x{0:2x}'.format(exception.object[exception.start])
return (escaped, exception.start + 1)
Shouldn't this be exception.end (no +1) instead? I assume start and end will be the same after a 1-byte error, but if there are multiple bytes of invalid data, wouldn't we want to resume parsing at the end of that?
I'm not sure. I took the conservative approach to account for single-byte data that might cause multi-byte decoding errors.
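For reference, a minimal sketch of the alternative raised in this thread, assuming Python 3 (where indexing bytes yields an int). The handler name `escape_invalid_bytes` is made up for illustration and is not part of the PR. In Python, `exception.end` is exclusive, so a 1-byte error has `end == start + 1`, and resuming at `exception.end` skips the whole invalid range. The sketch also uses `02x` so that values below 0x10 are zero-padded; the `2x` in the diff pads with a space instead.

```python
import codecs


def escape_invalid_bytes(exception):
  """Hypothetical error handler: escapes every byte in the invalid range."""
  if not isinstance(exception, UnicodeDecodeError):
    raise exception
  # exception.end is exclusive, so a 1-byte error has end == start + 1.
  invalid = exception.object[exception.start:exception.end]
  # 02x zero-pads to two hex digits; 2x would pad with a space.
  escaped = ''.join('\\x{0:02x}'.format(byte) for byte in invalid)
  # Resume decoding right after the invalid range.
  return escaped, exception.end


# The handler name is an assumption made for this sketch.
codecs.register_error('escape_invalid_bytes', escape_invalid_bytes)

single = b'abc\xffdef'.decode('utf-8', errors='escape_invalid_bytes')
```

With this approach one handler call covers the whole invalid range the codec reports, whereas returning `exception.start + 1` makes the decoder retry from the next byte and may call the handler again for the remainder.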
    exception.object[exception.start],
    self._current_offset + exception.start))

escaped = '\\x{0:2x}'.format(exception.object[exception.start])
This only handles the case of a 1-byte error. Are you sure you want one exception message per invalid byte?
I took the conservative approach to account for single-byte data that might cause multi-byte decoding errors. We could limit the number of warnings that will be generated if you expect a lot of encoding errors.
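One way the warning limit could look, as a rough sketch only. The class name `LimitedWarningHandler`, its attributes, and the warning text are hypothetical and do not come from the PR; Python 3 is assumed. The handler keeps escaping one byte at a time, as in the diff, but stops recording warnings once a cap is reached.

```python
import codecs


class LimitedWarningHandler(object):
  """Hypothetical decode error handler that caps the warnings it records."""

  def __init__(self, maximum_warnings=3):
    self.maximum_warnings = maximum_warnings
    self.number_of_errors = 0
    self.warnings = []

  def __call__(self, exception):
    if not isinstance(exception, UnicodeDecodeError):
      raise exception
    self.number_of_errors += 1
    # Record a warning only while under the cap; keep escaping regardless.
    if self.number_of_errors <= self.maximum_warnings:
      self.warnings.append(
          'unable to decode byte at offset: {0:d}'.format(exception.start))
    escaped = '\\x{0:02x}'.format(exception.object[exception.start])
    # One byte per call, matching the conservative approach in the diff.
    return escaped, exception.start + 1


handler = LimitedWarningHandler(maximum_warnings=3)
# The handler name is an assumption made for this sketch.
codecs.register_error('limited_warnings', handler)
text = (b'\xff' * 5 + b'ok').decode('utf-8', errors='limited_warnings')
```

Keeping a separate error count means the total number of invalid bytes can still be reported once at the end, even though individual warnings stop after the cap.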
@Onager PTAL