C and Pure parsers accept invalid UTF-8 strings; the Java parser doesn't. #138
I think the other variants should also raise an exception. I already implemented this for the Ruby 1.9 extension and pure variants, but 1.8 is a bit more difficult.
I'm not particularly concerned about 1.8. The report we got was that JRuby errored in 1.9 mode, and I believe the user is only concerned with 1.9 mode. Hopefully the upstream data can be fixed too, since the "fix" of actually erroring out will obviously cause the "bug" for MRI too.
If anything, I think it's the C and Pure parsers that should raise an exception. The problem is that this behavior has been there forever, so there's a high chance that changing it would break some users, which leads to the annoying conclusion that it should be behind a configuration flag.
It would not be possible for us to implement such a config flag, because the parser we use is always strict about this. @flori said above that they implemented the error for 1.9+ pure and ext, so is there really anything left to do here 12 years later?
As far as I can tell, it's still not raising today:

```
>> JSON::Pure::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].encoding
=> #<Encoding:UTF-8>
>> JSON::Pure::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].valid_encoding?
=> false
>> JSON::Ext::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].encoding
=> #<Encoding:UTF-8>
>> JSON::Ext::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].valid_encoding?
=> false
```
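Until the parsers agree, an application that wants JRuby-like strictness on MRI can check the parsed result itself, since (as the session shows) the bad strings come back tagged UTF-8 but with `valid_encoding?` false. The following is a minimal sketch, not part of the json gem: `check_utf8!` and `strict_parse` are hypothetical helper names, and raising `JSON::ParserError` is just one possible choice of error class.

```ruby
require "json"

# Hypothetical helper (not a json gem API): recursively walks a parsed JSON
# value and raises if any String in it, key or value, is invalid in its encoding.
def check_utf8!(value, path = "$")
  case value
  when String
    unless value.valid_encoding?
      raise JSON::ParserError,
            "invalid #{value.encoding} string at #{path}: #{value.inspect}"
    end
  when Array
    value.each_with_index { |item, i| check_utf8!(item, "#{path}[#{i}]") }
  when Hash
    value.each do |key, item|
      # Check the key first so an invalid key is reported before it is
      # interpolated into the path of its value.
      check_utf8!(key, path) if key.is_a?(String)
      check_utf8!(item, "#{path}.#{key}")
    end
  end
  value
end

# Hypothetical opt-in "strict mode": a normal JSON.parse followed by the walk.
def strict_parse(source, opts = {})
  check_utf8!(JSON.parse(source, opts))
end
```

Walking the result costs an extra pass over the data, which is roughly the price of keeping the old lenient behavior as the default and making strictness opt-in.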
Perhaps that fix never made it into main then.
I know, I know... if it's bad content, it's bad content. But JRuby's behavior here differs from MRI's.
Here's the case, again a reduced version of one I got from @rkh:
So basically there's a bad byte in a UTF-8 string, and the MRI version walks right past it and lets it through into the resulting parsed JSON structure.
I have a totally broken patch for this:
Again, I'm not sure this is actually something that needs to be fixed, but because the MRI version of json does not blow up on this content, there's something to be addressed.
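Another option, assuming the input is always meant to be UTF-8, is to reject bad bytes before the parser ever sees them, so that MRI fails the same way JRuby does for input like the case above. A minimal sketch under that assumption (`parse_utf8_only` is a hypothetical name, not a json gem API):

```ruby
require "json"

# Hypothetical wrapper: refuse any source that is not valid UTF-8, then parse.
def parse_utf8_only(source)
  # Reinterpret the bytes as UTF-8 (input read from a socket or file may be
  # tagged ASCII-8BIT) without touching the caller's string.
  utf8 = source.dup.force_encoding(Encoding::UTF_8)
  raise JSON::ParserError, "source is not valid UTF-8" unless utf8.valid_encoding?

  JSON.parse(utf8)
end
```

Checking the source up front is cheaper than walking the parsed result, but it only covers raw bytes in the input.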