Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C and Pure parser accept invalid UTF-8 strings, the Java parser doesn't. #138

Open
headius opened this issue May 31, 2012 · 6 comments
Open
Labels
Milestone

Comments

@headius
Copy link
Contributor

headius commented May 31, 2012

I know, I know...if it's bad content it's bad content. But this represents a difference from MRI.

Here's the case, again a reduced version of one I got from @rkh:

# encoding: utf-8
require 'json'
x = "{\"foo\":\"\xC3\"}"
h = JSON.parse(x)
p h['foo']
p h['foo'].encoding

So basically there's a bad byte in a UTF-8 string, and the MRI version walks right by it and allows it to come through to the resulting parsed json structure.

I have a totally broken patch for this:

diff --git a/java/src/json/ext/ByteListTranscoder.java b/java/src/json/ext/ByteListTranscoder.java
index ed9e54b..a7e42ba 100644
--- a/java/src/json/ext/ByteListTranscoder.java
+++ b/java/src/json/ext/ByteListTranscoder.java
@@ -78,9 +78,10 @@ abstract class ByteListTranscoder {
             return head;
         }
         if (head <= 0xbf) { // 0b10xxxxxx
-            throw invalidUtf8(); // tail byte with no head
+            return head; //throw invalidUtf8(); // tail byte with no head
         }
         if (head <= 0xdf) { // 0b110xxxxx
+            if (pos + 1 > srcEnd) return head;
             ensureMin(1);
             int cp = ((head  & 0x1f) << 6)
                      | nextPart();

Again, I'm not sure this is actually something that needs to be fixed, but because the MRI version of json does not blow up on this content, there's something to be addressed.

@flori
Copy link
Member

flori commented Jun 7, 2012

I think the other variants should also raise an exception. I already implemented this for Ruby 1.9 extension and pure variants , but 1.8 is a bit more difficult.

@headius
Copy link
Contributor Author

headius commented Jun 8, 2012

I'm not particularly concerned about 1.8. The report we got was that JRuby errored in 1.9 mode and I believe the user's only concerned with 1.9 mode. Hopefully the upstream data can be fixed too, since obviously the "fix" of actually erroring out will now cause the "bug" for MRI too.

@byroot byroot added the bug label Oct 17, 2024
@byroot
Copy link
Member

byroot commented Oct 18, 2024

If anything, I think it's the C and Pure parsers that should raise an exception.

The problem is that this behavior has been there forever, so there's a high chance it would break some users, which lead to the annoying conclusion that it should be behind a configuration flag.

@byroot byroot changed the title Bad UTF-8 content blows up parser C and Pure parser accept invalid UTF-8 strings, the Java parser doesn't. Oct 18, 2024
@headius
Copy link
Contributor Author

headius commented Oct 18, 2024

It would not be possible for us to implement such a config flag because the parser we use is always strict about this. @flori said above they implemented the error for 1.9+ pure and ext, so is there really anything to do here 12 years later?

@byroot
Copy link
Member

byroot commented Oct 18, 2024

As far as I can tell, it's still not raising today:

>> JSON::Pure::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].encoding
=> #<Encoding:UTF-8>
>> JSON::Pure::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].valid_encoding?
=> false
>> JSON::Ext::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].encoding
=> #<Encoding:UTF-8>
>> JSON::Ext::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].valid_encoding?
=> false

@headius
Copy link
Contributor Author

headius commented Oct 18, 2024

Perhaps that fix never made it into main then. ☹️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants